Merge branch 'main' into deque
|
@ -23,7 +23,7 @@ The list of topics for which we are looking for content are provided below along
|
|||
- Interacting with Databases - [Link](https://github.com/animator/learn-python/tree/main/contrib/database)
|
||||
- Web Scraping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
|
||||
- API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
|
||||
- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
|
||||
- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms) **(Not accepting)**
|
||||
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
|
||||
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
|
||||
|
||||
|
|
|
@ -20,3 +20,4 @@
|
|||
- [Eval Function](eval_function.md)
|
||||
- [Magic Methods](magic-methods.md)
|
||||
- [Asynchronous Context Managers & Generators](asynchronous-context-managers-generators.md)
|
||||
- [Threading](threading.md)
|
||||
|
|
|
@ -0,0 +1,198 @@
|
|||
# Threading in Python
|
||||
A thread is a sequence of instructions in a program that can be executed independently of the rest of the process.
|
||||
Threads are like lightweight processes that share the same memory space but can execute independently.
|
||||
A process, in contrast, is an executing instance of a computer program.
|
||||
This guide provides an overview of the threading module and its key functionalities.
|
||||
|
||||
## Key Characteristics of Threads:
|
||||
* Shared Memory: All threads within a process share the same memory space, which allows for efficient communication between threads.
|
||||
* Independent Execution: Each thread can run independently and concurrently.
|
||||
* Context Switching: The operating system can switch between threads, enabling concurrent execution.
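For example, the shared-memory point can be seen in a minimal sketch (the function and variable names here are only illustrative):

```python
import threading

shared_results = []  # a single list object, visible to every thread in the process

def record(name):
    # Both threads append to the same list, illustrating shared memory
    shared_results.append(f"hello from {name}")

t1 = threading.Thread(target=record, args=("worker-1",))
t2 = threading.Thread(target=record, args=("worker-2",))
t1.start()
t2.start()
t1.join()
t2.join()

print(shared_results)  # contains entries from both threads
```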
|
||||
|
||||
## Threading Module
|
||||
The `threading` module allows you to create and manage threads easily. It includes several functions and classes for working with threads.
|
||||
|
||||
**1. Creating a Thread:**
|
||||
To create a thread in Python, you can use the Thread class from the threading module.
|
||||
|
||||
Example:
|
||||
```python
|
||||
import threading
|
||||
|
||||
# Create a thread
|
||||
thread = threading.Thread()  # no target function, so the thread does nothing when run
|
||||
|
||||
# Start the thread
|
||||
thread.start()
|
||||
|
||||
# Wait for the thread to complete
|
||||
thread.join()
|
||||
|
||||
print("Thread has finished execution.")
|
||||
```
|
||||
Output :
|
||||
```
|
||||
Thread has finished execution.
|
||||
```
|
||||
**2. Performing Task with Thread:**
|
||||
We can also have a thread perform a specific task by passing a function as the `target` parameter and its arguments as `args` when constructing the `Thread` object.
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
import threading
|
||||
|
||||
# Define a function that will be executed by the thread
|
||||
def print_numbers(arg):
|
||||
for i in range(arg):
|
||||
print(f"Thread: {i}")
|
||||
# Create a thread
|
||||
thread = threading.Thread(target=print_numbers,args=(5,))
|
||||
|
||||
# Start the thread
|
||||
thread.start()
|
||||
|
||||
# Wait for the thread to complete
|
||||
thread.join()
|
||||
|
||||
print("Thread has finished execution.")
|
||||
```
|
||||
Output :
|
||||
```
|
||||
Thread: 0
|
||||
Thread: 1
|
||||
Thread: 2
|
||||
Thread: 3
|
||||
Thread: 4
|
||||
Thread has finished execution.
|
||||
```
|
||||
**3. Delaying a Task with Thread's Timer Function:**
|
||||
We can delay the start of a thread using the `Timer` class, which takes four arguments: `interval`, `function`, `args`, and `kwargs`.
|
||||
|
||||
Example:
|
||||
```python
|
||||
import threading
|
||||
|
||||
# Define a function that will be executed by the thread
|
||||
def print_numbers(arg):
|
||||
for i in range(arg):
|
||||
print(f"Thread: {i}")
|
||||
# Create a timer thread that runs print_numbers after a 3-second delay
|
||||
thread = threading.Timer(3,print_numbers,args=(5,))
|
||||
|
||||
# Start the thread
|
||||
thread.start()
|
||||
|
||||
# Wait for the thread to complete
|
||||
thread.join()
|
||||
|
||||
print("Thread has finished execution.")
|
||||
```
|
||||
Output :
|
||||
```
|
||||
# after three seconds the output will be generated
|
||||
Thread: 0
|
||||
Thread: 1
|
||||
Thread: 2
|
||||
Thread: 3
|
||||
Thread: 4
|
||||
Thread has finished execution.
|
||||
```
|
||||
**4. Creating Multiple Threads**
|
||||
We can create and manage multiple threads to achieve concurrent execution.
|
||||
|
||||
Example:
|
||||
```python
|
||||
import threading
|
||||
|
||||
def print_numbers(thread_name):
|
||||
for i in range(5):
|
||||
print(f"{thread_name}: {i}")
|
||||
|
||||
# Create multiple threads
|
||||
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
|
||||
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
|
||||
|
||||
# Start the threads
|
||||
thread1.start()
|
||||
thread2.start()
|
||||
|
||||
# Wait for both threads to complete
|
||||
thread1.join()
|
||||
thread2.join()
|
||||
|
||||
print("Both threads have finished execution.")
|
||||
```
|
||||
Output :
|
||||
```
|
||||
Thread 1: 0
|
||||
Thread 1: 1
|
||||
Thread 2: 0
|
||||
Thread 1: 2
|
||||
Thread 1: 3
|
||||
Thread 2: 1
|
||||
Thread 2: 2
|
||||
Thread 2: 3
|
||||
Thread 2: 4
|
||||
Thread 1: 4
|
||||
Both threads have finished execution.
|
||||
```
|
||||
|
||||
**5. Thread Synchronization**
|
||||
When we create multiple threads and they access shared resources, there is a risk of race conditions and data corruption. To prevent this, you can use synchronization primitives such as locks.
|
||||
A lock is a synchronization primitive that ensures that only one thread can access a shared resource at a time.
|
||||
|
||||
Example:
|
||||
```Python
|
||||
import threading
|
||||
|
||||
lock = threading.Lock()
|
||||
|
||||
def print_numbers(thread_name):
|
||||
for i in range(10):
|
||||
with lock:
|
||||
print(f"{thread_name}: {i}")
|
||||
|
||||
# Create multiple threads
|
||||
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
|
||||
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
|
||||
|
||||
# Start the threads
|
||||
thread1.start()
|
||||
thread2.start()
|
||||
|
||||
# Wait for both threads to complete
|
||||
thread1.join()
|
||||
thread2.join()
|
||||
|
||||
print("Both threads have finished execution.")
|
||||
```
|
||||
Output :
|
||||
```
|
||||
Thread 1: 0
|
||||
Thread 1: 1
|
||||
Thread 1: 2
|
||||
Thread 1: 3
|
||||
Thread 1: 4
|
||||
Thread 1: 5
|
||||
Thread 1: 6
|
||||
Thread 1: 7
|
||||
Thread 1: 8
|
||||
Thread 1: 9
|
||||
Thread 2: 0
|
||||
Thread 2: 1
|
||||
Thread 2: 2
|
||||
Thread 2: 3
|
||||
Thread 2: 4
|
||||
Thread 2: 5
|
||||
Thread 2: 6
|
||||
Thread 2: 7
|
||||
Thread 2: 8
|
||||
Thread 2: 9
|
||||
Both threads have finished execution.
|
||||
```
|
||||
|
||||
A `lock` object is created using `threading.Lock()`, and the `with lock` statement ensures that the lock is acquired before printing and released afterwards. This prevents other threads from executing the print statement simultaneously.
|
||||
|
||||
## Conclusion
|
||||
Threading in Python is a powerful tool for achieving concurrency and improving the performance of I/O-bound tasks. By understanding and implementing threads using the threading module, you can enhance the efficiency of your programs. To prevent race conditions and maintain data integrity, keep in mind that thread synchronization must be properly managed.
|
|
@ -234,3 +234,220 @@ print("Length of lis is", lis(arr))
|
|||
## Complexity Analysis
|
||||
- **Time Complexity**: O(n * n) for both approaches, where n is the length of the array.
|
||||
- **Space Complexity**: O(n * n) for the memoization table in Top-Down Approach, O(n) in Bottom-Up Approach.
|
||||
|
||||
# 5. String Edit Distance
|
||||
|
||||
The String Edit Distance algorithm calculates the minimum number of operations (insertions, deletions, or substitutions) required to convert one string into another.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- **Base Cases:** If one string is empty, the edit distance is the length of the other string.
|
||||
- **Memoization:** Store the results of previously computed edit distances to avoid redundant computations.
|
||||
- **Recurrence Relation:** Compute the edit distance by considering insertion, deletion, and substitution operations.
|
||||
|
||||
## String Edit Distance Code in Python (Top-Down Approach with Memoization)
|
||||
```python
|
||||
def edit_distance(str1, str2, memo={}):
|
||||
m, n = len(str1), len(str2)
|
||||
if (m, n) in memo:
|
||||
return memo[(m, n)]
|
||||
if m == 0:
|
||||
return n
|
||||
if n == 0:
|
||||
return m
|
||||
if str1[m - 1] == str2[n - 1]:
|
||||
memo[(m, n)] = edit_distance(str1[:m-1], str2[:n-1], memo)
|
||||
else:
|
||||
memo[(m, n)] = 1 + min(edit_distance(str1, str2[:n-1], memo), # Insert
|
||||
edit_distance(str1[:m-1], str2, memo), # Remove
|
||||
edit_distance(str1[:m-1], str2[:n-1], memo)) # Replace
|
||||
return memo[(m, n)]
|
||||
|
||||
str1 = "sunday"
|
||||
str2 = "saturday"
|
||||
print(f"Edit Distance between '{str1}' and '{str2}' is {edit_distance(str1, str2)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Edit Distance between 'sunday' and 'saturday' is 3.
|
||||
```
|
||||
|
||||
## String Edit Distance Code in Python (Bottom-Up Approach)
|
||||
```python
|
||||
def edit_distance(str1, str2):
|
||||
m, n = len(str1), len(str2)
|
||||
dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
|
||||
|
||||
for i in range(m + 1):
|
||||
for j in range(n + 1):
|
||||
if i == 0:
|
||||
dp[i][j] = j
|
||||
elif j == 0:
|
||||
dp[i][j] = i
|
||||
elif str1[i - 1] == str2[j - 1]:
|
||||
dp[i][j] = dp[i - 1][j - 1]
|
||||
else:
|
||||
dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
|
||||
|
||||
return dp[m][n]
|
||||
|
||||
str1 = "sunday"
|
||||
str2 = "saturday"
|
||||
print(f"Edit Distance between '{str1}' and '{str2}' is {edit_distance(str1, str2)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Edit Distance between 'sunday' and 'saturday' is 3.
|
||||
```
|
||||
|
||||
## **Complexity Analysis:**
|
||||
- **Time Complexity:** O(m * n) where m and n are the lengths of string 1 and string 2 respectively
|
||||
- **Space Complexity:** O(m * n) for both top-down and bottom-up approaches
|
||||
|
||||
|
||||
# 6. Matrix Chain Multiplication
|
||||
|
||||
The Matrix Chain Multiplication finds the optimal way to multiply a sequence of matrices to minimize the number of scalar multiplications.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- **Base Cases:** The cost of multiplying one matrix is zero.
|
||||
- **Memoization:** Store the results of previously computed matrix chain orders to avoid redundant computations.
|
||||
- **Recurrence Relation:** Compute the optimal cost by splitting the product at different points and choosing the minimum cost.
|
||||
|
||||
## Matrix Chain Multiplication Code in Python (Top-Down Approach with Memoization)
|
||||
```python
|
||||
def matrix_chain_order(p, memo={}):
|
||||
n = len(p) - 1
|
||||
def compute_cost(i, j):
|
||||
if (i, j) in memo:
|
||||
return memo[(i, j)]
|
||||
if i == j:
|
||||
return 0
|
||||
memo[(i, j)] = float('inf')
|
||||
for k in range(i, j):
|
||||
q = compute_cost(i, k) + compute_cost(k + 1, j) + p[i - 1] * p[k] * p[j]
|
||||
if q < memo[(i, j)]:
|
||||
memo[(i, j)] = q
|
||||
return memo[(i, j)]
|
||||
return compute_cost(1, n)
|
||||
|
||||
p = [1, 2, 3, 4]
|
||||
print(f"Minimum number of multiplications is {matrix_chain_order(p)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Minimum number of multiplications is 18.
|
||||
```
|
||||
|
||||
|
||||
## Matrix Chain Multiplication Code in Python (Bottom-Up Approach)
|
||||
```python
|
||||
def matrix_chain_order(p):
|
||||
n = len(p) - 1
|
||||
m = [[0 for _ in range(n)] for _ in range(n)]
|
||||
|
||||
for L in range(2, n + 1):
|
||||
for i in range(n - L + 1):
|
||||
j = i + L - 1
|
||||
m[i][j] = float('inf')
|
||||
for k in range(i, j):
|
||||
q = m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
|
||||
if q < m[i][j]:
|
||||
m[i][j] = q
|
||||
|
||||
return m[0][n - 1]
|
||||
|
||||
p = [1, 2, 3, 4]
|
||||
print(f"Minimum number of multiplications is {matrix_chain_order(p)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Minimum number of multiplications is 18.
|
||||
```
|
||||
|
||||
## **Complexity Analysis:**
|
||||
- **Time Complexity:** O(n^3) where n is the number of matrices in the chain. For an `array p` of dimensions representing the matrices such that the `i-th matrix` has dimensions `p[i-1] x p[i]`, n is `len(p) - 1`
|
||||
- **Space Complexity:** O(n^2) for both top-down and bottom-up approaches
|
||||
|
||||
# 7. Optimal Binary Search Tree
|
||||
|
||||
The Optimal Binary Search Tree algorithm arranges a set of keys in a binary search tree so that the expected search cost is minimized, given the access frequency of each key.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- **Base Cases:** The cost of a single key is its frequency.
|
||||
- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations.
|
||||
- **Recurrence Relation:** Compute the optimal cost by trying each key as the root and choosing the minimum cost.
|
||||
|
||||
## Optimal Binary Search Tree Code in Python (Top-Down Approach with Memoization)
|
||||
|
||||
```python
|
||||
def optimal_bst(keys, freq, memo={}):
|
||||
n = len(keys)
|
||||
def compute_cost(i, j):
|
||||
if (i, j) in memo:
|
||||
return memo[(i, j)]
|
||||
if i > j:
|
||||
return 0
|
||||
if i == j:
|
||||
return freq[i]
|
||||
memo[(i, j)] = float('inf')
|
||||
total_freq = sum(freq[i:j+1])
|
||||
for r in range(i, j + 1):
|
||||
cost = (compute_cost(i, r - 1) +
|
||||
compute_cost(r + 1, j) +
|
||||
total_freq)
|
||||
if cost < memo[(i, j)]:
|
||||
memo[(i, j)] = cost
|
||||
return memo[(i, j)]
|
||||
return compute_cost(0, n - 1)
|
||||
|
||||
keys = [10, 12, 20]
|
||||
freq = [34, 8, 50]
|
||||
print(f"Cost of Optimal BST is {optimal_bst(keys, freq)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Cost of Optimal BST is 142.
|
||||
```
|
||||
|
||||
## Optimal Binary Search Tree Code in Python (Bottom-Up Approach)
|
||||
|
||||
```python
|
||||
def optimal_bst(keys, freq):
|
||||
n = len(keys)
|
||||
cost = [[0 for x in range(n)] for y in range(n)]
|
||||
|
||||
for i in range(n):
|
||||
cost[i][i] = freq[i]
|
||||
|
||||
for L in range(2, n + 1):
|
||||
for i in range(n - L + 1):
|
||||
j = i + L - 1
|
||||
cost[i][j] = float('inf')
|
||||
total_freq = sum(freq[i:j+1])
|
||||
for r in range(i, j + 1):
|
||||
c = (cost[i][r - 1] if r > i else 0) + \
|
||||
(cost[r + 1][j] if r < j else 0) + \
|
||||
total_freq
|
||||
if c < cost[i][j]:
|
||||
cost[i][j] = c
|
||||
|
||||
return cost[0][n - 1]
|
||||
|
||||
keys = [10, 12, 20]
|
||||
freq = [34, 8, 50]
|
||||
print(f"Cost of Optimal BST is {optimal_bst(keys, freq)}.")
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
Cost of Optimal BST is 142.
|
||||
```
|
||||
|
||||
### Complexity Analysis
|
||||
- **Time Complexity**: O(n^3) where n is the number of keys in the binary search tree.
|
||||
- **Space Complexity**: O(n^2) for both top-down and bottom-up approaches
|
||||
|
|
After Width: | Height: | Size: 15 KiB |
After Width: | Height: | Size: 53 KiB |
After Width: | Height: | Size: 51 KiB |
After Width: | Height: | Size: 51 KiB |
After Width: | Height: | Size: 31 KiB |
|
@ -22,4 +22,5 @@
|
|||
- [AVL Trees](avl-trees.md)
|
||||
- [Splay Trees](splay-trees.md)
|
||||
- [Dijkstra's Algorithm](dijkstra.md)
|
||||
- [Deque](deque.md)
|
||||
- [Tree Traversals](tree-traversal.md)
|
||||
|
|
|
@ -561,3 +561,58 @@ print("Sorted string:", sorted_str)
|
|||
### Complexity Analysis
|
||||
- **Time Complexity:** O(n+k) for all cases. No matter how the elements are placed in the array, the algorithm performs on the order of n+k operations.
|
||||
- **Space Complexity:** O(max), where max is the largest element in the input. The larger the range of elements, the larger the space requirement.
|
||||
|
||||
|
||||
## 9. Cyclic Sort
|
||||
|
||||
### Theory
|
||||
Cyclic Sort is an in-place sorting algorithm that is useful for sorting arrays where the elements are in a known range (e.g., 1 to N). The key idea behind the algorithm is that each number should be placed at its correct index. If we find a number that is not at its correct index, we swap it with the number at its correct index. This process is repeated until every number is at its correct index.
|
||||
|
||||
### Algorithm
|
||||
- Iterate over the array from the start to the end.
|
||||
- For each element, check if it is at its correct index.
|
||||
- If it is not at its correct index, swap it with the element at its correct index.
|
||||
- Continue this process until the element at the current index is in its correct position. Move to the next index and repeat the process until the end of the array is reached.
|
||||
|
||||
### Steps
|
||||
- Start with the first element.
|
||||
- Check if it is at the correct index (i.e., if arr[i] == i + 1).
|
||||
- If it is not, swap it with the element at the index arr[i] - 1.
|
||||
- Repeat step 2 for the current element until it is at the correct index.
|
||||
- Move to the next element and repeat the process.
|
||||
|
||||
### Code
|
||||
|
||||
```python
|
||||
def cyclic_sort(nums):
|
||||
i = 0
|
||||
while i < len(nums):
|
||||
correct_index = nums[i] - 1
|
||||
if nums[i] != nums[correct_index]:
|
||||
nums[i], nums[correct_index] = nums[correct_index], nums[i] # Swap
|
||||
else:
|
||||
i += 1
|
||||
return nums
|
||||
```
|
||||
|
||||
### Example
|
||||
```python
|
||||
arr = [3, 1, 5, 4, 2]
|
||||
sorted_arr = cyclic_sort(arr)
|
||||
print(sorted_arr)
|
||||
```
|
||||
### Output
|
||||
```
|
||||
[1, 2, 3, 4, 5]
|
||||
```
|
||||
|
||||
### Complexity Analysis
|
||||
**Time Complexity:**
|
||||
|
||||
The time complexity of Cyclic Sort is **O(n)**.
|
||||
This is because in each cycle, each element is either placed in its correct position or a swap is made. Since each element is swapped at most once, the total number of swaps (and hence the total number of operations) is linear in the number of elements.
|
||||
|
||||
**Space Complexity:**
|
||||
|
||||
The space complexity of Cyclic Sort is **O(1)**.
|
||||
This is because the algorithm only requires a constant amount of additional space beyond the input array.
|
|
@ -0,0 +1,195 @@
|
|||
# Tree Traversal Algorithms
|
||||
|
||||
Tree Traversal refers to the process of visiting or accessing each node of the tree exactly once in a certain order. Tree traversal algorithms help us to visit and process all the nodes of the tree. Since a tree is not a linear data structure, there are multiple nodes we can visit after visiting a certain node. There are multiple tree traversal techniques, which decide the order in which the nodes of the tree are visited.
|
||||
|
||||
|
||||
A Tree Data Structure can be traversed in the following ways:
|
||||
- **Level Order Traversal or Breadth First Search or BFS**
|
||||
- **Depth First Search or DFS**
|
||||
- Inorder Traversal
|
||||
- Preorder Traversal
|
||||
- Postorder Traversal
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## Binary Tree Structure
|
||||
Before diving into traversal techniques, let's define a simple binary tree node structure:
|
||||
|
||||
|
||||
|
||||
```python
|
||||
class Node:
|
||||
def __init__(self, key):
|
||||
        self.left = None
|
||||
        self.right = None
|
||||
        self.val = key
|
||||
|
||||
# Main class
|
||||
if __name__ == "__main__":
|
||||
    root = Node(1)
|
||||
    root.left = Node(2)
|
||||
    root.right = Node(3)
|
||||
    root.left.left = Node(4)
|
||||
    root.left.right = Node(5)
|
||||
    root.right.left = Node(6)
|
||||
    root.right.right = Node(7)
|
||||
```
|
||||
|
||||
## Level Order Traversal
|
||||
When the nodes of the tree are visited level by level from left to right, it is a level order traversal. We can use a queue data structure to perform a level order traversal.
|
||||
|
||||
### Algorithm
|
||||
- Create an empty queue Q
|
||||
- Enqueue the root node of the tree to Q
|
||||
- Loop while Q is not empty
|
||||
- Dequeue a node from Q and visit it
|
||||
- Enqueue the left child of the dequeued node if it exists
|
||||
- Enqueue the right child of the dequeued node if it exists
|
||||
|
||||
### Code for level order traversal in Python
|
||||
```python
|
||||
def printLevelOrder(root):
|
||||
if root is None:
|
||||
return
|
||||
|
||||
# Create an empty queue
|
||||
queue = []
|
||||
|
||||
# Enqueue Root and initialize height
|
||||
queue.append(root)
|
||||
|
||||
while(len(queue) > 0):
|
||||
|
||||
# Print front of queue and
|
||||
# remove it from queue
|
||||
print(queue[0].val, end=" ")
|
||||
node = queue.pop(0)
|
||||
|
||||
# Enqueue left child
|
||||
if node.left is not None:
|
||||
queue.append(node.left)
|
||||
|
||||
# Enqueue right child
|
||||
if node.right is not None:
|
||||
queue.append(node.right)
|
||||
```
|
||||
|
||||
**output**
|
||||
|
||||
` Level order traversal of binary tree is :
|
||||
1 2 3 4 5 6 7 `
|
||||
|
||||
|
||||
|
||||
## Depth First Search
|
||||
In a depth-first traversal, we go as deep as possible along one branch before backtracking and exploring the others. There are three kinds of depth-first traversals.
|
||||
|
||||
## 1. Inorder Traversal
|
||||
|
||||
In this traversal method, the left subtree is visited first, then the root and later the right sub-tree. We should always remember that every node may represent a subtree itself.
|
||||
|
||||
**Note:** If a binary search tree is traversed in-order, the output will produce the key values sorted in ascending order.
|
||||
|
||||

|
||||
|
||||
**The order:** Left -> Root -> Right
|
||||
|
||||
### Algorithm
|
||||
- Traverse the left subtree.
|
||||
- Visit the root node.
|
||||
- Traverse the right subtree.
|
||||
|
||||
### Code for inorder traversal in Python
|
||||
```python
|
||||
def printInorder(root):
|
||||
if root:
|
||||
# First recur on left child
|
||||
printInorder(root.left)
|
||||
|
||||
# Then print the data of node
|
||||
print(root.val, end=" ")
|
||||
|
||||
# Now recur on right child
|
||||
printInorder(root.right)
|
||||
```
|
||||
|
||||
**output**
|
||||
|
||||
` Inorder traversal of binary tree is :
|
||||
4 2 5 1 6 3 7 `
|
||||
|
||||
|
||||
## 2. Preorder Traversal
|
||||
|
||||
In this traversal method, the root node is visited first, then the left subtree and finally the right subtree.
|
||||
|
||||
|
||||
|
||||
**The order:** Root -> Left -> Right
|
||||
|
||||
### Algorithm
|
||||
- Visit the root node.
|
||||
- Traverse the left subtree.
|
||||
- Traverse the right subtree.
|
||||
|
||||
### Code for preorder traversal in Python
|
||||
```python
|
||||
def printPreorder(root):
|
||||
if root:
|
||||
# First print the data of node
|
||||
print(root.val, end=" ")
|
||||
|
||||
# Then recur on left child
|
||||
printPreorder(root.left)
|
||||
|
||||
# Finally recur on right child
|
||||
printPreorder(root.right)
|
||||
```
|
||||
|
||||
**output**
|
||||
|
||||
` Preorder traversal of binary tree is :
|
||||
1 2 4 5 3 6 7 `
|
||||
|
||||
## 3. Postorder Traversal
|
||||
|
||||
In this traversal method, the root node is visited last, hence the name. First we traverse the left subtree, then the right subtree and finally the root node.
|
||||
|
||||

|
||||
|
||||
**The order:** Left -> Right -> Root
|
||||
|
||||
### Algorithm
|
||||
- Traverse the left subtree.
|
||||
- Traverse the right subtree.
|
||||
- Visit the root node.
|
||||
|
||||
### Code for postorder traversal in Python
|
||||
```python
|
||||
def printPostorder(root):
|
||||
if root:
|
||||
# First recur on left child
|
||||
printPostorder(root.left)
|
||||
|
||||
# Then recur on right child
|
||||
printPostorder(root.right)
|
||||
|
||||
# Now print the data of node
|
||||
print(root.val, end=" ")
|
||||
```
|
||||
|
||||
**output**
|
||||
|
||||
` Postorder traversal of binary tree is :
|
||||
4 5 2 6 7 3 1 `
|
||||
|
||||
|
||||
## Complexity Analysis
|
||||
- **Time Complexity:** All three tree traversal methods (Inorder, Preorder, and Postorder) have a time complexity of `𝑂(𝑛)`, where 𝑛 is the number of nodes in the tree.
|
||||
- **Space Complexity:** The space complexity is influenced by the recursion stack. In the worst case, the depth of the recursion stack can go up to `𝑂(ℎ)`, where ℎ is the height of the tree.
|
||||
|
||||
|
||||
|
||||
|
After Width: | Height: | Size: 16 KiB |
|
@ -2,16 +2,13 @@
|
|||
|
||||
- [Introduction to scikit-learn](sklearn-introduction.md)
|
||||
- [Binomial Distribution](binomial-distribution.md)
|
||||
- [Naive Bayes](naive-bayes.md)
|
||||
- [Regression in Machine Learning](regression.md)
|
||||
- [Polynomial Regression](polynomial-regression.md)
|
||||
- [Confusion Matrix](confusion-matrix.md)
|
||||
- [Decision Tree Learning](decision-tree.md)
|
||||
- [Random Forest](random-forest.md)
|
||||
- [Support Vector Machine Algorithm](support-vector-machine.md)
|
||||
- [Artificial Neural Network from the Ground Up](ann.md)
|
||||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [TensorFlow.md](tensorflow.md)
|
||||
- [PyTorch.md](pytorch.md)
|
||||
- [Ensemble Learning](ensemble-learning.md)
|
||||
- [Types of optimizers](types-of-optimizers.md)
|
||||
- [Logistic Regression](logistic-regression.md)
|
||||
|
@ -19,9 +16,14 @@
|
|||
- [Clustering](clustering.md)
|
||||
- [Hierarchical Clustering](hierarchical-clustering.md)
|
||||
- [Grid Search](grid-search.md)
|
||||
- [Transformers](transformers.md)
|
||||
- [K-Means](kmeans.md)
|
||||
- [K-nearest neighbor (KNN)](knn.md)
|
||||
- [Naive Bayes](naive-bayes.md)
|
||||
- [Neural network regression](neural-network-regression.md)
|
||||
- [Xgboost](xgboost.md)
|
||||
- [Artificial Neural Network from the Ground Up](ann.md)
|
||||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [TensorFlow](tensorflow.md)
|
||||
- [PyTorch](pytorch.md)
|
||||
- [PyTorch Fundamentals](pytorch-fundamentals.md)
|
||||
- [Transformers](transformers.md)
|
||||
- [Reinforcement Learning](reinforcement-learning.md)
|
||||
- [Neural network regression](neural-network-regression.md)
|
||||
|
|
|
@ -0,0 +1,233 @@
|
|||
# Reinforcement Learning: A Comprehensive Guide
|
||||
|
||||
Reinforcement Learning (RL) is a field of Machine Learning which focuses on goal-directed learning from interaction with the environment. In RL, an agent learns to make decisions by performing actions in an environment to maximize a cumulative numerical reward signal. This README aims to provide a thorough understanding of RL, covering key concepts, algorithms, applications, and resources.
|
||||
|
||||
## What is Reinforcement Learning?
|
||||
|
||||
Reinforcement learning involves determining the best actions to take in various situations to maximize a numerical reward signal. Instead of being instructed on which actions to take, the learner must explore and identify the actions that lead to the highest rewards through trial and error. After each action performed in its environment, a trainer may give feedback in the form of rewards or penalties to indicate the desirability of the resulting state. Unlike supervised learning, reinforcement learning does not depend on labeled data but instead learns from the outcomes of its actions.
|
||||
|
||||
## Key Concepts and Terminology
|
||||
|
||||
### Agent
|
||||
Agent is a system or entity that learns to make decisions by interacting with an environment. The agent improves its performance by trial and error, receiving feedback from the environment in the form of rewards or punishments.
|
||||
|
||||
### Environment
|
||||
Environment is the setting or world in which the agent operates and interacts with. It provides the agent with states and feedback based on the agent's actions.
|
||||
|
||||
### State
|
||||
State represents the current situation of the environment, encapsulating all the relevant information needed for decision-making.
|
||||
|
||||
### Action
|
||||
Action represents a move that can be taken by the agent, which would affect the state of the environment. The set of all possible actions is called the action space.
|
||||
|
||||
### Reward
|
||||
Reward is the feedback from the environment in response to the agent’s action, thereby defining what are good and bad actions. Agent aims to maximize the total reward over time.
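As a concrete illustration of the agent-environment loop (state, action, reward), here is a minimal sketch assuming the Gymnasium API (the `gymnasium` package and the `CartPole-v1` environment are assumptions; see Tools and Libraries below) with an agent that simply acts at random:

```python
import gymnasium as gym

# Environment: provides states (observations) and rewards
env = gym.make("CartPole-v1")
state, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    # Agent: here just a random policy over the action space
    action = env.action_space.sample()
    # Environment returns the next state and a reward for the chosen action
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("Episode return:", total_reward)
env.close()
```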
|
||||
|
||||
### Policy
|
||||
Policy is a strategy used by the agent to determine its actions based on the current state. In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process.
|
||||
|
||||
### Value Function
|
||||
The value function of a state is the expected total amount of reward an agent can expect to accumulate over the future, starting from that state. There are two main types of value functions:
|
||||
- **State Value Function (V)**: The expected reward starting from a state and following a certain policy thereafter.
|
||||
- **Action Value Function (Q)**: The expected reward starting from a state, taking a specific action, and following a certain policy thereafter.
|
||||
|
||||
### Model
|
||||
A model mimics the behavior of the environment or, more generally, allows inferences to be made about how the environment will behave.
|
||||
|
||||
### Exploration vs. Exploitation
|
||||
To accumulate substantial rewards, a reinforcement learning agent needs to favor actions that have previously yielded high rewards. However, to identify these effective actions, the agent must also attempt actions it hasn't tried before. This means the agent must *exploit* its past experiences to gain rewards, while also *exploring* new actions to improve its future decision-making.
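A common way to balance the two is an epsilon-greedy rule: with probability epsilon the agent explores a random action, and otherwise it exploits the best-known one. A minimal sketch (the array-of-action-values interface here is an assumption made for illustration):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1, rng=np.random.default_rng()):
    # q_values: 1-D array of estimated action values for the current state
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: pick a random action
    return int(np.argmax(q_values))              # exploit: pick the best-known action
```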
|
||||
|
||||
## Types of Reinforcement Learning
|
||||
|
||||
### Model-Based vs Model-Free
|
||||
|
||||
**Model-Based Reinforcement Learning:** Model-based methods involve creating a model of the environment to predict future states and rewards, allowing the agent to plan its actions by simulating various scenarios. These methods typically involve two main components: a learned model of the environment's dynamics (its transition and reward functions) and a planning procedure that uses this model to select actions.
|
||||
|
||||
**Model-Free Reinforcement Learning:** Model-free methods do not explicitly learn a model of the environment. Instead, they learn a policy or value function directly from the interactions with the environment. These methods can be further divided into two categories: value-based and policy-based methods.
|
||||
|
||||
### Value-Based Methods:
|
||||
Value-based methods focus on estimating the value function, and the policy is indirectly derived from the value function.
|
||||
|
||||
### Policy-Based Methods:
|
||||
Policy-based methods directly optimize the policy by maximizing the expected cumulative reward to find the optimal policy parameters.
|
||||
|
||||
### Actor-Critic Methods:
|
||||
Actor-Critic methods combine the strengths of both value-based and policy-based methods. Actor learns the policy that maps states to actions and Critic learns the value function that evaluates the action chosen by the actor.
|
||||
|
||||
## Important Algorithms
|
||||
|
||||
### Q-Learning
|
||||
Q-Learning is a model-free algorithm used in reinforcement learning to learn the value of an action in a particular state. It aims to find the optimal policy by iteratively updating the Q-values, which represent the expected cumulative reward of taking a particular action in a given state and following the optimal policy thereafter.
|
||||
|
||||
#### Algorithm:
|
||||
1. Initialize Q-values arbitrarily for all state-action pairs.
|
||||
2. Repeat for each episode:
|
||||
- Choose an action using an exploration strategy (e.g., epsilon-greedy).
|
||||
- Take the action, observe the reward and the next state.
|
||||
- Update the Q-value of the current state-action pair using the Bellman equation:
|
||||
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$
|
||||
|
||||
where:
|
||||
- $Q(s, a)$ is the Q-value of state $s$ and action $a$.
|
||||
- $r$ is the observed reward.
|
||||
- $s'$ is the next state.
|
||||
- $\alpha$ is the learning rate.
|
||||
- $\gamma$ is the discount factor.
|
||||
3. Until convergence or a maximum number of episodes.
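The update rule above can be written compactly as a function. This is only a sketch of the tabular case; the dictionary-based Q-table and the `ACTIONS` list are assumptions made for illustration:

```python
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]  # assumed discrete action set

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Q is a dict mapping (state, action) pairs to estimated action values
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    td_error = reward + gamma * best_next - Q.get((state, action), 0.0)
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * td_error
```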
|
||||
|
||||
### SARSA
|
||||
SARSA (State-Action-Reward-State-Action) is an on-policy temporal difference algorithm used for learning the Q-function. Unlike Q-learning, SARSA directly updates the Q-values based on the current policy.
|
||||
|
||||
#### Algorithm:
|
||||
1. Initialize Q-values arbitrarily for all state-action pairs.
|
||||
2. Repeat for each episode:
|
||||
- Initialize the environment state $s$.
|
||||
- Choose an action $a$ using the current policy (e.g., epsilon-greedy).
|
||||
- Repeat for each timestep:
|
||||
- Take action $a$, observe the reward $r$ and the next state $s'$.
|
||||
- Choose the next action $a'$ using the current policy.
|
||||
- Update the Q-value of the current state-action pair using the SARSA update rule:
|
||||
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma Q(s', a') - Q(s, a) \right)$$
|
||||
3. Until convergence or a maximum number of episodes.
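For comparison with Q-learning, a minimal sketch of the SARSA update (again with an assumed dictionary Q-table); note that it uses the action actually chosen in the next state rather than the maximum over all actions:

```python
def sarsa_update(Q, state, action, reward, next_state, next_action, alpha=0.1, gamma=0.9):
    # On-policy update: uses Q(s', a') for the action the current policy actually picked
    td_error = reward + gamma * Q.get((next_state, next_action), 0.0) - Q.get((state, action), 0.0)
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * td_error
```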
|
||||
|
||||
### REINFORCE Algorithm:
|
||||
REINFORCE (Monte Carlo policy gradient) is a simple policy gradient method that updates the policy parameters in the direction of the gradient of expected rewards.
|
||||
|
||||
### Proximal Policy Optimization (PPO):
|
||||
PPO is an advanced policy gradient method that improves stability by limiting the policy updates within a certain trust region.
|
||||
|
||||
### A2C/A3C:
|
||||
Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C) are variants of actor-critic methods that utilize multiple parallel agents to improve sample efficiency.
|
||||
|
||||
## Mathematical Background
|
||||
|
||||
### Markov Decision Processes (MDPs)
|
||||
A Markov Decision Process (MDP) is a mathematical framework used to model decision-making problems. It consists of states, actions, rewards and transition probabilities.
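In standard notation, an MDP is written as a tuple

$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$$

where $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $P(s'|s,a)$ the transition probabilities, $R$ the reward function, and $\gamma \in [0, 1)$ the discount factor.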
|
||||
|
||||
### Bellman Equations
|
||||
Bellman equations are fundamental recursive equations in dynamic programming and reinforcement learning. They express the value of a decision at one point in time in terms of the expected value of the subsequent decisions.
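For example, the Bellman expectation equation for the state value function under a policy $\pi$ takes the standard form

$$V^{\pi}(s) = \sum_{a} \pi(a|s) \sum_{s'} P(s'|s,a) \left[ R(s,a,s') + \gamma V^{\pi}(s') \right]$$

which expresses the value of state $s$ in terms of the values of its successor states $s'$.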
|
||||
|
||||
## Applications of Reinforcement Learning
|
||||
|
||||
### Gaming
|
||||
Reinforcement learning is extensively used in gaming for developing AI agents capable of playing complex games like AlphaGo, Chess, and video games. RL algorithms enable these agents to learn optimal strategies by interacting with the game environment and receiving feedback in the form of rewards.
|
||||
|
||||
### Robotics
|
||||
In robotics, reinforcement learning is employed to teach robots various tasks such as navigation, manipulation, and control. RL algorithms allow robots to learn from their interactions with the environment, enabling them to adapt and improve their behavior over time without explicit programming.
|
||||
|
||||
### Finance
|
||||
Reinforcement learning plays a crucial role in finance, particularly in algorithmic trading and portfolio management. RL algorithms are utilized to optimize trading strategies, automate decision-making processes, and manage investment portfolios dynamically based on changing market conditions and objectives.
|
||||
|
||||
### Healthcare
|
||||
In healthcare, reinforcement learning is utilized for various applications such as personalized treatment, drug discovery, and optimizing healthcare operations. RL algorithms can assist in developing personalized treatment plans for patients, identifying effective drug candidates, and optimizing resource allocation in hospitals to improve patient care and outcomes.
|
||||
|
||||
## Tools and Libraries
|
||||
- **OpenAI Gym:** A toolkit for developing and comparing RL algorithms.
|
||||
- **TensorFlow/TF-Agents:** A library for RL in TensorFlow.
|
||||
- **PyTorch:** Popular machine learning library with RL capabilities.
|
||||
- **Stable Baselines3:** A set of reliable implementations of RL algorithms in PyTorch.
|
||||
|
||||
## How to Start with Reinforcement Learning
|
||||
|
||||
### Prerequisites
|
||||
- Basic knowledge of machine learning and neural networks.
|
||||
- Proficiency in Python.
|
||||
|
||||
### Beginner Project
|
||||
The provided Python code implements the Q-learning algorithm for a basic grid world environment. It defines the grid world, actions, and parameters such as discount factor and learning rate. The algorithm iteratively learns the optimal action-value function (Q-values) by updating them based on rewards obtained from actions taken in each state. Finally, the learned Q-values are printed for each state-action pair.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
|
||||
# Define the grid world environment
|
||||
# 'S' represents the start state
|
||||
# 'G' represents the goal state
|
||||
# 'H' represents the hole (negative reward)
|
||||
# '.' represents empty cells (neutral reward)
|
||||
# 'W' represents walls (impassable)
|
||||
grid_world = np.array([
|
||||
['S', '.', '.', '.', '.'],
|
||||
['.', 'W', '.', 'H', '.'],
|
||||
['.', '.', '.', 'W', '.'],
|
||||
['.', 'W', '.', '.', 'G']
|
||||
])
|
||||
|
||||
# Define the actions (up, down, left, right)
|
||||
actions = ['UP', 'DOWN', 'LEFT', 'RIGHT']
|
||||
|
||||
# Define parameters
|
||||
gamma = 0.9 # discount factor
|
||||
alpha = 0.1 # learning rate
|
||||
epsilon = 0.1 # exploration rate
|
||||
|
||||
# Initialize Q-values
|
||||
num_rows, num_cols = grid_world.shape
|
||||
num_actions = len(actions)
|
||||
Q = np.zeros((num_rows, num_cols, num_actions))
|
||||
|
||||
# Define helper function to get possible actions in a state
|
||||
def possible_actions(state):
|
||||
row, col = state
|
||||
possible_actions = []
|
||||
for i, action in enumerate(actions):
|
||||
if action == 'UP' and row > 0 and grid_world[row - 1, col] != 'W':
|
||||
possible_actions.append(i)
|
||||
elif action == 'DOWN' and row < num_rows - 1 and grid_world[row + 1, col] != 'W':
|
||||
possible_actions.append(i)
|
||||
elif action == 'LEFT' and col > 0 and grid_world[row, col - 1] != 'W':
|
||||
possible_actions.append(i)
|
||||
elif action == 'RIGHT' and col < num_cols - 1 and grid_world[row, col + 1] != 'W':
|
||||
possible_actions.append(i)
|
||||
return possible_actions
|
||||
|
||||
# Q-learning algorithm
|
||||
num_episodes = 1000
|
||||
for episode in range(num_episodes):
|
||||
# Initialize the starting state
|
||||
state = (0, 0) # start state
|
||||
while True:
|
||||
# Choose an action using epsilon-greedy policy
|
||||
if np.random.uniform(0, 1) < epsilon:
|
||||
action = np.random.choice(possible_actions(state))
|
||||
else:
|
||||
            # Choose greedily, but only among actions that are valid in this state
            valid = possible_actions(state)
            action = valid[int(np.argmax(Q[state[0], state[1], valid]))]
|
||||
|
||||
# Perform the action and observe the next state and reward
|
||||
if actions[action] == 'UP':
|
||||
next_state = (state[0] - 1, state[1])
|
||||
elif actions[action] == 'DOWN':
|
||||
next_state = (state[0] + 1, state[1])
|
||||
elif actions[action] == 'LEFT':
|
||||
next_state = (state[0], state[1] - 1)
|
||||
elif actions[action] == 'RIGHT':
|
||||
next_state = (state[0], state[1] + 1)
|
||||
|
||||
# Get the reward
|
||||
if grid_world[next_state] == 'G':
|
||||
reward = 1 # goal state
|
||||
elif grid_world[next_state] == 'H':
|
||||
reward = -1 # hole
|
||||
else:
|
||||
reward = 0
|
||||
|
||||
# Update Q-value using the Bellman equation
|
||||
best_next_action = np.argmax(Q[next_state[0], next_state[1]])
|
||||
Q[state[0], state[1], action] += alpha * (
|
||||
reward + gamma * Q[next_state[0], next_state[1], best_next_action] - Q[state[0], state[1], action])
|
||||
|
||||
# Move to the next state
|
||||
state = next_state
|
||||
|
||||
# Check if the episode is terminated
|
||||
if grid_world[state] in ['G', 'H']:
|
||||
break
|
||||
|
||||
# Print the learned Q-values
|
||||
print("Learned Q-values:")
|
||||
for i in range(num_rows):
|
||||
for j in range(num_cols):
|
||||
print(f"State ({i}, {j}):", Q[i, j])
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
Congratulations on completing your journey through this comprehensive guide to reinforcement learning! Armed with this knowledge, you are well-equipped to dive deeper into the exciting world of RL, whether it's for gaming, robotics, finance, healthcare, or any other domain. Keep exploring, experimenting, and learning, and remember, the only limit to what you can achieve with reinforcement learning is your imagination.
|
|
@ -0,0 +1,92 @@
|
|||
# XGBoost
|
||||
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
|
||||
|
||||
## Introduction to Gradient Boosting
|
||||
Gradient boosting is a powerful technique for building predictive models that has seen widespread success in various applications.
|
||||
- **Boosting Concept**: Boosting originated from the idea of modifying weak learners to improve their predictive capability.
|
||||
- **AdaBoost**: The first successful boosting algorithm was Adaptive Boosting (AdaBoost), which utilizes decision stumps as weak learners.
|
||||
- **Gradient Boosting Machines (GBM)**: AdaBoost and related algorithms were later reformulated as Gradient Boosting Machines, casting boosting as a numerical optimization problem.
|
||||
- **Algorithm Elements**:
|
||||
- _Loss function_: Determines the objective to minimize (e.g., cross-entropy for classification, mean squared error for regression).
|
||||
- _Weak learner_: Typically, decision trees are used as weak learners.
|
||||
- _Additive model_: New weak learners are added iteratively to minimize the loss function, correcting the errors of previous models.
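To make these elements concrete, here is a minimal sketch using scikit-learn's `GradientBoostingClassifier` (scikit-learn and the breast-cancer dataset are assumptions used only for illustration); the number of additive stages, the size of each tree-based weak learner, and the learning rate map directly onto constructor parameters:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators = number of boosting stages (additive model),
# max_depth    = size of each decision tree (weak learner),
# learning_rate shrinks each tree's contribution when minimizing the loss
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```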
|
||||
|
||||
## Introduction to XGBoost
|
||||
- eXtreme Gradient Boosting (XGBoost): a more **regularized form** of Gradient Boosting, as it uses **advanced regularization (L1 & L2)**, improving the model's **generalization capabilities.**
|
||||
- It’s suitable when there is **a large number of training samples and a small number of features**; or when there is **a mixture of categorical and numerical features**.
|
||||
- **Development**: Created by Tianqi Chen, XGBoost is designed for computational speed and model performance.
|
||||
- **Key Features**:
|
||||
- _Speed_: Achieved through careful engineering, including parallelization of tree construction, distributed computing, and cache optimization.
|
||||
- _Support for Variations_: XGBoost supports various techniques and optimizations.
|
||||
- _Out-of-Core Computing_: Can handle very large datasets that don't fit into memory.
|
||||
- **Advantages**:
|
||||
- _Sparse Optimization_: Suitable for datasets with many zero values.
|
||||
- _Regularization_: Implements advanced regularization techniques (L1 and L2), enhancing generalization capabilities.
|
||||
- _Parallel Training_: Utilizes all CPU cores during training for faster processing.
|
||||
- _Multiple Loss Functions_: Supports different loss functions based on the problem type.
|
||||
- _Bagging and Early Stopping_: Additional techniques for improving performance and efficiency.
|
||||
- **Pre-Sorted Decision Tree Algorithm**:
|
||||
1. Features are pre-sorted by their values.
|
||||
2. Traversing segmentation points involves finding the best split point on a feature with a cost of O(#data).
|
||||
3. Data is split into left and right child nodes after finding the split point.
|
||||
4. Pre-sorting allows for accurate split point determination.
|
||||
- **Limitations**:
|
||||
1. Iterative Traversal: Each iteration requires traversing the entire training data multiple times.
|
||||
2. Memory Consumption: Loading the entire training data into memory limits size, while not loading it leads to time-consuming read/write operations.
|
||||
3. Space Consumption: Pre-sorting consumes space, storing feature sorting results and split gain calculations.
|
||||
XGBoosting:
|
||||

|
||||
|
||||
## Develop Your First XGBoost Model
|
||||
This code uses the XGBoost library to train a model on the Iris dataset: it splits the data, sets hyperparameters, trains the model, makes predictions, and evaluates the accuracy on the testing set.
|
||||
|
||||
```python
|
||||
# XGBoost with Iris Dataset
|
||||
# Importing necessary libraries
|
||||
import numpy as np
|
||||
import xgboost as xgb
|
||||
from sklearn.datasets import load_iris
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.metrics import accuracy_score
|
||||
|
||||
# Loading a sample dataset (Iris dataset)
|
||||
data = load_iris()
|
||||
X = data.data
|
||||
y = data.target
|
||||
|
||||
# Splitting the dataset into training and testing sets
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
|
||||
|
||||
# Converting the dataset into DMatrix format
|
||||
dtrain = xgb.DMatrix(X_train, label=y_train)
|
||||
dtest = xgb.DMatrix(X_test, label=y_test)
|
||||
|
||||
# Setting hyperparameters for XGBoost
|
||||
params = {
|
||||
'max_depth': 3,
|
||||
'eta': 0.1,
|
||||
'objective': 'multi:softmax',
|
||||
'num_class': 3
|
||||
}
|
||||
|
||||
# Training the XGBoost model
|
||||
num_round = 50
|
||||
model = xgb.train(params, dtrain, num_round)
|
||||
|
||||
# Making predictions on the testing set
|
||||
y_pred = model.predict(dtest)
|
||||
|
||||
# Evaluating the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print("Accuracy:", accuracy)
|
||||
```
|
||||
|
||||
### Output
|
||||
|
||||
Accuracy: 1.0
|
||||
|
||||
## **Conclusion**
|
||||
XGBoost's focus on speed, performance, and scalability has made it one of the most widely used and powerful predictive modeling algorithms available. Its ability to handle large datasets efficiently, along with its advanced features and optimizations, makes it a valuable tool in machine learning and data science.
|
||||
|
||||
## Reference
|
||||
- [Machine Learning Prediction of Turning Precision Using Optimized XGBoost Model](https://www.mdpi.com/2076-3417/12/15/7739)
|
|
@ -9,3 +9,4 @@
|
|||
- [Working with Date & Time in Pandas](datetime.md)
|
||||
- [Importing and Exporting Data in Pandas](import-export.md)
|
||||
- [Handling Missing Values in Pandas](handling-missing-values.md)
|
||||
- [Pandas Series](pandas-series.md)
|
||||
|
|
|
@ -0,0 +1,317 @@
|
|||
# Pandas Series
|
||||
|
||||
A Series is a Pandas data structure that represents a one-dimensional, array-like object containing an array of data and an associated array of data labels, called its index.
|
||||
|
||||
## Creating a Series object:
|
||||
|
||||
### Basic Series
|
||||
To create a basic Series, you can pass a list or array of data to the `pd.Series()` function.
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s1 = pd.Series([4, 5, 2, 3])
|
||||
print(s1)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 4
|
||||
1 5
|
||||
2 2
|
||||
3 3
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
### Series from a Dictionary
|
||||
|
||||
If you pass a dictionary to `pd.Series()`, the keys become the index and the values become the data of the Series.
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s2 = pd.Series({'A': 1, 'B': 2, 'C': 3})
|
||||
print(s2)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
A 1
|
||||
B 2
|
||||
C 3
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
|
||||
## Additional Functionality
|
||||
|
||||
|
||||
### Specifying Data Type and Index
|
||||
You can specify the data type and index while creating a Series.
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64')
|
||||
print(s4)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
a 1.0
|
||||
b 2.0
|
||||
c 3.0
|
||||
dtype: float64
|
||||
```
|
||||
|
||||
### Specifying NaN Values:
|
||||
* Sometimes you need to create a Series object of a certain size but you do not have complete data available. In such cases you can fill the missing data with a NaN (Not a Number) value.
|
||||
* When you store a NaN value in a Series object, the data type must be a floating point type. Even if you specify an integer type, pandas will promote it to a floating point type automatically, because NaN is not supported by integer dtypes.
|
||||
|
||||
```python
|
||||
import pandas as pd
import numpy as np
|
||||
s3 = pd.Series([1, np.nan, 2])
|
||||
print(s3)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 1.0
|
||||
1 NaN
|
||||
2 2.0
|
||||
dtype: float64
|
||||
```
|
||||
|
||||
|
||||
### Creating Data from Expressions
|
||||
You can create a Series using an expression or function.
|
||||
|
||||
`<series_object> = pd.Series(data=<function|expression>, index=None)`
|
||||
|
||||
```python
|
||||
import pandas as pd
import numpy as np
|
||||
a = np.arange(1, 5)  # [1, 2, 3, 4]
|
||||
s5=pd.Series(data=a**2,index=a)
|
||||
print(s5)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
1 1
|
||||
2 4
|
||||
3 9
|
||||
4 16
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
## Series Object Attributes
|
||||
|
||||
| **Attribute** | **Description** |
|
||||
|--------------------------|---------------------------------------------------|
|
||||
| `<series>.index` | Array of index of the Series |
|
||||
| `<series>.values` | Array of values of the Series |
|
||||
| `<series>.dtype` | Return the dtype of the data |
|
||||
| `<series>.shape` | Return a tuple representing the shape of the data |
|
||||
| `<series>.ndim` | Return the number of dimensions of the data |
|
||||
| `<series>.size` | Return the number of elements in the data |
|
||||
| `<series>.hasnans` | Return True if there is any NaN in the data |
|
||||
| `<series>.empty` | Return True if the Series object is empty |
|
||||
|
||||
- If you use `len()` on a Series object, it returns the total number of elements in the Series, whereas `<series_object>.count()` returns only the number of non-NaN elements.
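A short sketch showing a few of these attributes, along with the `len()` vs `count()` difference just mentioned (the printed values follow from the sample data used here):

```python
import pandas as pd
import numpy as np

s = pd.Series([10, np.nan, 30], index=['a', 'b', 'c'])

print(s.index)    # Index(['a', 'b', 'c'], dtype='object')
print(s.values)   # [10. nan 30.]
print(s.dtype)    # float64
print(s.shape)    # (3,)
print(s.size)     # 3
print(s.hasnans)  # True
print(len(s))     # 3  (counts every element, including NaN)
print(s.count())  # 2  (counts only non-NaN elements)
```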
|
||||
|
||||
## Accessing a Series object and its elements
|
||||
|
||||
### Accessing Individual Elements
|
||||
You can access individual elements using their index.
|
||||
Only valid ('legal') index labels can be used to access individual elements.
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
|
||||
print(s7['A'])
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
13
|
||||
```
|
||||
|
||||
### Slicing a Series
|
||||
|
||||
- Slices are extracted based on their positional index, regardless of the custom index labels.
|
||||
- Each element in the Series has a positional index starting from 0 (i.e., 0 for the first element, 1 for the second element, and so on).
|
||||
- `<series>[<start>:<end>]` will return the values of the elements between the start and end positions (excluding the end position).
|
||||
|
||||
#### Example
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
|
||||
print(s[:2])
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
A 13
|
||||
B 45
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels.
|
||||
|
||||
## Operation on series object
|
||||
|
||||
### Modifying elements and indexes
|
||||
* `<series_object>[indexes] = <new data value>`
|
||||
* `<series_object>[start : end] = <new data value>`
|
||||
* `<series_object>.index = [new indexes]`
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
|
||||
s8['a'] = 100
|
||||
s8.index = ['x', 'y', 'z']
|
||||
print(s8)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
x 100
|
||||
y 20
|
||||
z 30
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
**Note: Series objects are value-mutable but size-immutable.**
|
||||
|
||||
### Vector operations
|
||||
We can perform vector operations such as `+`,`-`,`/`,`%` etc.
|
||||
|
||||
#### Addition
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s9 = pd.Series([1, 2, 3])
|
||||
print(s9 + 5)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 6
|
||||
1 7
|
||||
2 8
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
#### Subtraction
|
||||
```python
|
||||
print(s9 - 2)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 -1
|
||||
1 0
|
||||
2 1
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
### Arithmetic on Series objects
|
||||
|
||||
#### Addition
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s10 = pd.Series([1, 2, 3])
|
||||
s11 = pd.Series([4, 5, 6])
|
||||
print(s10 + s11)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 5
|
||||
1 7
|
||||
2 9
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
#### Multiplication
|
||||
|
||||
```python
|
||||
print("s10 * s11)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 4
|
||||
1 10
|
||||
2 18
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
Keep in mind that both Series objects should have the same indexes; for any index label that is not present in both Series, the result of the operation is NaN.
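For example, adding two Series whose indexes only partially overlap produces NaN for the labels that do not match (a small illustrative sketch):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

print(a + b)
# w     NaN
# x     NaN
# y    12.0
# z    23.0
# dtype: float64
```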
|
||||
|
||||
|
||||
### Head and Tail Functions
|
||||
|
||||
| **Functions** | **Description** |
|
||||
|--------------------------|---------------------------------------------------|
|
||||
| `<series>.head(n)` | return the first n elements of the series |
|
||||
| `<series>.tail(n)` | return the last n elements of the series |
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
|
||||
print(s12.head(3))
|
||||
print(s12.tail(3))
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
0 10
|
||||
1 20
|
||||
2 30
|
||||
dtype: int64
|
||||
7 80
|
||||
8 90
|
||||
9 100
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
If you don't provide any value for n, then by default it gives results for `n=5`.
|
||||
|
||||
### Few extra functions
|
||||
|
||||
| **Function** | **Description** |
|
||||
|----------------------------------------|------------------------------------------------------------------------|
|
||||
| `<series_object>.sort_values()` | Return the Series object in ascending order based on its values. |
|
||||
| `<series_object>.sort_index()` | Return the Series object in ascending order based on its index. |
|
||||
| `<series_object>.drop(<index>)`        | Return the Series with the given index and its corresponding value removed. |
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b'])
|
||||
print(s13.sort_values())
|
||||
print(s13.sort_index())
|
||||
print(s13.drop('a'))
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
a 1
|
||||
b 2
|
||||
c 3
|
||||
dtype: int64
|
||||
a 1
|
||||
b 2
|
||||
c 3
|
||||
dtype: int64
|
||||
c 3
|
||||
b 2
|
||||
dtype: int64
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
In short, Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization.
|
After Width: | Height: | Size: 28 KiB |
After Width: | Height: | Size: 28 KiB |
After Width: | Height: | Size: 27 KiB |
After Width: | Height: | Size: 32 KiB |
After Width: | Height: | Size: 52 KiB |
After Width: | Height: | Size: 29 KiB |
After Width: | Height: | Size: 27 KiB |
|
@ -10,3 +10,4 @@
|
|||
- [Seaborn Plotting Functions](seaborn-plotting.md)
|
||||
- [Getting started with Seaborn](seaborn-basics.md)
|
||||
- [Bar Plots in Plotly](plotly-bar-plots.md)
|
||||
- [Pie Charts in Plotly](plotly-pie-charts.md)
|
|
@ -0,0 +1,221 @@
|
|||
# Pie Charts in Plotly
|
||||
|
||||
A pie chart is a type of graph that represents data as a circular graph. The slices of the pie show the relative size of the data, making it a pictorial representation of data. A pie chart requires a list of categorical variables and corresponding numerical values. Here, the term "pie" represents the whole, and the "slices" represent the parts of the whole.
|
||||
|
||||
Pie charts are commonly used in business presentations like sales, operations, survey results, resources, etc. as they are pleasing to the eye and provide a quick summary.
|
||||
|
||||
Plotly is a powerful library for creating modern visualizations, and it provides an easy and intuitive way to create highly customized pie charts.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before creating pie charts in Plotly, ensure that you have Python, Plotly, and Pandas installed on your system.
|
||||
|
||||
## Introduction
|
||||
|
||||
There are various ways to create pie charts in `plotly`. One of the most prominent and easiest is `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of data types and produces easy-to-style figures. Alternatively, you can use `plotly.graph_objects` to create various plots.
|
||||
|
||||
Here, we'll be using `plotly.express` to create the pie charts. We'll also be converting our datasets into Pandas DataFrames, which makes creating charts convenient and straightforward.
|
||||
|
||||
Also, note that when you execute the code in a plain Python file, the output plot is shown in your **browser**, rather than in a pop-up window as in Matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like Jupyter)**. For this, install the additional library `nbformat`. This way you can see the output in the notebook itself, and can also render it to formats such as PNG or JPG.
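As a side note (not needed for the examples below), a figure can also be exported straight to a file. This is a minimal sketch assuming the optional `kaleido` package is installed (`pip install kaleido`) for static image export:

```Python
import plotly.express as px

# A tiny figure just to demonstrate exporting
fig = px.pie(values=[1, 2, 3], names=['A', 'B', 'C'])

fig.write_html("pie_chart.html")    # standalone interactive HTML file
fig.write_image("pie_chart.png")    # static image (requires kaleido)
```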
|
||||
|
||||
## Creating a simple pie chart using `plotly.express.pie`
|
||||
|
||||
In `plotly.express.pie`, the data visualized by the sectors of the pie is set in `values`, and the sector labels are set in `names`.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
Here, we first create the dataset and convert it into a Pandas DataFrame using a dictionary, with its keys becoming the DataFrame columns. Next, we plot the pie chart using `px.pie`. In the `values` and `names` parameters, we have to specify a column name of the DataFrame.
|
||||
|
||||
`px.pie(df, values='petals', names='flowers')` specifies that the pie chart is plotted by taking the values from the column `petals`, with the fractional area of each slice given by **petals/sum(petals)**. For example, the Marigold slice covers 17 / (11 + 9 + 17 + 4 + 7) = 17/48 ≈ 35.4% of the pie. The column `flowers` provides the labels of the slices corresponding to each value in `petals`.
|
||||
|
||||
**Note:** When you run the above code, it will show you an **interactive plot**; if you want a static image, you can download it from the plot itself.
|
||||
|
||||
## Customizing Pie Charts
|
||||
|
||||
### Adding title to the chart
|
||||
|
||||
Simply pass the title of your chart as a parameter in `px.pie`.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
### Coloring Slices
|
||||
|
||||
There are a lot of beautiful color scales available in Plotly, which can be found here: [plotly color scales](https://plotly.com/python/builtin-colorscales/). Choose your favourite colorscale and apply it like this:
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers',
|
||||
color_discrete_sequence=px.colors.sequential.Agsunset)
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
You can also set custom colors for each label by passing them as a dictionary (map) in `color_discrete_map`, like this:
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers',
|
||||
color='flowers',
|
||||
color_discrete_map={'Rose':'red',
|
||||
'Tulip':'magenta',
|
||||
'Marigold':'green',
|
||||
'Sunflower':'yellow',
|
||||
'Daffodil':'royalblue'})
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
### Labeling Slices
|
||||
|
||||
You can use `fig.update_traces` to control the properties of the text displayed on your figure. For example, if we want the flower name, petal count, and percentage shown in our slices, we can do it like this:
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers')
|
||||
|
||||
# Updating text properties
|
||||
fig.update_traces(textposition='inside', textinfo='label+value+percent')
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
### Pulling out a slice
|
||||
|
||||
To pull out a slice, pass an array to the `pull` parameter in `fig.update_traces`; each entry corresponds to a slice and specifies how far it is pulled out (as a fraction of the radius).
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers')
|
||||
|
||||
# Updating text properties
|
||||
fig.update_traces(textposition='inside', textinfo='label+value')
|
||||
|
||||
# Pulling out slice
|
||||
fig.update_traces(pull=[0,0,0,0.2,0])
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|
||||
|
||||
### Pattern Fills
|
||||
|
||||
You can also add patterns (hatches) to pie charts, in addition to colors.
|
||||
|
||||
```Python
|
||||
import plotly.express as px
|
||||
import pandas as pd
|
||||
|
||||
# Creating dataset
|
||||
flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
petals = [11,9,17,4,7]
|
||||
|
||||
# Converting dataset to pandas DataFrame
|
||||
dataset = {'flowers':flowers, 'petals':petals}
|
||||
df = pd.DataFrame(dataset)
|
||||
|
||||
# Creating pie chart
|
||||
fig = px.pie(df, values='petals', names='flowers',
|
||||
title='Number of Petals in Flowers')
|
||||
|
||||
# Updating text properties
|
||||
fig.update_traces(textposition='outside', textinfo='label+value')
|
||||
|
||||
# Adding pattern fills
|
||||
fig.update_traces(marker=dict(pattern=dict(shape=[".", "/", "+", "-","+"])))
|
||||
|
||||
# Showing plot
|
||||
fig.show()
|
||||
```
|
||||

|