diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4a366da..8688009 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along - Web Scrapping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping) - API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development) - Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms) -- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) -- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) +- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)** +- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)** You can check out some content ideas below. diff --git a/contrib/advanced-python/exception-handling.md b/contrib/advanced-python/exception-handling.md new file mode 100644 index 0000000..3e0c672 --- /dev/null +++ b/contrib/advanced-python/exception-handling.md @@ -0,0 +1,192 @@ +# Exception Handling in Python + +Exception handling is a way of managing errors that may occur during program execution. Python's exception handling mechanism is designed to avoid the unexpected termination of the program, and offers a way to either regain control after an error or display a meaningful message to the user. + +- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute. +- **Exception** - An exception is an event that occurs during the execution of a program and disrupts the normal flow of instructions.
Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. They can be runtime, input/output, or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing. + +## Python Built-in Exceptions + +There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs. We can view all the built-in exceptions using the built-in `locals()` function as follows: + +```python +print(dir(locals()['__builtins__'])) +``` + +|**S.No**|**Exception**|**Description**| +|---|---|---| +|1|SyntaxError|A syntax error occurs when the code we write violates grammatical rules such as misspelled keywords, a missing colon, mismatched parentheses, etc.| +|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types.| +|3|NameError|A name error occurs when we try to use a variable, function, module or string without quotes that hasn't been defined or isn't used in a valid way.| +|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.| +|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.| +|6|ValueError|A value error occurs when we provide an argument or value that's inappropriate for a specific operation or function, such as doing mathematical operations with incompatible types (e.g., dividing a string by an integer).| +|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.| +|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails.
It signifies that there's an issue during communication between your program and the external system.| +|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.| +|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.| + +## Try and Except Statement - Catching Exceptions + +The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust. + +Here's an example to explain this: + +```python +try: + # Code that might raise an exception + result = 10 / 0 +except: + print("An error occurred!") +``` + +Output + +```markdown +An error occurred! +``` + +In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it. + +## Specific Exception Handling + +You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types. + +Here's an example: + +```python +try: + # Code that might raise ZeroDivisionError or NameError + result = 10 / 0 + name = undefined_variable +except ZeroDivisionError: + print("Oops! You tried to divide by zero.") +except NameError: + print("There's a variable named 'undefined_variable' that hasn't been defined yet.") +``` + +Output + +```markdown +Oops! You tried to divide by zero. +``` + +If you comment out the line `result = 10 / 0`, then the output will be: + +```markdown +There's a variable named 'undefined_variable' that hasn't been defined yet.
+``` + +## Important Note + +In this code, the `except` blocks are specific to each exception type. If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this: + +```python +try: + # Code that might raise ZeroDivisionError or NameError + result = 10 / 0 + name = undefined_variable +except (ZeroDivisionError, NameError): + print("An error occurred!") +``` + +Output + +```markdown +An error occurred! +``` + +## Try with Else Clause + +The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block. + +Here's an example to understand this: + +```python +def calculate_average(numbers): + if len(numbers) == 0: # Handle empty list case separately (optional) + return None + try: + total = sum(numbers) + average = total / len(numbers) + except ZeroDivisionError: + print("Cannot calculate average for an empty list.") + else: + print("The average is:", average) + return average # Optionally return the average here + +# Example usage +numbers = [10, 20, 30] +result = calculate_average(numbers) + +if result is not None: # Check if result is available (handles empty list case) + print("Calculation successful!") +``` + +Output + +```markdown +The average is: 20.0 +Calculation successful! +``` + +## Finally Keyword in Python + +The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not. + +To understand this, let us take an example: + +```python +try: + a = 10 // 0 + print(a) +except ZeroDivisionError: + print("Cannot be divided by zero.") +finally: + print("Program executed!") +``` + +Output + +```markdown +Cannot be divided by zero. +Program executed!
+``` + +## Raise Keyword in Python + +In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception. + +Let us take an example: + +```python +def divide(x, y): + if y == 0: + raise ZeroDivisionError("Can't divide by zero!") # Raise an exception with a message + result = x / y + return result + +try: + division_result = divide(10, 0) + print("Result:", division_result) +except ZeroDivisionError as e: + print("An error occurred:", e) # Handle the exception and print the message +``` + +Output + +```markdown +An error occurred: Can't divide by zero! +``` + +## Advantages of Exception Handling + +- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience. +- **Code Robustness** - Exception handling helps you to write more resilient programs by anticipating potential issues and providing appropriate responses. +- **Enhanced Code Readability** - By separating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed. + +## Disadvantages of Exception Handling + +- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling. +- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
+- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations. diff --git a/contrib/advanced-python/index.md b/contrib/advanced-python/index.md index 3945721..fa2fd7b 100644 --- a/contrib/advanced-python/index.md +++ b/contrib/advanced-python/index.md @@ -8,3 +8,4 @@ - [JSON module](json-module.md) - [Map Function](map-function.md) - [Protocols](protocols.md) +- [Exception Handling in Python](exception-handling.md) diff --git a/contrib/api-development/index.md b/contrib/api-development/index.md index 7278907..8d4dc59 100644 --- a/contrib/api-development/index.md +++ b/contrib/api-development/index.md @@ -1,4 +1,4 @@ # List of sections - [API Methods](api-methods.md) -- [FastAPI](fast-api.md) \ No newline at end of file +- [FastAPI](fast-api.md) diff --git a/contrib/ds-algorithms/index.md b/contrib/ds-algorithms/index.md index 31cff39..1d7293b 100644 --- a/contrib/ds-algorithms/index.md +++ b/contrib/ds-algorithms/index.md @@ -10,3 +10,6 @@ - [Greedy Algorithms](greedy-algorithms.md) - [Dynamic Programming](dynamic-programming.md) - [Linked list](linked-list.md) +- [Stacks in Python](stacks.md) +- [Sliding Window Technique](sliding-window.md) +- [Trie](trie.md) diff --git a/contrib/ds-algorithms/sliding-window.md b/contrib/ds-algorithms/sliding-window.md new file mode 100644 index 0000000..72aa191 --- /dev/null +++ b/contrib/ds-algorithms/sliding-window.md @@ -0,0 +1,249 @@ +# Sliding Window Technique + +The sliding window technique is a fundamental approach used to solve problems involving arrays, lists, or sequences. It's particularly useful when you need to calculate something over a subarray or sublist of fixed size that slides over the entire array. 
+ +In simple terms, it transforms nested loops into a single loop. +## Concept + +The sliding window technique involves creating a window (a subarray or sublist) that moves or "slides" across the entire array. This window can either be fixed in size or dynamically resized. By maintaining and updating this window as it moves, you can optimize certain computations, reducing time complexity. + +## Types of Sliding Windows + +1. **Fixed Size Window**: The window size remains constant as it slides from the start to the end of the array. +2. **Variable Size Window**: The window size can change based on certain conditions, such as the sum of elements within the window meeting a specified target. + +## Steps to Implement a Sliding Window + +1. **Initialize the Window**: Set the initial position of the window and any required variables (like sum, count, etc.). +2. **Expand the Window**: Add the next element to the window and update the relevant variables. +3. **Shrink the Window**: If needed, remove elements from the start of the window and update the variables. +4. **Slide the Window**: Move the window one position to the right by including the next element and possibly excluding the first element. +5. **Repeat**: Continue expanding, shrinking, and sliding the window until you reach the end of the array. + +## Example Problems + +### 1. Maximum Sum Subarray of Fixed Size K + +Given an array of integers and an integer k, find the maximum sum of a subarray of size k. + +**Steps:** + +1. Initialize the sum of the first k elements. +2. Slide the window from the start of the array to the end, updating the sum by subtracting the element that is left behind and adding the new element. +3. Track the maximum sum encountered.
+ +**Python Code:** + +```python +def max_sum_subarray(arr, k): + n = len(arr) + if n < k: + return None + + # Compute the sum of the first window + window_sum = sum(arr[:k]) + max_sum = window_sum + + # Slide the window from start to end + for i in range(n - k): + window_sum = window_sum - arr[i] + arr[i + k] + max_sum = max(max_sum, window_sum) + + return max_sum + +# Example usage: +arr = [1, 3, 2, 5, 1, 1, 6, 2, 8, 5] +k = 3 +print(max_sum_subarray(arr, k)) # Output: 16 +``` + +### 2. Longest Substring Without Repeating Characters + +Given a string, find the length of the longest substring without repeating characters. + +**Steps:** + +1. Use two pointers to represent the current window. +2. Use a set to track characters in the current window. +3. Expand the window by moving the right pointer. +4. If a duplicate character is found, shrink the window by moving the left pointer until the duplicate is removed. + +**Python Code:** + +```python +def longest_unique_substring(s): + n = len(s) + char_set = set() + left = 0 + max_length = 0 + + for right in range(n): + while s[right] in char_set: + char_set.remove(s[left]) + left += 1 + char_set.add(s[right]) + max_length = max(max_length, right - left + 1) + + return max_length + +# Example usage: +s = "abcabcbb" +print(longest_unique_substring(s)) # Output: 3 +``` + +### 3. Minimum Size Subarray Sum + +Given an array of positive integers and a positive integer `s`, find the minimal length of a contiguous subarray of which the sum is at least `s`. If there isn't one, return 0 instead. + +**Steps:** + +1. Use two pointers, `left` and `right`, to define the current window. +2. Expand the window by moving `right` and adding `arr[right]` to `current_sum`. +3. If `current_sum` is greater than or equal to `s`, update `min_length` and shrink the window from the left by moving `left` and subtracting `arr[left]` from `current_sum`. +4. Repeat until `right` has traversed the array.
+ +**Python Code:** + +```python +def min_subarray_len(s, arr): + n = len(arr) + left = 0 + current_sum = 0 + min_length = float('inf') + + for right in range(n): + current_sum += arr[right] + + while current_sum >= s: + min_length = min(min_length, right - left + 1) + current_sum -= arr[left] + left += 1 + + return min_length if min_length != float('inf') else 0 + +# Example usage: +arr = [2, 3, 1, 2, 4, 3] +s = 7 +print(min_subarray_len(s, arr)) # Output: 2 (subarray [4, 3]) +``` + +### 4. Longest Substring with At Most K Distinct Characters + +Given a string `s` and an integer `k`, find the length of the longest substring that contains at most `k` distinct characters. + +**Steps:** + +1. Use two pointers, `left` and `right`, to define the current window. +2. Use a dictionary `char_count` to count characters in the window. +3. Expand the window by moving `right` and updating `char_count`. +4. If `char_count` has more than `k` distinct characters, shrink the window from the left by moving `left` and updating `char_count`. +5. Keep track of the maximum length of the window with at most `k` distinct characters. + +**Python Code:** + +```python +def longest_substring_k_distinct(s, k): + n = len(s) + char_count = {} + left = 0 + max_length = 0 + + for right in range(n): + char_count[s[right]] = char_count.get(s[right], 0) + 1 + + while len(char_count) > k: + char_count[s[left]] -= 1 + if char_count[s[left]] == 0: + del char_count[s[left]] + left += 1 + + max_length = max(max_length, right - left + 1) + + return max_length + +# Example usage: +s = "eceba" +k = 2 +print(longest_substring_k_distinct(s, k)) # Output: 3 (substring "ece") +``` + +### 5. Maximum Number of Vowels in a Substring of Given Length + +Given a string `s` and an integer `k`, return the maximum number of vowel letters in any substring of `s` with length `k`. + +**Steps:** + +1. Use a sliding window of size `k`. +2. Keep track of the number of vowels in the current window. +3.
Expand the window by adding the next character and update the count if it's a vowel. +4. If the window size exceeds `k`, remove the leftmost character and update the count if it's a vowel. +5. Track the maximum number of vowels found in any window of size `k`. + +**Python Code:** + +```python +def max_vowels(s, k): + vowels = set('aeiou') + max_vowel_count = 0 + current_vowel_count = 0 + + for i in range(len(s)): + if s[i] in vowels: + current_vowel_count += 1 + if i >= k: + if s[i - k] in vowels: + current_vowel_count -= 1 + max_vowel_count = max(max_vowel_count, current_vowel_count) + + return max_vowel_count + +# Example usage: +s = "abciiidef" +k = 3 +print(max_vowels(s, k)) # Output: 3 (substring "iii") +``` + +### 6. Subarray Product Less Than K + +Given an array of positive integers `nums` and an integer `k`, return the number of contiguous subarrays where the product of all the elements in the subarray is less than `k`. + +**Steps:** + +1. Use two pointers, `left` and `right`, to define the current window. +2. Expand the window by moving `right` and multiplying `product` by `nums[right]`. +3. If `product` is greater than or equal to `k`, shrink the window from the left by moving `left` and dividing `product` by `nums[left]`. +4. For each position of `right`, the number of valid subarrays ending at `right` is `right - left + 1`. +5. Sum these counts to get the total number of subarrays with product less than `k`. + +**Python Code:** + +```python +def num_subarray_product_less_than_k(nums, k): + if k <= 1: + return 0 + + product = 1 + left = 0 + count = 0 + + for right in range(len(nums)): + product *= nums[right] + + while product >= k: + product //= nums[left] # exact integer division keeps product an int + left += 1 + + count += right - left + 1 + + return count + +# Example usage: +nums = [10, 5, 2, 6] +k = 100 +print(num_subarray_product_less_than_k(nums, k)) # Output: 8 +``` + +## Advantages + +- **Efficiency**: Reduces the time complexity from O(n^2) to O(n) for many problems.
+- **Simplicity**: Provides a straightforward way to manage subarrays/substrings with overlapping elements. + +## Applications + +- Finding the maximum or minimum sum of subarrays of fixed size. +- Detecting unique elements in a sequence. +- Solving problems related to dynamic programming with fixed constraints. +- Efficiently managing and processing streaming data or real-time analytics. + +By using the sliding window technique, you can tackle a wide range of problems in a more efficient manner. diff --git a/contrib/ds-algorithms/stacks.md b/contrib/ds-algorithms/stacks.md new file mode 100644 index 0000000..428a193 --- /dev/null +++ b/contrib/ds-algorithms/stacks.md @@ -0,0 +1,116 @@ +# Stacks in Python + +In Data Structures and Algorithms, a stack is a linear data structure that complies with the Last In, First Out (LIFO) rule. It works by means of two fundamental operations: **PUSH**, which inserts an element on top of the stack, and **POP**, which takes out the topmost element. This concept is similar to a stack of plates in a cafeteria. Stacks are usually used for handling function calls, expression evaluation, and parsing in programming. Indeed, they are efficient in managing memory as well as tracking program state. + +## Points to be Remembered + +- A stack is a collection of data items that can be accessed at only one end, called **TOP**. +- Items can be inserted and deleted in a stack only at the TOP. +- The last item inserted in a stack is the first one to be deleted. +- Therefore, a stack is called a **Last-In-First-Out (LIFO)** data structure. + +## Real Life Examples of Stacks + +- **PILE OF BOOKS** - Suppose a set of books are placed one over the other in a pile. When you remove books from the pile, the topmost book will be removed first. Similarly, when you have to add a book to the pile, the book will be placed at the top of the pile. + +- **PILE OF PLATES** - The first plate begins the pile.
The second plate is placed on the top of the first plate and the third plate is placed on the top of the second plate, and so on. In general, if you want to add a plate to the pile, you can keep it on the top of the pile. Similarly, if you want to remove a plate, you can remove the plate from the top of the pile. + +- **BANGLES IN A HAND** - When a person wears bangles, the last bangle worn is the first one to be removed. + +## Applications of Stacks + +Stacks are widely used in Computer Science: + +- Function call management +- Maintaining the UNDO list for the application +- Web browser *history management* +- Evaluating expressions +- Checking the nesting of parentheses in an expression +- Backtracking algorithms (Recursion) + +Understanding these applications is essential for Software Development. + +## Operations on a Stack + +Key operations on a stack include: + +- **PUSH** - It is the process of inserting a new element on the top of a stack. +- **OVERFLOW** - A situation when we try to push an item onto a stack that is full. +- **POP** - It is the process of deleting an element from the top of a stack. +- **UNDERFLOW** - A situation when we try to pop an item from an empty stack. +- **PEEK** - It is the process of getting the most recent value of the stack *(i.e. the value at the top of the stack)*. +- **isEMPTY** - It is the function which returns True if the stack is empty, else False. +- **SHOW** - Displaying stack items.
+ +## Implementing Stacks in Python + +```python +def isEmpty(S): + return len(S) == 0 + +def Push(S, item): + S.append(item) + +def Pop(S): + if isEmpty(S): + return "Underflow" + else: + val = S.pop() + return val + +def Peek(S): + if isEmpty(S): + return "Underflow" + else: + top = len(S) - 1 + return S[top] + +def Show(S): + if isEmpty(S): + print("Sorry, No items in Stack") + else: + print("(Top)", end=' ') + t = len(S) - 1 + while t >= 0: + print(S[t], "<", end=' ') + t -= 1 + print() + +stack = [] # initially stack is empty + +Push(stack, 5) +Push(stack, 10) +Push(stack, 15) + +print("Stack after Push operations:") +Show(stack) +print("Peek operation:", Peek(stack)) +print("Pop operation:", Pop(stack)) +print("Stack after Pop operation:") +Show(stack) +``` + +## Output + +```markdown +Stack after Push operations: +(Top) 15 < 10 < 5 < +Peek operation: 15 +Pop operation: 15 +Stack after Pop operation: +(Top) 10 < 5 < +``` + +## Complexity Analysis + +- **Push, Pop, Peek, isEmpty**: `O(1)` - each touches only the top of the stack. +- **Show**: `O(n)` - it traverses and prints all `n` items in the stack. diff --git a/contrib/ds-algorithms/trie.md b/contrib/ds-algorithms/trie.md new file mode 100644 index 0000000..0ccfbaa --- /dev/null +++ b/contrib/ds-algorithms/trie.md @@ -0,0 +1,152 @@ +# Trie + +A Trie is a tree-like data structure used for storing a dynamic set of strings, typically for fast prefix-based retrieval. It is also known as a prefix tree or digital tree. + +>A Trie is a type of search tree, where each node represents a single character of a string. + +>Nodes are linked in such a way that they form a tree, where each path from the root to a leaf node represents a unique string stored in the Trie.
+ +## Characteristics of Trie +- **Prefix Matching**: Tries are particularly useful for prefix matching operations. Any node in the Trie represents a common prefix of all strings below it. +- **Space Efficiency**: Tries can be more space-efficient than other data structures like hash tables for storing large sets of strings with common prefixes. +- **Time Complexity**: Insertion, deletion, and search operations in a Trie have a time complexity of O(m), where m is the length of the string. This makes Tries very efficient for these operations. + +## Structure of Trie + +A Trie mainly consists of three parts: +- **Root**: The root of a Trie is an empty node that does not contain any character. +- **Edges**: Each edge in the Trie represents a character in the alphabet of the stored strings. +- **Nodes**: Each node contains a character and possibly additional information, such as a counter or boolean flag indicating if the node represents the end of a valid string. + +To implement the nodes of a Trie, we use classes in Python. Each node is an object of the Node class. + +The Node class has two main components: +- *Array of size 26*: Represents the 26 lowercase letters. Initially all entries are None; as words are inserted, the entries are filled with child Node objects. +- *End of word*: Marks the end of a word; here it counts how many inserted words end at this node. + +Code Block of Node Class: + +```python +class Node: + def __init__(self): + self.alphabets = [None] * 26 + self.end_of_word = 0 +``` + +Now we need to implement the Trie itself. We create another class named Trie with methods for Insertion, Searching and Deletion. + +**Initialization:** Here, we initialize the Trie with a `root` node. + +Code Implementation of Initialization: + +```python +class Trie: + def __init__(self): + self.root = Node() +``` + +## Operations on Trie + +1. **Insertion**: Inserts a word into the Trie. This method takes `word` as a parameter.
For each character in the word, it checks if there is a corresponding child node. If not, it creates a new `Node`. After processing all the characters in the word, it increments the `end_of_word` value of the last node. + +Code Implementation of Insertion: +```python +def insert(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + node.alphabets[index] = Node() + node = node.alphabets[index] + node.end_of_word += 1 +``` + +2. **Searching**: Searches for the `word` in the Trie. The searching process starts from the `root` node, and each character of the `word` is processed in turn. After traversing the whole word in the Trie, it returns the count of matching words. + +There are two cases in Searching: +- *Word not found*: Occurs when the searched word is not present in the Trie. This happens if the value of the `alphabets` array at some character is `None`, or if the `end_of_word` value of the node reached after traversing the whole word is `0`. +- *Word found*: Occurs when the searched word is present in the Trie, i.e. the `end_of_word` value of the node reached after traversing the whole word is greater than `0`. + +Code Implementation of Searching: +```python + def Search(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + return 0 + node = node.alphabets[index] + return node.end_of_word +``` + +3. **Deletion**: To delete a string, follow the path of the string. If the end node is reached and `end_of_word` is greater than `0`, decrement the value.
+ +Code Implementation of Deletion: + +```python +def delete(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + return # word not present in the Trie + node = node.alphabets[index] + if node.end_of_word: + node.end_of_word -= 1 +``` + +Python Code to implement Trie: + +```python +class Node: + def __init__(self): + self.alphabets = [None] * 26 + self.end_of_word = 0 + +class Trie: + def __init__(self): + self.root = Node() + + def insert(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + node.alphabets[index] = Node() + node = node.alphabets[index] + node.end_of_word += 1 + + def Search(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + return 0 + node = node.alphabets[index] + return node.end_of_word + + def delete(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + return # word not present in the Trie + node = node.alphabets[index] + if node.end_of_word: + node.end_of_word -= 1 + +if __name__ == "__main__": + trie = Trie() + + word1 = "apple" + word2 = "app" + word3 = "bat" + + trie.insert(word1) + trie.insert(word2) + trie.insert(word3) + + print(trie.Search(word1)) + print(trie.Search(word2)) + print(trie.Search(word3)) + + trie.delete(word2) + print(trie.Search(word2)) +``` diff --git a/contrib/machine-learning/ArtificialNeuralNetwork.md b/contrib/machine-learning/ann.md similarity index 100% rename from contrib/machine-learning/ArtificialNeuralNetwork.md rename to contrib/machine-learning/ann.md diff --git a/contrib/machine-learning/assets/cnn-dropout.png b/contrib/machine-learning/assets/cnn-dropout.png new file mode 100644 index 0000000..9cb18f9 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-dropout.png differ diff --git a/contrib/machine-learning/assets/cnn-filters.png b/contrib/machine-learning/assets/cnn-filters.png new file mode 100644 index 0000000..463ca60 Binary files /dev/null and
b/contrib/machine-learning/assets/cnn-filters.png differ diff --git a/contrib/machine-learning/assets/cnn-flattened.png b/contrib/machine-learning/assets/cnn-flattened.png new file mode 100644 index 0000000..2d1ca6f Binary files /dev/null and b/contrib/machine-learning/assets/cnn-flattened.png differ diff --git a/contrib/machine-learning/assets/cnn-input_shape.png b/contrib/machine-learning/assets/cnn-input_shape.png new file mode 100644 index 0000000..34379f1 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-input_shape.png differ diff --git a/contrib/machine-learning/assets/cnn-ouputs.png b/contrib/machine-learning/assets/cnn-ouputs.png new file mode 100644 index 0000000..2797226 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-ouputs.png differ diff --git a/contrib/machine-learning/assets/cnn-padding.png b/contrib/machine-learning/assets/cnn-padding.png new file mode 100644 index 0000000..a441b2b Binary files /dev/null and b/contrib/machine-learning/assets/cnn-padding.png differ diff --git a/contrib/machine-learning/assets/cnn-pooling.png b/contrib/machine-learning/assets/cnn-pooling.png new file mode 100644 index 0000000..c3ada5c Binary files /dev/null and b/contrib/machine-learning/assets/cnn-pooling.png differ diff --git a/contrib/machine-learning/assets/cnn-strides.png b/contrib/machine-learning/assets/cnn-strides.png new file mode 100644 index 0000000..26339a9 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-strides.png differ diff --git a/contrib/machine-learning/binomial_distribution.md b/contrib/machine-learning/binomial-distribution.md similarity index 100% rename from contrib/machine-learning/binomial_distribution.md rename to contrib/machine-learning/binomial-distribution.md diff --git a/contrib/machine-learning/clustering.md b/contrib/machine-learning/clustering.md new file mode 100644 index 0000000..bc02d37 --- /dev/null +++ b/contrib/machine-learning/clustering.md @@ -0,0 +1,96 @@ +# Clustering + 
+Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python. + +## Introduction + +Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics. + +## Concepts + +### Centroid + +A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster. + +### Distance Measure + +Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity. + +### Inertia + +Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center. + +## Types of Clustering + +1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all. +2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership. + +## Clustering Algorithms + +### K-Means Clustering + +K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps: +1. Initialize k centroids randomly. +2. Assign each data point to the nearest centroid. +3. Recalculate the centroids as the mean of all data points assigned to each cluster. +4. Repeat steps 2 and 3 until convergence. 
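These four steps can be sketched from scratch in plain NumPy (a minimal illustration on toy two-blob data, not a production implementation; the `kmeans` helper and the synthetic data are assumptions for the example):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids by picking k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Step 4: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated synthetic blobs around (0, 0) and (10, 10)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids))  # one centroid near each blob mean
```

Production code should prefer `sklearn.cluster.KMeans`, which adds smarter initialization (k-means++) and multiple restarts.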
+ +### Hierarchical Clustering + +Hierarchical clustering builds a tree of clusters. There are two types: +- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively. +- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters. + +### DBSCAN (Density-Based Spatial Clustering of Applications with Noise) + +DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise. + +## Implementation + +### Using Scikit-learn + +Scikit-learn is a popular machine learning library in Python that provides tools for clustering. + +### Code Example + +```python +import numpy as np +import pandas as pd +from sklearn.cluster import KMeans +from sklearn.preprocessing import StandardScaler +from sklearn.metrics import silhouette_score + +# Load dataset +data = pd.read_csv('path/to/your/dataset.csv') + +# Preprocess the data +scaler = StandardScaler() +data_scaled = scaler.fit_transform(data) + +# Initialize and fit KMeans model +kmeans = KMeans(n_clusters=3, random_state=42) +kmeans.fit(data_scaled) + +# Get cluster labels +labels = kmeans.labels_ + +# Calculate silhouette score +silhouette_avg = silhouette_score(data_scaled, labels) +print("Silhouette Score:", silhouette_avg) + +# Add cluster labels to the original data +data['Cluster'] = labels + +print(data.head()) +``` + +## Evaluation Metrics + +- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters. +- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters. +- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it. +- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. 
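Of these metrics, inertia, silhouette score, and the Davies-Bouldin index are available directly in scikit-learn (the Dunn index is not and would need a custom implementation). A small sketch on synthetic blob data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Three well-separated synthetic clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Inertia:", kmeans.inertia_)                         # lower is better
print("Silhouette:", silhouette_score(X, labels))          # closer to 1 is better
print("Davies-Bouldin:", davies_bouldin_score(X, labels))  # lower is better
```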
+ +## Conclusion + +Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem. diff --git a/contrib/machine-learning/cost-functions.md b/contrib/machine-learning/cost-functions.md new file mode 100644 index 0000000..c1fe217 --- /dev/null +++ b/contrib/machine-learning/cost-functions.md @@ -0,0 +1,235 @@ + +# Cost Functions in Machine Learning + +Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data. + +## Types of Cost Functions + +### 1. Mean Squared Error (MSE) + +**Explanation:** +MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values. + +**Mathematical Formulation:** +The MSE is defined as: +$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$ +Where: +- `n` is the number of samples. +- $y_i$ is the actual value. +- $\hat{y}_i$ is the predicted value. + +**Advantages:** +- Sensitive to large errors due to squaring. +- Differentiable and convex, facilitating optimization. + +**Disadvantages:** +- Sensitive to outliers, as the squared term amplifies their impact. + +**Python Implementation:** +```python +import numpy as np + +def mean_squared_error(y_true, y_pred): + n = len(y_true) + return np.mean((y_true - y_pred) ** 2) +``` + +### 2. Mean Absolute Error (MAE) + +**Explanation:** +MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values. 
+
+**Mathematical Formulation:**
+The MAE is defined as:
+$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
+Where:
+- `n` is the number of samples.
+- $y_i$ is the actual value.
+- $\hat{y}_i$ is the predicted value.
+
+**Advantages:**
+- Less sensitive to outliers compared to MSE.
+- Provides a linear error term, which can be easier to interpret.
+
+**Disadvantages:**
+- Not differentiable at zero, which can complicate optimization.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def mean_absolute_error(y_true, y_pred):
+    return np.mean(np.abs(y_true - y_pred))
+```
+
+### 3. Cross-Entropy Loss (Binary)
+
+**Explanation:**
+Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.
+
+**Mathematical Formulation:**
+
+For binary classification, the cross-entropy loss is defined as:
+
+$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
+
+Where:
+- `n` is the number of samples.
+- $y_i$ is the actual class label (0 or 1).
+- $\hat{y}_i$ is the predicted probability of the positive class.
+
+**Advantages:**
+- Penalizes confident wrong predictions heavily.
+- Suitable for probabilistic outputs.
+
+**Disadvantages:**
+- Sensitive to class imbalance.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def binary_cross_entropy(y_true, y_pred, eps=1e-12):
+    # Clip predictions away from 0 and 1 to avoid log(0)
+    y_pred = np.clip(y_pred, eps, 1 - eps)
+    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
+```
+
+### 4. Cross-Entropy Loss (Multiclass)
+
+**Explanation:**
+For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.
+
+**Mathematical Formulation:**
+
+The multiclass cross-entropy loss is defined as:
+
+$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
+
+Where:
+- `n` is the number of samples.
+- `C` is the number of classes. +- $y_{i,c}$ is the indicator function for the true class of sample `i`. +- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`. + +**Advantages:** +- Handles multiple classes effectively. +- Encourages the model to assign high probabilities to the correct classes. + +**Disadvantages:** +- Requires one-hot encoding for class labels, which can increase computational complexity. + +**Python Implementation:** +```python +import numpy as np + +def categorical_cross_entropy(y_true, y_pred): + n = len(y_true) + return -np.mean(np.sum(y_true * np.log(y_pred), axis=1)) +``` + +### 5. Hinge Loss (SVM) + +**Explanation:** +Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes misclassifications by a linear margin. + +**Mathematical Formulation:** + +For binary classification, the hinge loss is defined as: + +$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$ + +Where: +- `n` is the number of samples. +- $y_i$ is the actual class label (-1 or 1). +- $\hat{y}_i$ is the predicted score for sample \( i \). + +**Advantages:** +- Encourages margin maximization in SVMs. +- Robust to outliers due to the linear penalty. + +**Disadvantages:** +- Not differentiable at the margin, which can complicate optimization. + +**Python Implementation:** +```python +import numpy as np + +def hinge_loss(y_true, y_pred): + n = len(y_true) + loss = np.maximum(0, 1 - y_true * y_pred) + return np.mean(loss) +``` + +### 6. Huber Loss + +**Explanation:** +Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors. 
+ +**Mathematical Formulation:** + +The Huber loss is defined as: + + +$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{ +\begin{array}{ll} +\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\ +\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise} +\end{array} +\right.$$ + +Where: +- `n` is the number of samples. +- $\delta$ is a threshold parameter. + +**Advantages:** +- Provides a smooth loss function. +- Less sensitive to outliers than MSE. + +**Disadvantages:** +- Requires tuning of the threshold parameter. + +**Python Implementation:** +```python +import numpy as np + +def huber_loss(y_true, y_pred, delta): + error = y_true - y_pred + loss = np.where(np.abs(error) <= delta, 0.5 * error ** 2, delta * (np.abs(error) - 0.5 * delta)) + return np.mean(loss) +``` + +### 7. Log-Cosh Loss + +**Explanation:** +Log-Cosh loss is a smooth approximation of the MAE and is less sensitive to outliers than MSE. It provides a smooth transition from quadratic for small errors to linear for large errors. + +**Mathematical Formulation:** + +The Log-Cosh loss is defined as: + +$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$ + +Where: +- `n` is the number of samples. + +**Advantages:** +- Smooth and differentiable everywhere. +- Less sensitive to outliers. + +**Disadvantages:** +- Computationally more expensive than simple losses like MSE. + +**Python Implementation:** +```python +import numpy as np + +def logcosh_loss(y_true, y_pred): + error = y_true - y_pred + loss = np.log(np.cosh(error)) + return np.mean(loss) +``` + +These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains. 
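As a quick sanity check, the regression losses above can be evaluated inline on the same toy predictions (the sample values below are illustrative only):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

err = y_true - y_pred
delta = 1.0  # Huber threshold

mse = np.mean(err ** 2)
mae = np.mean(np.abs(err))
huber = np.mean(np.where(np.abs(err) <= delta,
                         0.5 * err ** 2,
                         delta * (np.abs(err) - 0.5 * delta)))
logcosh = np.mean(np.log(np.cosh(err)))

# MSE penalizes the largest error most; Huber and Log-Cosh sit between MSE and MAE
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}, Huber: {huber:.4f}, Log-Cosh: {logcosh:.4f}")
```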
diff --git a/contrib/machine-learning/Decision-Tree.md b/contrib/machine-learning/decision-tree.md similarity index 99% rename from contrib/machine-learning/Decision-Tree.md rename to contrib/machine-learning/decision-tree.md index 6563a22..8159bcf 100644 --- a/contrib/machine-learning/Decision-Tree.md +++ b/contrib/machine-learning/decision-tree.md @@ -254,4 +254,4 @@ The final decision tree classifies instances based on the following rules: - If Outlook is Rain and Wind is Weak, PlayTennis is Yes - If Outlook is Rain and Wind is Strong, PlayTennis is No -> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used. \ No newline at end of file +> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used. diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md index e3a8f0b..b6945cd 100644 --- a/contrib/machine-learning/index.md +++ b/contrib/machine-learning/index.md @@ -1,13 +1,18 @@ # List of sections -- [Binomial Distribution](binomial_distribution.md) -- [Regression in Machine Learning](Regression.md) +- [Introduction to scikit-learn](sklearn-introduction.md) +- [Binomial Distribution](binomial-distribution.md) +- [Regression in Machine Learning](regression.md) - [Confusion Matrix](confusion-matrix.md) -- [Decision Tree Learning](Decision-Tree.md) +- [Decision Tree Learning](decision-tree.md) +- [Random Forest](random-forest.md) - [Support Vector Machine Algorithm](support-vector-machine.md) -- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md) -- [TensorFlow.md](tensorFlow.md) +- [Artificial Neural Network from the Ground Up](ann.md) +- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md) +- [TensorFlow.md](tensorflow.md) - [PyTorch.md](pytorch.md) -- [Types of optimizers](Types_of_optimizers.md) +- [Types of 
optimizers](types-of-optimizers.md) - [Logistic Regression](logistic-regression.md) +- [Types_of_Cost_Functions](cost-functions.md) +- [Clustering](clustering.md) - [Grid Search](grid-search.md) diff --git a/contrib/machine-learning/intro-to-cnn.md b/contrib/machine-learning/intro-to-cnn.md new file mode 100644 index 0000000..0221ca1 --- /dev/null +++ b/contrib/machine-learning/intro-to-cnn.md @@ -0,0 +1,225 @@ +# Understanding Convolutional Neural Networks (CNN) + +## Introduction +Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images. + +### Why CNNs are Superior to Traditional Neural Networks +1. **Localized Receptive Fields**: CNNs use convolutional layers that apply filters to local regions of the input image. This localized connectivity ensures that the network learns spatial hierarchies and patterns, such as edges and textures, which are essential for image recognition tasks. +2. **Parameter Sharing**: In CNNs, the same filter (set of weights) is used across different parts of the input, significantly reducing the number of parameters compared to fully connected layers in traditional neural networks. This not only lowers the computational cost but also mitigates the risk of overfitting. +3. **Translation Invariance**: Due to the shared weights and pooling operations, CNNs are inherently invariant to translations of the input image. This means that they can recognize objects even when they appear in different locations within the image. +4. 
**Hierarchical Feature Learning**: CNNs automatically learn a hierarchy of features from low-level features like edges to high-level features like shapes and objects. Traditional neural networks, on the other hand, require manual feature extraction which is less effective and more time-consuming. + +### Use Cases of CNNs +- **Image Classification**: Identifying objects within an image (e.g., classifying a picture as containing a cat or a dog). +- **Object Detection**: Detecting and locating objects within an image (e.g., finding faces in a photo). +- **Image Segmentation**: Partitioning an image into segments or regions (e.g., dividing an image into different objects and background). +- **Medical Imaging**: Analyzing medical scans like MRI, CT, and X-rays for diagnosis. + +> This guide will walk you through the fundamentals of CNNs and their implementation in Python. We'll build a simple CNN from scratch, explaining each component to help you understand how CNNs process images and extract features. + +### Let's start by understanding the basic architecture of CNNs. + +## CNN Architecture +Convolution layers, pooling layers, and fully connected layers are just a few of the many building blocks that CNNs use to automatically and adaptively learn spatial hierarchies of information through backpropagation. + +### Convolutional Layer +The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field but extend through the full depth of the input volume. + +#### Input Shape +The dimensions of the input image, including the number of channels (e.g., 3 for RGB images & 1 for Grayscale images). + + +- The input matrix is a binary image of handwritten digits, +where '1' marks the pixels containing the digit (ink/grayscale area) and '0' marks the background pixels (empty space). 
+- The first matrix shows the representation of 1 and 0, which can be depicted as a vertical line and a closed loop.
+- The second matrix represents 9, combining the loop and line.
+
+#### Strides
+The step size with which the filter moves across the input image.
+
+- This visualization will help you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
+- A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
+- However, with larger strides (like 3 or 2 in this example), the filter may not cover all elements, potentially missing some information.
+- While this might seem like a drawback, higher strides are often used to reduce computational cost and decrease the output size, which can be beneficial in speeding up the training process and preventing overfitting.
+
+#### Padding
+Determines whether the output size is the same as the input size ('same') or reduced ('valid').
+
+- `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
+- Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
+- The padding value can be determined by $\frac{f - 1}{2}$, where `f` is the filter size (this keeps the output size equal to the input size for a stride of 1).
+
+#### Filters
+Small matrices that slide over the input data to extract features.
+
+- The first filter aims to detect closed loops within the input image, being highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
+- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
+- The last filter shows how to detect diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
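How a filter such as the vertical-line detector slides over an input can be sketched with a plain NumPy cross-correlation (which is what CNN "convolution" layers actually compute). The 7x5 input depicting a vertical stroke and the filter values are toy assumptions for the example:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no padding) cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise product, summed
    return out

# Toy 7x5 binary image of a vertical stroke (a crude digit '1')
image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

# 3x3 vertical-line detector: responds strongly where a vertical stroke is present
kernel = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])

feature_map = conv2d(image, kernel)
print(feature_map)  # a column of strong activations marks the stroke's position
```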
+
+#### Output
+A set of feature maps that represent the presence of different features in the input.
+
+- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
+- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
+- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
+- The output dimension of a CNN layer is given by, $$ n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 $$
+where,
+- $n_{in}$ = number of input features
+- $p$ = padding
+- $k$ = kernel size
+- $s$ = stride
+
+- Also, the number of trainable parameters for each layer is given by, $ (n_c \cdot k \cdot k \cdot f) + f $
+where,
+- $n_c$ = number of input channels
+- $k \times k$ = kernel size
+- $f$ = number of filters
+- an additional $f$ is added for the biases
+
+### Pooling Layer
+Pooling layers reduce the dimensionality of each feature map while retaining the most critical information. The most common form of pooling is max pooling.
+- **Input Shape:** The dimensions of the feature map from the convolutional layer. +- **Pooling Size:** The size of the pooling window (e.g., 2x2). +- **Strides:** The step size for the pooling operation. +- **Output:** A reduced feature map highlighting the most important features. +
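The bullets above can be illustrated with a minimal 2x2 max-pooling pass in NumPy (the input feature map values are a toy assumption):

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    h, w = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size]
            out[i, j] = window.max()  # keep only the strongest activation per window
    return out

fm = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 1, 8],
])
pooled = max_pool2d(fm)
print(pooled)  # the 4x4 map shrinks to 2x2, keeping the max of each window
```

Note how the 4x4 input is reduced to 2x2 while each output cell still records the most salient feature in its region.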