diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4a366da..8688009 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along
 - Web Scrapping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
 - API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
 - Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
-- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects)
-- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank)
+- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
+- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
 
 You can check out some content ideas below.
diff --git a/contrib/advanced-python/exception-handling.md b/contrib/advanced-python/exception-handling.md
new file mode 100644
index 0000000..3e0c672
--- /dev/null
+++ b/contrib/advanced-python/exception-handling.md
@@ -0,0 +1,192 @@
+# Exception Handling in Python
+
+Exception handling is a way of managing errors that may occur during program execution. Python's exception handling mechanism is designed to avoid unexpected program termination and to either regain control after an error or display a meaningful message to the user.
+
+- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
+- **Exception** - An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. 
Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. They can be runtime, input/output, or system exceptions. Unlike errors, exceptions are designed to be caught, allowing the program to recover and continue executing.
+
+## Python Built-in Exceptions
+
+There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs.
+We can view all the built-in exceptions using the `locals()` built-in function as follows:
+
+```python
+print(dir(locals()['__builtins__']))
+```
+
+|**S.No**|**Exception**|**Description**|
+|---|---|---|
+|1|SyntaxError|A syntax error occurs when the code we write violates grammatical rules such as misspelled keywords, a missing colon, mismatched parentheses etc.|
+|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects of incompatible data types (e.g., dividing a string by an integer).|
+|3|NameError|A name error occurs when we try to use a variable, function, or module that hasn't been defined, or write a string without quotes.|
+|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.|
+|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.|
+|6|ValueError|A value error occurs when we provide an argument or value that has the right type but is inappropriate for a specific operation or function (e.g., converting the string 'abc' to an integer).|
+|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.|
+|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails. 
It signifies that there's an issue during communication between your program and the external system.|
+|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
+|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.|
+
+## Try and Except Statement - Catching Exceptions
+
+The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
+
+Here's an example to explain this:
+
+```python
+try:
+    # Code that might raise an exception
+    result = 10 / 0
+except:
+    print("An error occurred!")
+```
+
+Output
+
+```markdown
+An error occurred!
+```
+
+In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
+
+## Specific Exception Handling
+
+You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
+
+Here's an example:
+
+```python
+try:
+    # Code that might raise ZeroDivisionError or NameError
+    result = 10 / 0
+    name = undefined_variable
+except ZeroDivisionError:
+    print("Oops! You tried to divide by zero.")
+except NameError:
+    print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
+```
+
+Output
+
+```markdown
+Oops! You tried to divide by zero.
+```
+
+If you comment out the line `result = 10 / 0`, then the output will be:
+
+```markdown
+There's a variable named 'undefined_variable' that hasn't been defined yet. 
+```
+
+## Important Note
+
+In this code, the `except` blocks are specific to each exception type. If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this:
+
+```python
+try:
+    # Code that might raise ZeroDivisionError or NameError
+    result = 10 / 0
+    name = undefined_variable
+except (ZeroDivisionError, NameError):
+    print("An error occurred!")
+```
+
+Output
+
+```markdown
+An error occurred!
+```
+
+## Try with Else Clause
+
+The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block.
+
+Here's an example to understand this:
+
+```python
+def calculate_average(numbers):
+    if len(numbers) == 0:  # Handle empty list case separately (optional)
+        return None
+    try:
+        total = sum(numbers)
+        average = total / len(numbers)
+    except ZeroDivisionError:
+        print("Cannot calculate the average of an empty list.")
+    else:
+        print("The average is:", average)
+        return average  # Optionally return the average here
+
+# Example usage
+numbers = [10, 20, 30]
+result = calculate_average(numbers)
+
+if result is not None:  # Check if result is available (handles empty list case)
+    print("Calculation successful!")
+```
+
+Output
+
+```markdown
+The average is: 20.0
+Calculation successful!
+```
+
+## Finally Keyword in Python
+
+The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not.
+
+To understand this, let us take an example:
+
+```python
+try:
+    a = 10 // 0
+    print(a)
+except ZeroDivisionError:
+    print("Cannot be divided by zero.")
+finally:
+    print("Program executed!")
+```
+
+Output
+
+```markdown
+Cannot be divided by zero.
+Program executed! 
+```
+
+## Raise Keyword in Python
+
+In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception.
+
+Let us take an example:
+
+```python
+def divide(x, y):
+    if y == 0:
+        raise ZeroDivisionError("Can't divide by zero!")  # Raise an exception with a message
+    result = x / y
+    return result
+
+try:
+    division_result = divide(10, 0)
+    print("Result:", division_result)
+except ZeroDivisionError as e:
+    print("An error occurred:", e)  # Handle the exception and print the message
+```
+
+Output
+
+```markdown
+An error occurred: Can't divide by zero!
+```
+
+## Advantages of Exception Handling
+
+- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience.
+- **Code Robustness** - Exception handling helps you write more resilient programs by anticipating potential issues and providing appropriate responses.
+- **Enhanced Code Readability** - By separating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed.
+
+## Disadvantages of Exception Handling
+
+- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling.
+- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications. 
+- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.
diff --git a/contrib/advanced-python/index.md b/contrib/advanced-python/index.md
index b95e4b9..febcbbe 100644
--- a/contrib/advanced-python/index.md
+++ b/contrib/advanced-python/index.md
@@ -7,3 +7,4 @@
 - [Regular Expressions in Python](regular_expressions.md)
 - [JSON module](json-module.md)
 - [Map Function](map-function.md)
+- [Exception Handling in Python](exception-handling.md)
diff --git a/contrib/database/index.md b/contrib/database/index.md
index 56cd85b..bc3d7e6 100644
--- a/contrib/database/index.md
+++ b/contrib/database/index.md
@@ -1,3 +1,4 @@
 # List of sections
 
 - [Introduction to MySQL and Queries](intro_mysql_queries.md)
+- [SQLAlchemy and Aggregation Functions](sqlalchemy-aggregation.md)
diff --git a/contrib/database/sqlalchemy-aggregation.md b/contrib/database/sqlalchemy-aggregation.md
new file mode 100644
index 0000000..9fce96c
--- /dev/null
+++ b/contrib/database/sqlalchemy-aggregation.md
@@ -0,0 +1,123 @@
+# SQLAlchemy
+SQLAlchemy is a powerful and flexible SQL toolkit and Object-Relational Mapping (ORM) library for Python. It is a versatile library that bridges the gap between Python applications and relational databases.
+
+SQLAlchemy allows the user to write database-agnostic code that can work with a variety of relational databases such as SQLite, MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. The ORM layer in SQLAlchemy allows developers to map Python classes to database tables. This means you can interact with your database using Python objects instead of writing raw SQL queries.
+
+## Setting up the Environment
+* Python and MySQL Server must be installed and configured.
+* The libraries **mysql-connector-python** and **sqlalchemy** must be installed. 
+
+```bash
+pip install sqlalchemy mysql-connector-python
+```
+
+* If not already installed, you can install them by running the above command in a terminal.
+
+## Establishing Connection with Database
+
+* Create a connection with the database using the following code snippet:
+```python
+from sqlalchemy import create_engine
+from sqlalchemy.orm import declarative_base
+from sqlalchemy.orm import sessionmaker
+
+DATABASE_URL = 'mysql+mysqlconnector://root:12345@localhost/gssoc'
+
+engine = create_engine(DATABASE_URL)
+Session = sessionmaker(bind=engine)
+session = Session()
+
+Base = declarative_base()
+```
+
+* The connection string **DATABASE_URL** is passed as an argument to the **create_engine** function, which creates a connection to the database. This connection string contains the database credentials such as the database type, username, password, and database name.
+* The **sessionmaker** function is used to create a session object, which is used to interact with the database.
+* The **declarative_base** function is used to create a base class for all the database models. This base class is used to define the structure of the database tables.
+
+## Creating Tables
+
+* The following code snippet creates a table named **"products"** in the database:
+```python
+from sqlalchemy import Column, Integer, String, Float
+
+class Product(Base):
+    __tablename__ = 'products'
+    id = Column(Integer, primary_key=True)
+    name = Column(String(50))
+    category = Column(String(50))
+    price = Column(Float)
+    quantity = Column(Integer)
+
+Base.metadata.create_all(engine)
+```
+
+* The **Product** class inherits from **Base**, which is the base class for all the database models.
+* The **Base.metadata.create_all(engine)** statement is used to create the table in the database. The engine object is the connection to the database that was created earlier. 
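Because the model definition is database-agnostic, the same pattern can be tried without a MySQL server. The sketch below is purely illustrative: it swaps the MySQL connection string for an in-memory SQLite URL (`sqlite:///:memory:`) and uses SQLAlchemy's `inspect()` helper to confirm the table was created:

```python
from sqlalchemy import create_engine, inspect, Column, Integer, String, Float
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    category = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)

# An in-memory SQLite engine stands in for the MySQL engine above
engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)

# Confirm the table was actually created by reading the schema catalog
print(inspect(engine).get_table_names())
```

The `inspect()` helper reads the database's schema catalog, so the same check works against MySQL once the real engine is in place.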
+ +## Inserting Data for Aggregation Functions + +* The following code snippet inserts data into the **"products"** table: +```python +products = [ + Product(name='Laptop', category='Electronics', price=1000, quantity=50), + Product(name='Smartphone', category='Electronics', price=700, quantity=150), + Product(name='Tablet', category='Electronics', price=400, quantity=100), + Product(name='Headphones', category='Accessories', price=100, quantity=200), + Product(name='Charger', category='Accessories', price=20, quantity=300), +] + +session.add_all(products) +session.commit() +``` + +* A list of **Product** objects is created. Each Product object represents a row in the **products table** in the database. +* The **add_all** method of the session object is used to add all the Product objects to the session. This method takes a **list of objects as an argument** and adds them to the session. +* The **commit** method of the session object is used to commit the changes made to the database. + +## Aggregation Functions + +SQLAlchemy provides functions that correspond to SQL aggregation functions and are available in the **sqlalchemy.func module**. + +### COUNT + +The **COUNT** function returns the number of rows in a result set. It can be demonstrated using the following code snippet: +```python +from sqlalchemy import func + +total_products = session.query(func.count(Product.id)).scalar() +print(f'Total products: {total_products}') +``` + +### SUM + +The **SUM** function returns the sum of all values in a column. It can be demonstrated using the following code snippet: +```python +total_price = session.query(func.sum(Product.price)).scalar() +print(f'Total price of all products: {total_price}') +``` + +### AVG + +The **AVG** function returns the average of all values in a column. 
It can be demonstrated using the following code snippet:
+```python
+average_price = session.query(func.avg(Product.price)).scalar()
+print(f'Average price of products: {average_price}')
+```
+
+### MAX
+
+The **MAX** function returns the maximum value in a column. It can be demonstrated using the following code snippet:
+```python
+max_price = session.query(func.max(Product.price)).scalar()
+print(f'Maximum price of products: {max_price}')
+```
+
+### MIN
+
+The **MIN** function returns the minimum value in a column. It can be demonstrated using the following code snippet:
+```python
+min_price = session.query(func.min(Product.price)).scalar()
+print(f'Minimum price of products: {min_price}')
+```
+
+In general, the aggregation functions can be implemented by utilising the **session** object to execute the desired query on the table in a database using the **query()** method. The **scalar()** method is called on the query object to execute the query and return a single value.
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png
new file mode 100644
index 0000000..f748094
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png differ
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png
new file mode 100644
index 0000000..b4faba1
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png differ
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png
new file mode 100644
index 0000000..7434906
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png differ
diff --git a/contrib/ds-algorithms/index.md b/contrib/ds-algorithms/index.md
index c61ca0b..31cff39 100644
--- 
a/contrib/ds-algorithms/index.md
+++ b/contrib/ds-algorithms/index.md
@@ -1,5 +1,6 @@
 # List of sections
 
+- [Time & Space Complexity](time-space-complexity.md)
 - [Queues in Python](Queues.md)
 - [Graphs](graph.md)
 - [Sorting Algorithms](sorting-algorithms.md)
diff --git a/contrib/ds-algorithms/recursion.md b/contrib/ds-algorithms/recursion.md
index 7ab3136..4233242 100644
--- a/contrib/ds-algorithms/recursion.md
+++ b/contrib/ds-algorithms/recursion.md
@@ -2,7 +2,7 @@
 Recursion is when a function calls itself to solve smaller instances of the same problem until a specified condition is fulfilled. It is used for tasks that can be divided into smaller sub-tasks.
 
-# How Recursion Works
+## How Recursion Works
 
 To solve a problem using recursion we must define:
 - Base condition: The condition under which recursion ends.
@@ -17,43 +17,63 @@
 When a recursive function is called, the following sequence of events occurs:
 
 - Stack Management: Each recursive call is placed on the call stack. The stack keeps track of each function call, its argument, and the point to return to once the call completes.
 - Unwinding the Stack: When the base case is eventually met, the function returns a value, and the stack starts unwinding, returning values to previous function calls until the initial call is resolved.
 
-# What is Stack Overflow in Recursion
+## Python Code: Factorial using Recursion
+
+```python
+def fact(n):
+    if n == 0 or n == 1:
+        return 1
+    return n * fact(n - 1)
+
+if __name__ == "__main__":
+    n = int(input("Enter a positive number: "))
+    print("Factorial of", n, "is", fact(n))
+```
+
+### Explanation
+
+This Python script calculates the factorial of a given number using recursion.
+
+- **Function `fact(n)`:**
+  - The function takes an integer `n` as input and calculates its factorial.
+  - It checks if `n` is 0 or 1. If so, it returns 1 (since the factorial of 0 and 1 is 1). 
+  - Otherwise, it returns `n * fact(n - 1)`, which means it recursively calls itself with `n - 1` until it reaches either 0 or 1.
+
+- **Main Section:**
+  - The main section prompts the user to enter a positive number.
+  - It then calls the `fact` function with the input number and prints the result.
+
+#### Example: Let n = 4
+
+The recursion unfolds as follows:
+1. When `fact(4)` is called, it computes `4 * fact(3)`.
+2. Inside `fact(3)`, it computes `3 * fact(2)`.
+3. Inside `fact(2)`, it computes `2 * fact(1)`.
+4. `fact(1)` returns 1 (the `if` statement executes), which is received by `fact(2)`, resulting in `2 * 1`, i.e. `2`.
+5. Back in `fact(3)`, it receives the value from `fact(2)`, giving `3 * 2`, i.e. `6`.
+6. `fact(4)` receives the value from `fact(3)`, resulting in `4 * 6`, i.e. `24`.
+7. Finally, `fact(4)` returns 24 to the main function.
+
+#### So, the result is 24.
+
+## What is Stack Overflow in Recursion?
 
 Stack overflow is an error that occurs when the call stack memory limit is exceeded. During execution, recursive calls are stored on the call stack until the recursion completes. Without a base case, the function would call itself indefinitely, leading to a stack overflow.
 
-# Example
-
-- Factorial of a Number
-
-  The factorial of i natural numbers is nth integer multiplied by factorial of (i-1) numbers. The base case is if i=0 we return 1 as factorial of 0 is 1.
-
-```python
-def factorial(i):
-    #base case
-    if i==0 :
-        return 1
-    #recursive case
-    else :
-        return i * factorial(i-1)
-i = 6
-print("Factorial of i is :", factorial(i)) # Output- Factorial of i is :720
-```
-# What is Backtracking
+## What is Backtracking
 
 Backtracking is a recursive algorithmic technique used to solve problems by exploring all possible solutions and discarding those that do not meet the problem's constraints. It is particularly useful for problems involving combinations, permutations, and finding paths in a grid. 
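As a quick, self-contained illustration of the technique, the following sketch generates all permutations of a list by repeatedly choosing an element, exploring the remaining choices, and then undoing the choice (the `permutations` helper is a generic example, separate from the word-search problem):

```python
def permutations(items):
    """Generate all permutations of items using backtracking."""
    result = []

    def backtrack(current, remaining):
        if not remaining:              # no choices left: a full permutation is built
            result.append(current[:])  # record a copy of the solution
            return
        for i in range(len(remaining)):
            current.append(remaining[i])                           # choose
            backtrack(current, remaining[:i] + remaining[i + 1:])  # explore
            current.pop()                                          # backtrack (un-choose)

    backtrack([], list(items))
    return result

print(permutations([1, 2, 3]))  # all 6 orderings of [1, 2, 3]
```

Each `current.pop()` undoes the most recent choice, which is exactly the "removing the last added part of the solution" step described in the next section.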
-# How Backtracking Works
+## How Backtracking Works
 
 - Incremental Solution Building: Solutions are built one step at a time.
 - Feasibility Check: At each step, a check is made to see if the current partial solution is valid.
 - Backtracking: If a partial solution is found to be invalid, the algorithm backtracks by removing the last added part of the solution and trying the next possibility.
 - Exploration of All Possibilities: The process continues recursively, exploring all possible paths, until a solution is found or all possibilities are exhausted.
 
-# Example
+## Example: Word Search
 
-- Word Search
-
-  Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
+Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
 
 Algorithm for Solving the Word Search Problem with Backtracking:
 - Start at each cell: Attempt to find the word starting from each cell.
diff --git a/contrib/ds-algorithms/time-space-complexity.md b/contrib/ds-algorithms/time-space-complexity.md
new file mode 100644
index 0000000..eeadd64
--- /dev/null
+++ b/contrib/ds-algorithms/time-space-complexity.md
@@ -0,0 +1,243 @@
+# Time and Space Complexity
+
+We can solve a problem using one or more algorithms. It's essential to learn how to compare the performance of different algorithms and select the best one for a specific task.
+
+Therefore, we need a method to compare the solutions in order to judge which one is more optimal.
+
+The method must be:
+
+- Independent of the system and its settings on which the algorithm is executing. 
+- Demonstrate a direct relationship with the quantity of inputs.
+- Able to discriminate between two methods with clarity and precision.
+
+Two such methods used to analyze algorithms are `time complexity` and `space complexity`.
+
+## What is Time Complexity?
+
+The _number of operations an algorithm performs in proportion to the quantity of the input_ is measured by time complexity. It facilitates our investigation of how the performance of the algorithm scales with increasing input size. But in real life, **_time complexity does not refer to the time taken by the machine to execute a particular code_**.
+
+## Order of Growth and Asymptotic Notations
+
+The order of growth explains how an algorithm's space or running time expands as the size of the input does. This growth is described via asymptotic notation, such as Big O notation, which concentrates on the dominating term as the input size approaches infinity and is independent of lower-order terms and machine-specific constants.
+
+### Common Asymptotic Notation
+
+1. `Big Oh (O)`: Provides the worst-case scenario for describing the upper bound of an algorithm's execution time.
+2. `Big Omega (Ω)`: Provides the best-case scenario and describes the lower bound.
+3. `Big Theta (Θ)`: Gives a tight constraint on the running time by describing both the upper and lower bounds.
+
+### 1. Big Oh (O) Notation
+
+Big O notation describes how an algorithm behaves as the input size gets closer to infinity and provides an upper bound on the time or space complexity of the method. It helps developers and computer scientists to evaluate the effectiveness of various algorithms without regard to the software or hardware environment.
+
+To denote asymptotic upper bound, we use O-notation. 
For a given function `g(n)`, we denote by `O(g(n))` (pronounced "big-oh of g of n") the set of functions: + +$$ +O(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq f(n) \leq c \cdot g(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Oh: + + + +### 2. Big Omega (Ω) Notation + +Big Omega (Ω) notation is used to describe the lower bound of an algorithm's running time. It provides a way to express the minimum time complexity that an algorithm will take to complete. In other words, Big Omega gives us a guarantee that the algorithm will take at least a certain amount of time to run, regardless of other factors. + +To denote asymptotic lower bound, we use Omega-notation. For a given function `g(n)`, we denote by `Ω(g(n))` (pronounced "big-omega of g of n") the set of functions: + +$$ +\Omega(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq c \cdot g(n) \leq f(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Omega: + + + +### 3. Big Theta (Θ) Notation + +Big Theta (Θ) notation provides a way to describe the asymptotic tight bound of an algorithm's running time. It offers a precise measure of the time complexity by establishing both an upper and lower bound, indicating that the running time of an algorithm grows at the same rate as a given function, up to constant factors. + +To denote asymptotic tight bound, we use Theta-notation. For a given function `g(n)`, we denote by `Θ(g(n))` (pronounced "big-theta of g of n") the set of functions: + +$$ +\Theta(g(n)) = \{ f(n) : \exists \text{ positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } 0 \leq c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Theta: + + + +## Best Case, Worst Case and Average Case + +### 1. 
Best-Case Scenario: + +The best-case scenario refers to the situation where an algorithm performs optimally, achieving the lowest possible time or space complexity. It represents the most favorable conditions under which an algorithm operates. + +#### Characteristics: + +- Represents the minimum time or space required by an algorithm to solve a problem. +- Occurs when the input data is structured in such a way that the algorithm can exploit its strengths fully. +- Often used to analyze the lower bound of an algorithm's performance. + +#### Example: + +Consider the `linear search algorithm` where we're searching for a `target element` in an array. The best-case scenario occurs when the target element is found `at the very beginning of the array`. In this case, the algorithm would only need to make one comparison, resulting in a time complexity of `O(1)`. + +### 2. Worst-Case Scenario: + +The worst-case scenario refers to the situation where an algorithm performs at its poorest, achieving the highest possible time or space complexity. It represents the most unfavorable conditions under which an algorithm operates. + +#### Characteristics: + +- Represents the maximum time or space required by an algorithm to solve a problem. +- Occurs when the input data is structured in such a way that the algorithm encounters the most challenging conditions. +- Often used to analyze the upper bound of an algorithm's performance. + +#### Example: + +Continuing with the `linear search algorithm`, the worst-case scenario occurs when the `target element` is either not present in the array or located `at the very end`. In this case, the algorithm would need to iterate through the entire array, resulting in a time complexity of `O(n)`, where `n` is the size of the array. + +### 3. Average-Case Scenario: + +The average-case scenario refers to the expected performance of an algorithm over all possible inputs, typically calculated as the arithmetic mean of the time or space complexity. 
+
+#### Characteristics:
+
+- Represents the typical performance of an algorithm across a range of input data.
+- Takes into account the distribution of inputs and their likelihood of occurrence.
+- Provides a more realistic measure of an algorithm's performance compared to the best-case or worst-case scenarios.
+
+#### Example:
+
+For the `linear search algorithm`, the average-case scenario considers the probability distribution of the target element's position within the array. If the `target element is equally likely to be found at any position in the array`, the algorithm would, on average, need to search halfway through the array, i.e. about n/2 comparisons, which still simplifies to `O(n)`.
+
+## Space Complexity
+
+The memory space that a code utilizes as it is being run is often referred to as space complexity. Additionally, space complexity depends on the machine, therefore rather than using the typical memory units like MB, GB, etc., we will express space complexity using the Big O notation.
+
+#### Examples of Space Complexity
+
+1. `Constant Space Complexity (O(1))`: Algorithms that operate on a fixed-size array or use a constant number of variables have O(1) space complexity.
+2. `Linear Space Complexity (O(n))`: Algorithms that store each element of the input array in a separate variable or data structure have O(n) space complexity.
+3. `Quadratic Space Complexity (O(n^2))`: Algorithms that create a two-dimensional array or matrix with dimensions based on the input size have O(n^2) space complexity.
+
+#### Analyzing Space Complexity
+
+To analyze space complexity:
+
+- Identify the variables, data structures, and recursive calls used by the algorithm.
+- Determine how the space requirements scale with the input size.
+- Express the space complexity using Big O notation, considering the dominant terms that contribute most to the overall space usage.
+
+## Examples of calculating time and space complexity
+
+#### 1. 
Print all elements of a given array
+
+Assume each line takes one unit of time to run. So, simply iterating over an array to print all its elements takes `O(n)` time, where n is the size of the array.
+
+Code:
+
+```python
+arr = [1,2,3,4] #1
+for x in arr:   #2
+    print(x)    #3
+```
+
+Here, the 1st statement executes only once, so it takes one unit of time to run. The for loop consisting of the 2nd and 3rd statements executes 4 times.
+Also, as the code doesn't take any additional space apart from the input arr, its space complexity is O(1), i.e. constant.
+
+#### 2. Linear Search
+
+Linear search is a simple algorithm for finding an element in an array by sequentially checking each element until a match is found or the end of the array is reached. Here's an example of calculating the time and space complexity of linear search:
+
+```python
+def linear_search(arr, target):
+    for x in arr:        # n iterations in worst case
+        if x == target:  # 1
+            return True  # 1
+    return False         # If element not found
+
+# Example usage
+arr = [1, 3, 5, 7, 9]
+target = 5
+print(linear_search(arr, target))
+```
+
+**Time Complexity Analysis**
+
+The for loop iterates through the entire array, which takes O(n) time in the worst case, where n is the size of the array.
+Inside the loop, each operation takes constant time (O(1)).
+Therefore, the time complexity of linear search is `O(n)`.
+
+**Space Complexity Analysis**
+
+The space complexity of linear search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
+
+
+#### 3. Binary Search
+
+Binary search is an efficient algorithm for finding an element in a sorted array by repeatedly dividing the search interval in half. 
Here's an example of calculating the time and space complexity of binary search: + +```python +def binary_search(arr, target): + left = 0 # 1 + right = len(arr) - 1 # 1 + + while left <= right: # log(n) iterations in worst case + mid = (left + right) // 2 # log(n) + + if arr[mid] == target: # 1 + return mid # 1 + elif arr[mid] < target: # 1 + left = mid + 1 # 1 + else: + right = mid - 1 # 1 + + return -1 # If element not found + +# Example usage +arr = [1, 3, 5, 7, 9] +target = 5 +print(binary_search(arr, target)) +``` + +**Time Complexity Analysis** + +The initialization of left and right takes constant time (O(1)). +The while loop runs for log(n) iterations in the worst case, where n is the size of the array. +Inside the loop, each operation takes constant time (O(1)). +Therefore, the time complexity of binary search is `O(log n)`. + +**Space Complexity Analysis** + +The space complexity of binary search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size. + +#### 4. Fibbonaci Sequence + +Let's consider an example of a function that generates Fibonacci numbers up to a given index and stores them in a list. In this case, the space complexity will not be constant because the size of the list grows with the Fibonacci sequence. + +```python +def fibonacci_sequence(n): + fib_list = [0, 1] # Initial Fibonacci sequence with first two numbers + + while len(fib_list) < n: # O(n) iterations in worst case + next_fib = fib_list[-1] + fib_list[-2] # Calculating next Fibonacci number + fib_list.append(next_fib) # Appending next Fibonacci number to list + + return fib_list + +# Example usage +n = 10 +fib_sequence = fibonacci_sequence(n) +print(fib_sequence) +``` + +**Time Complexity Analysis** + +The while loop iterates until the length of the Fibonacci sequence list reaches n, so it takes `O(n)` iterations in the `worst case`.Inside the loop, each operation takes constant time (O(1)). 
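As an aside (not part of the original example), the linear growth above can be contrasted with a naive recursive Fibonacci, whose repeated recomputation of subproblems makes the amount of work grow exponentially. The sketch below counts the work done by each approach:

```python
def fib_recursive(n, counter):
    # Naive recursion: fib(n-1) and fib(n-2) recompute the same subproblems
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)

def fib_iterative(n, counter):
    # Iterative version: one loop step per index, so the work grows linearly
    a, b = 0, 1
    for _ in range(n):
        counter[0] += 1
        a, b = b, a + b
    return a

rec_calls, loop_steps = [0], [0]
assert fib_recursive(20, rec_calls) == fib_iterative(20, loop_steps) == 6765
print(rec_calls[0], loop_steps[0])  # 21891 recursive calls vs 20 loop steps
```

Both functions return the same value, but the recursive version makes thousands of calls where the loop takes twenty steps — which is why its time complexity is exponential rather than `O(n)`.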
+ +**Space Complexity Analysis** + +The space complexity of this function is not constant because it creates and stores a list of Fibonacci numbers. +As n grows, the size of the list also grows, so the space complexity is O(n), where n is the index of the last Fibonacci number generated. diff --git a/contrib/machine-learning/assets/cnn-dropout.png b/contrib/machine-learning/assets/cnn-dropout.png new file mode 100644 index 0000000..9cb18f9 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-dropout.png differ diff --git a/contrib/machine-learning/assets/cnn-filters.png b/contrib/machine-learning/assets/cnn-filters.png new file mode 100644 index 0000000..463ca60 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-filters.png differ diff --git a/contrib/machine-learning/assets/cnn-flattened.png b/contrib/machine-learning/assets/cnn-flattened.png new file mode 100644 index 0000000..2d1ca6f Binary files /dev/null and b/contrib/machine-learning/assets/cnn-flattened.png differ diff --git a/contrib/machine-learning/assets/cnn-input_shape.png b/contrib/machine-learning/assets/cnn-input_shape.png new file mode 100644 index 0000000..34379f1 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-input_shape.png differ diff --git a/contrib/machine-learning/assets/cnn-ouputs.png b/contrib/machine-learning/assets/cnn-ouputs.png new file mode 100644 index 0000000..2797226 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-ouputs.png differ diff --git a/contrib/machine-learning/assets/cnn-padding.png b/contrib/machine-learning/assets/cnn-padding.png new file mode 100644 index 0000000..a441b2b Binary files /dev/null and b/contrib/machine-learning/assets/cnn-padding.png differ diff --git a/contrib/machine-learning/assets/cnn-pooling.png b/contrib/machine-learning/assets/cnn-pooling.png new file mode 100644 index 0000000..c3ada5c Binary files /dev/null and b/contrib/machine-learning/assets/cnn-pooling.png differ diff --git 
a/contrib/machine-learning/assets/cnn-strides.png b/contrib/machine-learning/assets/cnn-strides.png new file mode 100644 index 0000000..26339a9 Binary files /dev/null and b/contrib/machine-learning/assets/cnn-strides.png differ diff --git a/contrib/machine-learning/clustering.md b/contrib/machine-learning/clustering.md new file mode 100644 index 0000000..bc02d37 --- /dev/null +++ b/contrib/machine-learning/clustering.md @@ -0,0 +1,96 @@ +# Clustering + +Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python. + +## Introduction + +Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics. + +## Concepts + +### Centroid + +A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster. + +### Distance Measure + +Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity. + +### Inertia + +Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center. + +## Types of Clustering + +1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all. +2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership. 
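The distinction can be made concrete with a small NumPy sketch (the points and centroids below are made up purely for illustration): a hard assignment picks exactly one cluster per point, while a soft assignment spreads membership weights that sum to 1:

```python
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 0.5], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Euclidean distance of every point to every centroid
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Hard clustering: each point belongs entirely to its nearest cluster
hard_labels = dists.argmin(axis=1)
print(hard_labels)  # [0 0 1]

# Soft (fuzzy) clustering: membership weights per cluster, summing to 1
weights = np.exp(-dists)
memberships = weights / weights.sum(axis=1, keepdims=True)
print(memberships.round(2))
```

The `exp(-distance)` weighting here is only one simple choice; fuzzy c-means and Gaussian mixture models derive memberships differently, but the idea of per-cluster weights is the same.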
+ +## Clustering Algorithms + +### K-Means Clustering + +K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps: +1. Initialize k centroids randomly. +2. Assign each data point to the nearest centroid. +3. Recalculate the centroids as the mean of all data points assigned to each cluster. +4. Repeat steps 2 and 3 until convergence. + +### Hierarchical Clustering + +Hierarchical clustering builds a tree of clusters. There are two types: +- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively. +- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters. + +### DBSCAN (Density-Based Spatial Clustering of Applications with Noise) + +DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise. + +## Implementation + +### Using Scikit-learn + +Scikit-learn is a popular machine learning library in Python that provides tools for clustering. 
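The four K-Means steps listed above can be sketched from scratch in NumPy. This is a toy illustration with deterministic initialization and a fixed iteration count rather than a convergence test — real implementations add random restarts and stopping criteria:

```python
import numpy as np

def kmeans(X, k, iters=10):
    centroids = X[:k].copy()                  # Step 1: initialize k centroids
    for _ in range(iters):                    # Step 4: repeat until done
        # Step 2: assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate centroids as the mean of assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans(X, k=2)
print(labels)  # the two tight groups end up in different clusters
```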
+ +### Code Example + +```python +import numpy as np +import pandas as pd +from sklearn.cluster import KMeans +from sklearn.preprocessing import StandardScaler +from sklearn.metrics import silhouette_score + +# Load dataset +data = pd.read_csv('path/to/your/dataset.csv') + +# Preprocess the data +scaler = StandardScaler() +data_scaled = scaler.fit_transform(data) + +# Initialize and fit KMeans model +kmeans = KMeans(n_clusters=3, random_state=42) +kmeans.fit(data_scaled) + +# Get cluster labels +labels = kmeans.labels_ + +# Calculate silhouette score +silhouette_avg = silhouette_score(data_scaled, labels) +print("Silhouette Score:", silhouette_avg) + +# Add cluster labels to the original data +data['Cluster'] = labels + +print(data.head()) +``` + +## Evaluation Metrics + +- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters. +- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters. +- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it. +- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. + +## Conclusion + +Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem. diff --git a/contrib/machine-learning/grid-search.md b/contrib/machine-learning/grid-search.md new file mode 100644 index 0000000..ae44412 --- /dev/null +++ b/contrib/machine-learning/grid-search.md @@ -0,0 +1,71 @@ +# Grid Search + +Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set. 
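As one possible sketch, scikit-learn's `GridSearchCV` automates exactly this: it takes a parameter grid, fits the model for every combination with cross-validation, and keeps the best-scoring one. The dataset and grid values below are illustrative:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Illustrative grid; any estimator hyperparameters can be listed here
param_grid = {"C": [0.5, 1.0, 2.0]}

search = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_, round(search.best_score_, 3))
```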
+
+The Grid Search method evaluates a set of hyperparameter combinations and selects the one returning the lowest error score. It is especially useful when there are only a few hyperparameters to optimize. However, it is outperformed by weighted-random search methods when the Machine Learning model grows in complexity.
+
+## Implementation
+
+Before applying Grid Search to any algorithm, the data is divided into a training set and a validation set; the validation set is used to compare the models. A model is trained with every possible combination of hyperparameters and evaluated on the validation set to choose the best combination.
+
+Grid Search can be applied to any algorithm whose performance can be improved by tuning its hyperparameters. For example, we can apply it to K-Nearest Neighbors by validating its performance over a set of values of K. Similarly, for Logistic Regression we can try a set of learning-rate values to find the one at which the model achieves the best accuracy.
+
+Let us consider that the model accepts the below three parameters in the form of input:
+1. Number of hidden layers `[2, 4]`
+2. Number of neurons in every layer `[5, 10]`
+3. Number of epochs `[10, 50]`
+
+If we want to try out two options for every parameter (as specified in square brackets above), there are 2 × 2 × 2 = 8 different combinations to evaluate. For instance, one possible combination is `[2, 5, 10]`. Finding such combinations manually would be a headache.
+
+Now, suppose that we had ten different parameters as input and wanted to try out five possible values for each parameter. Without automation, the programmer would have to alter the value of a parameter, re-execute the code, and keep a record of the output for every single combination.
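The combinations from the three-parameter example above can be enumerated programmatically, which also shows how quickly the count explodes:

```python
from itertools import product

hidden_layers = [2, 4]
neurons = [5, 10]
epochs = [10, 50]

combinations = list(product(hidden_layers, neurons, epochs))
print(len(combinations))  # 2 * 2 * 2 = 8
print(combinations[0])    # (2, 5, 10)

# Ten parameters with five candidate values each already give:
print(5 ** 10)            # 9765625 combinations
```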
+
+Grid Search automates that process: it accepts the possible values for every parameter, runs the code for each and every combination, records the result of each one, and reports the combination with the best accuracy.
+
+Higher values of C tell the model that the training data resembles real-world information and should be given greater weight, while lower values of C do the opposite.
+
+## Explanation of the Code
+
+The code provided performs hyperparameter tuning for a Logistic Regression model using a manual grid search approach. It evaluates the model's performance for different values of the regularization strength hyperparameter C on the Iris dataset.
+1. `datasets` from sklearn is imported to load the Iris dataset.
+2. `LogisticRegression` from `sklearn.linear_model` is imported to create and fit the logistic regression model.
+3. The Iris dataset is loaded, with `X` containing the features and `y` containing the target labels.
+4. A `LogisticRegression` model is instantiated with `max_iter=10000` to ensure convergence during the fitting process, as the default maximum number of iterations (100) might not be sufficient.
+5. A list of different values for the regularization strength `C` is defined. The hyperparameter `C` controls the regularization strength, with smaller values specifying stronger regularization.
+6. An empty list `scores` is initialized to store the model's performance scores for different values of `C`.
+7. A for loop iterates over each value in the `C` list:
+8. `logit.set_params(C=choice)` sets the `C` parameter of the logistic regression model to the current value in the loop.
+9. `logit.fit(X, y)` fits the logistic regression model to the entire Iris dataset (in a real scenario this is typically done on training data only, not the entire dataset).
+10. `logit.score(X, y)` calculates the accuracy of the fitted model on the dataset and appends this score to the `scores` list.
+11. After the loop, the `scores` list is printed, showing the accuracy for each value of `C`.
+
+### Python Code
+
+```python
+from sklearn import datasets
+from sklearn.linear_model import LogisticRegression
+
+iris = datasets.load_iris()
+X = iris['data']
+y = iris['target']
+
+logit = LogisticRegression(max_iter = 10000)
+
+C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
+
+scores = []
+for choice in C:
+    logit.set_params(C=choice)
+    logit.fit(X, y)
+    scores.append(logit.score(X, y))
+print(scores)
+```
+
+#### Results
+
+```
+[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
+```
+
+We can see that lower values of `C` performed worse than the default value of `1`. However, as we increased `C` to `1.75`, the model's accuracy improved.
+
+It seems that increasing `C` beyond this amount does not help increase model accuracy.
diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md
index 3b8c95b..e56691d 100644
--- a/contrib/machine-learning/index.md
+++ b/contrib/machine-learning/index.md
@@ -4,9 +4,13 @@
 - [Regression in Machine Learning](Regression.md)
 - [Confusion Matrix](confusion-matrix.md)
 - [Decision Tree Learning](Decision-Tree.md)
+- [Random Forest](random-forest.md)
 - [Support Vector Machine Algorithm](support-vector-machine.md)
 - [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
+- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
 - [TensorFlow.md](tensorFlow.md)
 - [PyTorch.md](pytorch.md)
 - [Types of optimizers](Types_of_optimizers.md)
-- [Random Forest](random-forest.md)
+- [Logistic Regression](logistic-regression.md)
+- [Clustering](clustering.md)
+- [Grid Search](grid-search.md)
diff --git a/contrib/machine-learning/intro-to-cnn.md b/contrib/machine-learning/intro-to-cnn.md
new file mode 100644
index 0000000..0221ca1
--- /dev/null
+++ b/contrib/machine-learning/intro-to-cnn.md
@@ -0,0 +1,225 @@ +# Understanding Convolutional Neural Networks (CNN) + +## Introduction +Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images. + +### Why CNNs are Superior to Traditional Neural Networks +1. **Localized Receptive Fields**: CNNs use convolutional layers that apply filters to local regions of the input image. This localized connectivity ensures that the network learns spatial hierarchies and patterns, such as edges and textures, which are essential for image recognition tasks. +2. **Parameter Sharing**: In CNNs, the same filter (set of weights) is used across different parts of the input, significantly reducing the number of parameters compared to fully connected layers in traditional neural networks. This not only lowers the computational cost but also mitigates the risk of overfitting. +3. **Translation Invariance**: Due to the shared weights and pooling operations, CNNs are inherently invariant to translations of the input image. This means that they can recognize objects even when they appear in different locations within the image. +4. **Hierarchical Feature Learning**: CNNs automatically learn a hierarchy of features from low-level features like edges to high-level features like shapes and objects. Traditional neural networks, on the other hand, require manual feature extraction which is less effective and more time-consuming. + +### Use Cases of CNNs +- **Image Classification**: Identifying objects within an image (e.g., classifying a picture as containing a cat or a dog). +- **Object Detection**: Detecting and locating objects within an image (e.g., finding faces in a photo). 
+- **Image Segmentation**: Partitioning an image into segments or regions (e.g., dividing an image into different objects and background).
+- **Medical Imaging**: Analyzing medical scans like MRI, CT, and X-rays for diagnosis.
+
+> This guide will walk you through the fundamentals of CNNs and their implementation in Python. We'll build a simple CNN from scratch, explaining each component to help you understand how CNNs process images and extract features.
+
+### Let's start by understanding the basic architecture of CNNs.
+
+## CNN Architecture
+Convolution layers, pooling layers, and fully connected layers are just a few of the many building blocks that CNNs use to automatically and adaptively learn spatial hierarchies of information through backpropagation.
+
+### Convolutional Layer
+The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field but extend through the full depth of the input volume.
+
+#### Input Shape
+The dimensions of the input image, including the number of channels (e.g., 3 for RGB images and 1 for grayscale images).
+
+
+- The input matrix is a binary image of handwritten digits, where '1' marks the pixels containing the digit (ink/grayscale area) and '0' marks the background pixels (empty space).
+- The first matrix shows the representation of 1 and 0, which can be depicted as a vertical line and a closed loop.
+- The second matrix represents 9, combining the loop and line.
+
+#### Strides
+The step size with which the filter moves across the input image.
+
+
+- This visualization will help you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
+- A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
+- However, with larger strides (like 3 or 2 in this example), the filter may not cover all elements, potentially missing some information.
+- While this might seem like a drawback, higher strides are often used to reduce computational cost and decrease the output size, which can be beneficial in speeding up the training process and preventing overfitting.
+
+#### Padding
+Determines whether the output size is the same as the input size ('same') or reduced ('valid').
+
+
+- `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
+- Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
+- For 'same' output with a stride of 1, the padding value can be determined by $\frac{f - 1}{2}$, where $f$ is the filter size.
+
+#### Filters
+Small matrices that slide over the input data to extract features.
+
+
+- The first filter aims to detect closed loops within the input image, being highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
+- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
+- The last filter shows how to detect diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
+
+#### Output
+A set of feature maps that represent the presence of different features in the input.
+
+- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
+- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
+- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
+- The output dimension of a convolutional layer is given by $$ n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 $$
+where,
+	$n_{in}$ = number of input features (input width or height)
+	$p$ = padding
+	$k$ = kernel size
+	$s$ = stride
+
+(Check against the examples above: $\lfloor (7 + 0 - 3)/1 \rfloor + 1 = 5$, and with stride 2, $\lfloor (7 + 0 - 3)/2 \rfloor + 1 = 3$.)
+
+- Also, the number of trainable parameters for each convolutional layer is given by $ (n_c \cdot k \cdot k \cdot f) + f $
+where,
+	$n_c$ = number of input channels
+	$k \times k$ = kernel size
+	$f$ = number of filters
+	the additional $f$ accounts for the bias term of each filter
+
+### Pooling Layer
+Pooling layers reduce the dimensionality of each feature map while retaining the most critical information. The most common form of pooling is max pooling.
+- **Input Shape:** The dimensions of the feature map from the convolutional layer.
+- **Pooling Size:** The size of the pooling window (e.g., 2x2).
+- **Strides:** The step size for the pooling operation.
+- **Output:** A reduced feature map highlighting the most important features.
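To make the pooling operation concrete, here is a minimal NumPy sketch of 2x2 max pooling with stride 2 on a toy 4x4 feature map. The output size follows the standard relation $\lfloor (n_{in} - k)/s \rfloor + 1$ (no padding here):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    # Output size: (n_in - size) // stride + 1 along each axis (no padding)
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]])
print(max_pool2d(fmap))  # [[6. 8.], [3. 4.]]
```

Each 2x2 window keeps only its maximum, halving both spatial dimensions while preserving the most prominent activation in every region.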