Merge branch 'main' into main

pull/711/head
Ankit Mahato 2024-06-02 03:12:19 +05:30 committed by GitHub
commit 9e56fb1ca4
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
40 changed files with 1908 additions and 29 deletions

@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along
- Web Scrapping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
- API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects)
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank)
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
You can check out some content ideas below.

@ -0,0 +1,192 @@
# Exception Handling in Python
Exception handling is a way of managing the errors that may occur during program execution. Python's exception handling mechanism is designed to prevent the unexpected termination of a program, and to offer a way to either regain control after an error or display a meaningful message to the user.
- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
- **Exception** - An exception is an event that occurs during the execution of a program and disrupts the normal flow of instructions. Exceptions are typically unexpected, and the program can handle them to recover from the error and continue executing instead of crashing or terminating abnormally. They can arise from runtime conditions, input/output operations, or the operating system.
## Python Built-in Exceptions
There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs.
We can view all the built-in names, including the exceptions, by listing the contents of the `builtins` module:
```python
import builtins

print(dir(builtins))
```
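The listing above mixes exceptions with every other built-in name, such as `print` and `len`. As a small sketch, you can keep only the names that refer to exception classes by checking each object with `issubclass(obj, BaseException)`. This also picks up warning categories such as `DeprecationWarning`, because warnings are exception classes too. `exception_names` is just an illustrative variable name.

```python
import builtins

# Keep only the built-in names that refer to exception classes.
exception_names = sorted(
    name
    for name, obj in vars(builtins).items()
    if isinstance(obj, type) and issubclass(obj, BaseException)
)

print(len(exception_names), "exception classes found")
print(exception_names)
```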
|**S.No**|**Exception**|**Description**|
|---|---|---|
|1|SyntaxError|A syntax error occurs when the code we write violates the grammatical rules such as misspelled keywords, missing colon, mismatched parentheses etc.|
|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types.|
|3|NameError|A name error occurs when we try to use a variable, function, module or string without quotes that hasn't been defined or isn't used in a valid way.|
|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.|
|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.|
|6|ValueError|A value error occurs when we pass an argument that has the right type but an inappropriate value for a specific operation or function, such as converting a non-numeric string to an integer (e.g., `int("abc")`).|
|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.|
|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails, such as opening a file that does not exist. It signifies an issue in communication between your program and the external system. In Python 3, `IOError` is an alias of `OSError`.|
|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
|10|ImportError|An import error occurs when an `import` statement cannot find or successfully load a module or library.|
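To see the table in action, the following sketch deliberately triggers each of these exceptions. It then checks that the expected class, or one of its subclasses, was raised. The undefined variable, file path, and module name are made-up placeholders chosen so that they do not exist.

Some operations raise a more specific subclass. Opening a missing file raises `FileNotFoundError`, a subclass of `OSError` (which `IOError` aliases). A failed import raises `ModuleNotFoundError`, a subclass of `ImportError`.

```python
import builtins

# Each zero-argument function triggers one exception from the table.
triggers = {
    "SyntaxError": lambda: compile("if True print('hi')", "<example>", "exec"),
    "TypeError": lambda: "2" + 2,
    "NameError": lambda: undefined_variable,  # deliberately never defined
    "IndexError": lambda: [1, 2, 3][10],
    "KeyError": lambda: {"a": 1}["b"],
    "ValueError": lambda: int("abc"),
    "AttributeError": lambda: (5).append(1),
    "IOError": lambda: open("no_such_dir/no_such_file.txt"),
    "ZeroDivisionError": lambda: 1 / 0,
    "ImportError": lambda: __import__("module_that_does_not_exist"),
}

results = {}
for expected, trigger in triggers.items():
    try:
        trigger()
    except Exception as exc:
        expected_cls = getattr(builtins, expected)
        # isinstance() also accepts subclasses, e.g. FileNotFoundError for
        # IOError (an alias of OSError) and ModuleNotFoundError for ImportError.
        results[expected] = isinstance(exc, expected_cls)
        print(f"{expected:<18} -> {type(exc).__name__}")
```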
## Try and Except Statement - Catching Exceptions
The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
Here's an example to explain this:
```python
try:
    # Code that might raise an exception
    result = 10 / 0
except Exception:  # Catches any ordinary exception (a bare `except:` would also catch KeyboardInterrupt)
    print("An error occurred!")
```
Output
```markdown
An error occurred!
```
In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
## Specific Exception Handling
You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
Here's an example:
```python
try:
    # Code that might raise ZeroDivisionError or NameError
    result = 10 / 0
    name = undefined_variable
except ZeroDivisionError:
    print("Oops! You tried to divide by zero.")
except NameError:
    print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
```
Output
```markdown
Oops! You tried to divide by zero.
```
If you comment out the line `result = 10 / 0`, the output will be:
```markdown
There's a variable named 'undefined_variable' that hasn't been defined yet.
```
## Important Note
In this code, each `except` block handles one specific exception type. If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this:
```python
try:
    # Code that might raise ZeroDivisionError or NameError
    result = 10 / 0
    name = undefined_variable
except (ZeroDivisionError, NameError):
    print("An error occurred!")
```
Output
```markdown
An error occurred!
```
## Try with Else Clause
The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block.
Here's an example to understand this:
```python
def calculate_average(numbers):
    if len(numbers) == 0:  # Handle empty list case separately (optional)
        return None
    try:
        total = sum(numbers)
        average = total / len(numbers)
    except ZeroDivisionError:
        # Only reachable if the empty-list check above is removed
        print("Cannot calculate the average of an empty list.")
    else:
        print("The average is:", average)
        return average  # Optionally return the average here


# Example usage
numbers = [10, 20, 30]
result = calculate_average(numbers)
if result is not None:  # Check if result is available (handles empty list case)
    print("Calculation successful!")
```
Output
```markdown
The average is: 20.0
Calculation successful!
```
## Finally Keyword in Python
The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not.
To understand this, let us take an example:
```python
try:
    a = 10 // 0
    print(a)
except ZeroDivisionError:
    print("Cannot be divided by zero.")
finally:
    print("Program executed!")
```
Output
```markdown
Cannot be divided by zero.
Program executed!
```
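A detail worth knowing is that `finally` runs even when the `try` or `except` block executes a `return`. The sketch below records which blocks run so the order is visible. The `safe_divide` function and the `log` list are only for illustration.

```python
log = []


def safe_divide(a, b):
    try:
        log.append("try")
        return a / b  # the finally block still runs before this value is returned
    except ZeroDivisionError:
        log.append("except")
        return None
    finally:
        log.append("finally")  # runs on every path out of the try statement


print(safe_divide(10, 2), log)  # 5.0 ['try', 'finally']
log.clear()
print(safe_divide(10, 0), log)  # None ['try', 'except', 'finally']
```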
## Raise Keyword in Python
In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception.
Let us take an example:
```python
def divide(x, y):
    if y == 0:
        raise ZeroDivisionError("Can't divide by zero!")  # Raise an exception with a message
    result = x / y
    return result


try:
    division_result = divide(10, 0)
    print("Result:", division_result)
except ZeroDivisionError as e:
    print("An error occurred:", e)  # Handle the exception and print the message
```
Output
```markdown
An error occurred: Can't divide by zero!
```
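`raise` is also commonly used to validate input before doing any work. In this sketch, the hypothetical `set_age` function raises a `TypeError` or a `ValueError` with a descriptive message. The caller handles both with a single `except` clause.

```python
def set_age(age):
    """Hypothetical validator: reject values that make no sense for an age."""
    if not isinstance(age, int):
        raise TypeError(f"age must be an int, got {type(age).__name__}")
    if age < 0:
        raise ValueError(f"age cannot be negative: {age}")
    return age


for value in (25, -3, "ten"):
    try:
        print("Accepted:", set_age(value))
    except (TypeError, ValueError) as e:
        print(f"Rejected {value!r}: {e}")
```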
## Advantages of Exception Handling
- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience.
- **Code Robustness** - Exception handling helps you write more resilient programs by anticipating potential issues and providing appropriate responses.
- **Enhanced Code Readability** - By separating error-handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed.
## Disadvantages of Exception Handling
- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling.
- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.

@ -7,3 +7,4 @@
- [Regular Expressions in Python](regular_expressions.md)
- [JSON module](json-module.md)
- [Map Function](map-function.md)
- [Exception Handling in Python](exception-handling.md)

@ -1,3 +1,4 @@
# List of sections
- [Introduction to MySQL and Queries](intro_mysql_queries.md)
- [SQLAlchemy and Aggregation Functions](sqlalchemy-aggregation.md)

@ -0,0 +1,123 @@
# SQLAlchemy
SQLAlchemy is a powerful and flexible SQL toolkit and Object-Relational Mapping (ORM) library for Python. It is a versatile library that bridges the gap between Python applications and relational databases.
SQLAlchemy allows the user to write database-agnostic code that can work with a variety of relational databases such as SQLite, MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. The ORM layer in SQLAlchemy allows developers to map Python classes to database tables. This means you can interact with your database using Python objects instead of writing raw SQL queries.
## Setting up the Environment
* Python and MySQL Server must be installed and configured.
* The libraries **mysql-connector-python** and **sqlalchemy** must be installed. If they are not installed, you can install them by running the following command in a terminal:
```bash
pip install sqlalchemy mysql-connector-python
```
## Establishing Connection with Database
* Create a connection with the database using the following code snippet:
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import sessionmaker
DATABASE_URL = 'mysql+mysqlconnector://root:12345@localhost/gssoc'
engine = create_engine(DATABASE_URL)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()
```
* The connection string **DATABASE_URL** is passed to the **create_engine** function, which returns an engine object that manages connections to the database. The connection string encodes the database dialect and driver (`mysql+mysqlconnector`), the username, password, host, and database name.
* The **sessionmaker** function creates a session factory bound to the engine; calling it (`Session()`) returns a session object that is used to interact with the database.
* The **declarative_base** function is used to create a base class for all the database models. This base class is used to define the structure of the database tables.
## Creating Tables
* The following code snippet creates a table named **"products"** in the database:
```python
from sqlalchemy import Column, Integer, String, Float

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    category = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)

Base.metadata.create_all(engine)
```
* The **Product class** inherits from **Base**, which is a base class for all the database models.
* The **Base.metadata.create_all(engine)** statement is used to create the table in the database. The engine object is a connection to the database that was created earlier.
## Inserting Data for Aggregation Functions
* The following code snippet inserts data into the **"products"** table:
```python
products = [
    Product(name='Laptop', category='Electronics', price=1000, quantity=50),
    Product(name='Smartphone', category='Electronics', price=700, quantity=150),
    Product(name='Tablet', category='Electronics', price=400, quantity=100),
    Product(name='Headphones', category='Accessories', price=100, quantity=200),
    Product(name='Charger', category='Accessories', price=20, quantity=300),
]
session.add_all(products)
session.commit()
```
* A list of **Product** objects is created. Each Product object represents a row in the **products table** in the database.
* The **add_all** method of the session object is used to add all the Product objects to the session. This method takes a **list of objects as an argument** and adds them to the session.
* The **commit** method of the session object is used to commit the changes made to the database.
## Aggregation Functions
SQLAlchemy provides functions that correspond to SQL aggregation functions; they are available through the **sqlalchemy.func** object.
### COUNT
The **COUNT** function returns the number of rows in a result set. It can be demonstrated using the following code snippet:
```python
from sqlalchemy import func
total_products = session.query(func.count(Product.id)).scalar()
print(f'Total products: {total_products}')
```
### SUM
The **SUM** function returns the sum of all values in a column. It can be demonstrated using the following code snippet:
```python
total_price = session.query(func.sum(Product.price)).scalar()
print(f'Total price of all products: {total_price}')
```
### AVG
The **AVG** function returns the average of all values in a column. It can be demonstrated by the following code snippet:
```python
average_price = session.query(func.avg(Product.price)).scalar()
print(f'Average price of products: {average_price}')
```
### MAX
The **MAX** function returns the maximum value in a column. It can be demonstrated using the following code snippet:
```python
max_price = session.query(func.max(Product.price)).scalar()
print(f'Maximum price of products: {max_price}')
```
### MIN
The **MIN** function returns the minimum value in a column. It can be demonstrated using the following code snippet:
```python
min_price = session.query(func.min(Product.price)).scalar()
print(f'Minimum price of products: {min_price}')
```
In general, the aggregation functions are used by passing the desired function, applied to a column of the table, to the **query()** method of the **session** object. The **scalar()** method is then called on the query object to execute the query and return a single value (the first column of the first row of the result).
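As a self-contained sketch, the five aggregates can also be computed in a single query. To make it runnable without a MySQL server, this version uses an in-memory SQLite database (`sqlite:///:memory:`) in place of the MySQL URL. The model and data are unchanged. It assumes SQLAlchemy 1.4 or later, where `declarative_base` is importable from `sqlalchemy.orm`. Because several columns are selected, `.one()` is used instead of `.scalar()` to fetch the single result row.

```python
from sqlalchemy import Column, Float, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

# Assumption: in-memory SQLite stands in for the MySQL database.
engine = create_engine("sqlite:///:memory:")
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()


class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    category = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)


Base.metadata.create_all(engine)
session.add_all([
    Product(name="Laptop", category="Electronics", price=1000, quantity=50),
    Product(name="Smartphone", category="Electronics", price=700, quantity=150),
    Product(name="Tablet", category="Electronics", price=400, quantity=100),
    Product(name="Headphones", category="Accessories", price=100, quantity=200),
    Product(name="Charger", category="Accessories", price=20, quantity=300),
])
session.commit()

# One SELECT computes all five aggregates; .one() returns the single row.
count, total, average, highest, lowest = session.query(
    func.count(Product.id),
    func.sum(Product.price),
    func.avg(Product.price),
    func.max(Product.price),
    func.min(Product.price),
).one()
print(count, total, average, highest, lowest)
```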

Binary file not shown (image, 13 KiB).

Binary file not shown (image, 9.2 KiB).

Binary file not shown (image, 13 KiB).

@ -1,5 +1,6 @@
# List of sections
- [Time & Space Complexity](time-space-complexity.md)
- [Queues in Python](Queues.md)
- [Graphs](graph.md)
- [Sorting Algorithms](sorting-algorithms.md)

@ -2,7 +2,7 @@
Recursion is when a function calls itself to solve smaller instances of the same problem until a specified condition is fulfilled. It is used for tasks that can be divided into smaller sub-tasks.
# How Recursion Works
## How Recursion Works
To solve a problem using recursion we must define:
- Base condition :- The condition under which recursion ends.
@ -17,43 +17,63 @@ When a recursive function is called, the following sequence of events occurs:
- Stack Management: Each recursive call is placed on the call stack. The stack keeps track of each function call, its arguments, and the point to return to once the call completes.
- Unwinding the Stack: When the base case is eventually met, the function returns a value, and the stack starts unwinding, returning values to previous function calls until the initial call is resolved.
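To make the stack behaviour visible, here is a small sketch that logs each call and each return. The indentation grows with the recursion depth. The function `recursive_sum`, which adds the numbers 1 to n, is a hypothetical example; the `depth` parameter exists only for the printout.

```python
def recursive_sum(n, depth=0):
    """Sum 1..n recursively, printing each call and return to show the stack."""
    indent = "  " * depth
    print(f"{indent}call recursive_sum({n})")
    if n == 0:  # base case: stop recursing
        print(f"{indent}return 0")
        return 0
    result = n + recursive_sum(n - 1, depth + 1)  # recursive case
    print(f"{indent}return {result}")  # printed while the stack unwinds
    return result


recursive_sum(3)
```

Calls are printed on the way down. Returns are printed in reverse order as the stack unwinds, ending with `return 6` for the initial call.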
# What is Stack Overflow in Recursion
## Python Code: Factorial using Recursion
```python
def fact(n):
    if n == 0 or n == 1:
        return 1
    return n * fact(n - 1)


if __name__ == "__main__":
    n = int(input("Enter a positive number: "))
    print("Factorial of", n, "is", fact(n))
```
### Explanation
This Python script calculates the factorial of a given number using recursion.
- **Function `fact(n)`:**
  - The function takes an integer `n` as input and calculates its factorial.
  - It checks if `n` is 0 or 1. If so, it returns 1 (since the factorial of 0 and 1 is 1).
  - Otherwise, it returns `n * fact(n - 1)`, which means it recursively calls itself with `n - 1` until it reaches either 0 or 1.
- **Main Section:**
  - The main section prompts the user to enter a positive number.
  - It then calls the `fact` function with the input number and prints the result.
#### Example : Let n = 4
The recursion unfolds as follows:
1. When `fact(4)` is called, it computes `4 * fact(3)`.
2. Inside `fact(3)`, it computes `3 * fact(2)`.
3. Inside `fact(2)`, it computes `2 * fact(1)`.
4. `fact(1)` returns 1 (`if` statement executes), which is received by `fact(2)`, resulting in `2 * 1` i.e. `2`.
5. Back to `fact(3)`, it receives the value from `fact(2)`, giving `3 * 2` i.e. `6`.
6. `fact(4)` receives the value from `fact(3)`, resulting in `4 * 6` i.e. `24`.
7. Finally, `fact(4)` returns 24 to the main function.
#### So, the result is 24.
#### What is Stack Overflow in Recursion?
Stack overflow is an error that occurs when the call stack's memory limit is exceeded. During execution, each recursive call stays on the call stack until the calls it made have completed. Without a base case, the function would call itself indefinitely, leading to a stack overflow.
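In CPython you will not normally see a raw stack overflow. The interpreter enforces a recursion limit, typically 1000 frames by default and readable with `sys.getrecursionlimit()`. When the limit is exceeded, it raises `RecursionError`. The sketch below triggers it with a deliberately broken, hypothetical function that has no base case.

```python
import sys


def no_base_case(n):
    # Deliberately missing a base case, so it never stops calling itself.
    return no_base_case(n + 1)


print("Recursion limit:", sys.getrecursionlimit())

caught = False
try:
    no_base_case(0)
except RecursionError as e:
    caught = True
    print("Caught RecursionError:", e)
```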
# Example
- Factorial of a Number
The factorial of a natural number i is i multiplied by the factorial of (i-1). The base case is i = 0, where we return 1, since the factorial of 0 is 1.
```python
def factorial(i):
    # base case
    if i == 0:
        return 1
    # recursive case
    else:
        return i * factorial(i - 1)


i = 6
print(f"Factorial of {i} is:", factorial(i))  # Output: Factorial of 6 is: 720
```
# What is Backtracking
## What is Backtracking
Backtracking is a recursive algorithmic technique used to solve problems by exploring all possible solutions and discarding those that do not meet the problem's constraints. It is particularly useful for problems involving combinations, permutations, and finding paths in a grid.
# How Backtracking Works
## How Backtracking Works
- Incremental Solution Building: Solutions are built one step at a time.
- Feasibility Check: At each step, a check is made to see if the current partial solution is valid.
- Backtracking: If a partial solution is found to be invalid, the algorithm backtracks by removing the last added part of the solution and trying the next possibility.
- Exploration of All Possibilities: The process continues recursively, exploring all possible paths, until a solution is found or all possibilities are exhausted.
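The four steps above can be sketched with a classic example: generating all permutations of a list. The function name `backtracking_permutations` is hypothetical; Python's `itertools.permutations` already does this job. The point is to show the choose, explore, and un-choose pattern, where the feasibility check is simply that each item may be used once.

```python
def backtracking_permutations(items):
    """Generate all orderings of items using backtracking."""
    results = []
    current = []                  # the partial solution being built
    used = [False] * len(items)   # which items are already in `current`

    def backtrack():
        if len(current) == len(items):  # a complete solution was found
            results.append(current.copy())
            return
        for i, item in enumerate(items):
            if used[i]:                 # feasibility check: each item once
                continue
            used[i] = True              # choose
            current.append(item)
            backtrack()                 # explore
            current.pop()               # un-choose (backtrack)
            used[i] = False

    backtrack()
    return results


print(backtracking_permutations([1, 2, 3]))
```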
# Example
## Example: Word Search
- Word Search
Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
Algorithm for Solving the Word Search Problem with Backtracking:
- Start at each cell: Attempt to find the word starting from each cell.

@ -0,0 +1,243 @@
# Time and Space Complexity
We can solve a problem using one or more algorithms. It's essential to learn how to compare the performance of different algorithms and select the best one for a specific task.
Therefore, we need a method for comparing solutions in order to judge which one is more efficient.
The method must be:
- Independent of the system, and its settings, on which the algorithm is executed.
- Directly related to the size of the input.
- Able to distinguish between two methods clearly and precisely.
Two such methods used to analyze algorithms are `time complexity` and `space complexity`.
## What is Time Complexity?
Time complexity measures _the number of operations an algorithm performs as a function of the size of its input_. It lets us investigate how the algorithm's performance scales as the input size grows. Note that **_time complexity does not refer to the actual time a machine takes to execute a particular piece of code_**.
## Order of Growth and Asymptotic Notations
The order of growth describes how an algorithm's running time or space requirement grows as the input size grows. This growth is described using asymptotic notation, such as Big O notation, which focuses on the dominant term as the input size approaches infinity and ignores lower-order terms and machine-specific constants.
### Common Asymptotic Notation
1. `Big Oh (O)`: Describes an asymptotic upper bound; it is most often used to state an algorithm's worst-case running time.
2. `Big Omega (Ω)`: Describes an asymptotic lower bound; it is often associated with the best case.
3. `Big Theta (Θ)`: Describes a tight bound, that is, both an upper and a lower bound on the running time.
### 1. Big Oh (O) Notation
Big O notation describes how an algorithm behaves as the input size gets closer to infinity and provides an upper bound on the time or space complexity of the method. It helps developers and computer scientists to evaluate the effectiveness of various algorithms without regard to the software or hardware environment.
To denote asymptotic upper bound, we use O-notation. For a given function `g(n)`, we denote by `O(g(n))` (pronounced "big-oh of g of n") the set of functions:
$$
O(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq f(n) \leq c \cdot g(n) \text{ for all } n \geq n_0 \}
$$
Graphical representation of Big Oh:
![BigOh Notation Graph](images/Time-And-Space-Complexity-BigOh.png)
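As a worked example chosen for illustration, take `f(n) = 3n + 2` and `g(n) = n`. The constants `c = 4` and `n_0 = 2` satisfy the definition, so `3n + 2` is `O(n)`:

$$
\begin{aligned}
f(n) = 3n + 2 &\leq 3n + n = 4n = c \cdot g(n) \quad \text{for all } n \geq n_0 = 2, \\
f(n) &\geq 0 \quad \text{for all } n \geq 0, \\
\text{hence } 3n + 2 &\in O(n).
\end{aligned}
$$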
### 2. Big Omega (Ω) Notation
Big Omega (Ω) notation is used to describe the lower bound of an algorithm's running time. It provides a way to express the minimum time complexity that an algorithm will take to complete. In other words, Big Omega gives us a guarantee that the algorithm will take at least a certain amount of time to run, regardless of other factors.
To denote asymptotic lower bound, we use Omega-notation. For a given function `g(n)`, we denote by `Ω(g(n))` (pronounced "big-omega of g of n") the set of functions:
$$
\Omega(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq c \cdot g(n) \leq f(n) \text{ for all } n \geq n_0 \}
$$
Graphical representation of Big Omega:
![BigOmega Notation Graph](images/Time-And-Space-Complexity-BigOmega.png)
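As an illustrative example, take `f(n) = 3n + 2` and `g(n) = n`. The constants `c = 3` and `n_0 = 1` satisfy the definition, so `3n + 2` is `Ω(n)`:

$$
\begin{aligned}
0 \leq c \cdot g(n) = 3n &\leq 3n + 2 = f(n) \quad \text{for all } n \geq n_0 = 1, \\
\text{hence } 3n + 2 &\in \Omega(n).
\end{aligned}
$$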
### 3. Big Theta (Θ) Notation
Big Theta (Θ) notation provides a way to describe the asymptotic tight bound of an algorithm's running time. It offers a precise measure of the time complexity by establishing both an upper and lower bound, indicating that the running time of an algorithm grows at the same rate as a given function, up to constant factors.
To denote asymptotic tight bound, we use Theta-notation. For a given function `g(n)`, we denote by `Θ(g(n))` (pronounced "big-theta of g of n") the set of functions:
$$
\Theta(g(n)) = \{ f(n) : \exists \text{ positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } 0 \leq c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \text{ for all } n \geq n_0 \}
$$
Graphical representation of Big Theta:
![Big Theta Notation Graph](images/Time-And-Space-Complexity-BigTheta.png)
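For the illustrative function `f(n) = 3n + 2`, the constants `c_1 = 3`, `c_2 = 4` and `n_0 = 2` work, because `3n ≤ 3n + 2 ≤ 4n` whenever `n ≥ 2`. The sketch below checks candidate constants numerically over a finite range. The helper `satisfies_theta` is hypothetical. A finite check can only support an asymptotic claim, never prove it, but it is a quick way to catch badly chosen constants.

```python
def satisfies_theta(f, g, c1, c2, n0, n_max=10_000):
    """Check 0 <= c1*g(n) <= f(n) <= c2*g(n) for every n from n0 to n_max.

    A finite check can only support an asymptotic claim, never prove it.
    """
    return all(0 <= c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, n_max + 1))


def f(n):
    return 3 * n + 2


def g(n):
    return n


print(satisfies_theta(f, g, c1=3, c2=4, n0=2))  # True
print(satisfies_theta(f, g, c1=3, c2=4, n0=1))  # False, since f(1) = 5 > 4 * g(1)
```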
## Best Case, Worst Case and Average Case
### 1. Best-Case Scenario:
The best-case scenario refers to the situation where an algorithm performs optimally, achieving the lowest possible time or space complexity. It represents the most favorable conditions under which an algorithm operates.
#### Characteristics:
- Represents the minimum time or space required by an algorithm to solve a problem.
- Occurs when the input data is structured in such a way that the algorithm can exploit its strengths fully.
- Often used to analyze the lower bound of an algorithm's performance.
#### Example:
Consider the `linear search algorithm` where we're searching for a `target element` in an array. The best-case scenario occurs when the target element is found `at the very beginning of the array`. In this case, the algorithm would only need to make one comparison, resulting in a time complexity of `O(1)`.
### 2. Worst-Case Scenario:
The worst-case scenario refers to the situation where an algorithm performs at its poorest, achieving the highest possible time or space complexity. It represents the most unfavorable conditions under which an algorithm operates.
#### Characteristics:
- Represents the maximum time or space required by an algorithm to solve a problem.
- Occurs when the input data is structured in such a way that the algorithm encounters the most challenging conditions.
- Often used to analyze the upper bound of an algorithm's performance.
#### Example:
Continuing with the `linear search algorithm`, the worst-case scenario occurs when the `target element` is either not present in the array or located `at the very end`. In this case, the algorithm would need to iterate through the entire array, resulting in a time complexity of `O(n)`, where `n` is the size of the array.
### 3. Average-Case Scenario:
The average-case scenario refers to the expected performance of an algorithm over all possible inputs, typically calculated as the arithmetic mean of the time or space complexity.
#### Characteristics:
- Represents the typical performance of an algorithm across a range of input data.
- Takes into account the distribution of inputs and their likelihood of occurrence.
- Provides a more realistic measure of an algorithm's performance compared to the best-case or worst-case scenarios.
#### Example:
For the `linear search algorithm`, the average-case scenario considers the probability distribution of the target element's position within the array. If the `target element is equally likely to be found at any position in the array`, the algorithm needs about `(n + 1) / 2` comparisons on average, i.e. it searches roughly halfway through the array. This is still `O(n)`, because constant factors are dropped in asymptotic notation.
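The sketch below makes the three cases concrete by counting comparisons instead of measuring time. It assumes the target is present and equally likely to be at each of the n = 10 positions. The helper name `comparisons_to_find` is hypothetical. For n = 10 the average comes out to (n + 1) / 2 = 5.5 comparisons, roughly half the array, which is still O(n) once constant factors are dropped.

```python
def comparisons_to_find(arr, target):
    """Linear search that returns how many comparisons it made."""
    count = 0
    for x in arr:
        count += 1
        if x == target:
            break
    return count


arr = list(range(1, 11))  # n = 10 distinct elements
counts = [comparisons_to_find(arr, t) for t in arr]  # one search per possible position

print("best:", min(counts))                   # 1, target at the first position
print("worst:", max(counts))                  # 10, target at the last position
print("average:", sum(counts) / len(counts))  # 5.5, i.e. (n + 1) / 2
```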
## Space Complexity
The amount of memory that code uses while it runs is referred to as its space complexity. Because the actual memory used depends on the machine, rather than using typical memory units like MB or GB, we express space complexity using Big O notation.
#### Examples of Space Complexity
1. `Constant Space Complexity (O(1))`: Algorithms that operate on a fixed-size array or use a constant number of variables have O(1) space complexity.
2. `Linear Space Complexity (O(n))`: Algorithms that store each element of the input array in a separate variable or data structure have O(n) space complexity.
3. `Quadratic Space Complexity (O(n^2))`: Algorithms that create a two-dimensional array or matrix with dimensions based on the input size have O(n^2) space complexity.
#### Analyzing Space Complexity
To analyze space complexity:
- Identify the variables, data structures, and recursive calls used by the algorithm.
- Determine how the space requirements scale with the input size.
- Express the space complexity using Big O notation, considering the dominant terms that contribute most to the overall space usage.
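Recursive calls are easy to overlook when analyzing space: every call that has not yet returned occupies a stack frame. In the hypothetical sketch below, both functions compute the same sum. The recursive version needs one frame per element, so it uses O(n) extra space. The loop needs only a single accumulator, so it uses O(1) extra space. The recursive version also returns its frame count so the growth is visible.

```python
def sum_recursive(arr, i=0):
    """Return (sum of arr[i:], number of stack frames used to compute it)."""
    if i == len(arr):  # base case: the empty tail still costs one frame
        return 0, 1
    rest, frames = sum_recursive(arr, i + 1)
    return arr[i] + rest, frames + 1


def sum_iterative(arr):
    """Same result with a single accumulator, whatever the size of arr."""
    total = 0
    for x in arr:
        total += x
    return total


data = list(range(1, 101))
print(sum_recursive(data))  # (5050, 101): n + 1 frames for n = 100 elements
print(sum_iterative(data))  # 5050
```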
## Examples to calculate time and space complexity
#### 1. Print all elements of given array
Assume each line takes one unit of time to run. Iterating over an array to print all of its elements then takes `O(n)` time, where n is the size of the array.
Code:
```python
arr = [1, 2, 3, 4]  # 1
for x in arr:  # 2
    print(x)  # 3
```
Here, the 1st statement executes only once, so it takes one unit of time. The `for` loop, consisting of the 2nd and 3rd statements, executes 4 times (n times in general).
Also, since the code doesn't use any additional space apart from the input `arr`, its space complexity is constant, O(1).
#### 2. Linear Search
Linear search is a simple algorithm for finding an element in an array by sequentially checking each element until a match is found or the end of the array is reached. Here's an example of calculating the time and space complexity of linear search:
```python
def linear_search(arr, target):
    for x in arr:  # n iterations in worst case
        if x == target:  # 1
            return True  # 1
    return False  # If element not found


# Example usage
arr = [1, 3, 5, 7, 9]
target = 5
print(linear_search(arr, target))
```
**Time Complexity Analysis**
The for loop iterates through the entire array, which takes O(n) time in the worst case, where n is the size of the array.
Inside the loop, each operation takes constant time (O(1)).
Therefore, the time complexity of linear search is `O(n)`.
**Space Complexity Analysis**
The space complexity of linear search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
#### 3. Binary Search
Binary search is an efficient algorithm for finding an element in a sorted array by repeatedly dividing the search interval in half. Here's an example of calculating the time and space complexity of binary search:
```python
def binary_search(arr, target):
    left = 0  # 1
    right = len(arr) - 1  # 1
    while left <= right:  # log(n) iterations in worst case
        mid = (left + right) // 2  # log(n)
        if arr[mid] == target:  # 1
            return mid  # 1
        elif arr[mid] < target:  # 1
            left = mid + 1  # 1
        else:
            right = mid - 1  # 1
    return -1  # If element not found


# Example usage
arr = [1, 3, 5, 7, 9]
target = 5
print(binary_search(arr, target))
```
**Time Complexity Analysis**
The initialization of left and right takes constant time (O(1)).
The while loop runs for log(n) iterations in the worst case, where n is the size of the array.
Inside the loop, each operation takes constant time (O(1)).
Therefore, the time complexity of binary search is `O(log n)`.
**Space Complexity Analysis**
The space complexity of binary search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
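To see the logarithmic bound in practice, this sketch instruments the same algorithm to count loop iterations. Each iteration at least halves the remaining interval, so the count never exceeds `floor(log2(n)) + 1`. Searching for a value larger than every element (here `n` in `range(n)`) forces the search to run until the interval is empty. The function name `binary_search_steps` is hypothetical. `range(n)` is used because it is sorted and supports indexing without building a list.

```python
import math


def binary_search_steps(arr, target):
    """Binary search that also reports how many loop iterations it ran."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid, steps
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1, steps


for n in (16, 1024, 1_000_000):
    arr = range(n)  # sorted and indexable, without storing n integers
    index, steps = binary_search_steps(arr, n)  # n itself is absent
    bound = math.floor(math.log2(n)) + 1
    print(f"n={n}: index={index}, steps={steps}, bound={bound}")
```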
#### 4. Fibonacci Sequence
Let's consider a function that generates the first `n` Fibonacci numbers and stores them in a list. In this case, the space complexity will not be constant, because the size of the list grows with `n`.
```python
def fibonacci_sequence(n):
    fib_list = [0, 1]  # Initial Fibonacci sequence with first two numbers
    while len(fib_list) < n:  # O(n) iterations in worst case
        next_fib = fib_list[-1] + fib_list[-2]  # Calculating next Fibonacci number
        fib_list.append(next_fib)  # Appending next Fibonacci number to list
    return fib_list[:n]  # Slice so that n = 0 or n = 1 also returns exactly n numbers


# Example usage
n = 10
fib_sequence = fibonacci_sequence(n)
print(fib_sequence)
```
**Time Complexity Analysis**
The while loop iterates until the length of the Fibonacci list reaches n, so it takes `O(n)` iterations in the `worst case`. Inside the loop, each operation takes constant time (O(1)).
**Space Complexity Analysis**
The space complexity of this function is not constant because it creates and stores a list of Fibonacci numbers.
As n grows, the size of the list also grows, so the space complexity is O(n), where n is the number of Fibonacci numbers generated.
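For contrast, if only the n-th Fibonacci number is needed, the list can be avoided altogether. The sketch below (the name `nth_fibonacci` is hypothetical) keeps only the last two values, so it still takes `O(n)` time but needs only `O(1)` extra space.

```python
def nth_fibonacci(n):
    """Return the n-th Fibonacci number (index 0 -> 0) using O(1) extra space."""
    a, b = 0, 1  # the two most recent Fibonacci numbers
    for _ in range(n):
        a, b = b, a + b  # slide the two-number window forward
    return a

print(nth_fibonacci(9))
```

`nth_fibonacci(9)` returns 34, the last element of `fibonacci_sequence(10)`, without ever storing the earlier numbers.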

@ -0,0 +1,96 @@
# Clustering
Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python.
## Introduction
Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics.
## Concepts
### Centroid
A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster.
### Distance Measure
Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine distance (one minus the cosine similarity).
### Inertia
Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center.
## Types of Clustering
1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all.
2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership.
## Clustering Algorithms
### K-Means Clustering
K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps:
1. Initialize k centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate the centroids as the mean of all data points assigned to each cluster.
4. Repeat steps 2 and 3 until convergence.
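The four steps above can be sketched directly in NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans` and the toy data are hypothetical, the "random initialization" picks k distinct data points, and an empty cluster simply keeps its previous centroid.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 1: use k distinct, randomly chosen data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

In practice, prefer `sklearn.cluster.KMeans` as in the Implementation section; it adds smarter initialization (k-means++) and multiple restarts.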
### Hierarchical Clustering
Hierarchical clustering builds a tree of clusters. There are two types:
- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively.
- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters.
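For the agglomerative (bottom-up) variant, scikit-learn provides `AgglomerativeClustering`. A minimal sketch on hypothetical toy data, assuming Ward linkage; it only covers the agglomerative approach:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# Start from single points and keep merging until 2 clusters remain.
# "ward" merges the pair of clusters whose union least increases within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```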
### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise.
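A minimal sketch with scikit-learn's `DBSCAN`, on hypothetical toy data that includes one isolated point. `eps` is the neighborhood radius and `min_samples` is the number of points (counting the point itself) needed within that radius for a point to be a core point; the values here suit this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]], dtype=float)  # the last point is an isolated outlier

db = DBSCAN(eps=1.5, min_samples=2)
labels = db.fit_predict(X)
print(labels)  # points labelled -1 are treated as noise
```

Unlike k-means, DBSCAN is not told how many clusters to find: it discovers two dense groups here and marks the outlier as noise.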
## Implementation
### Using Scikit-learn
Scikit-learn is a popular machine learning library in Python that provides tools for clustering.
### Code Example
```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
# Load dataset
data = pd.read_csv('path/to/your/dataset.csv')
# Preprocess the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
# Initialize and fit KMeans model
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data_scaled)
# Get cluster labels
labels = kmeans.labels_
# Calculate silhouette score
silhouette_avg = silhouette_score(data_scaled, labels)
print("Silhouette Score:", silhouette_avg)
# Add cluster labels to the original data
data['Cluster'] = labels
print(data.head())
```
## Evaluation Metrics
- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters. It ranges from -1 to 1; higher is better.
- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters; lower is better, although it tends to decrease as the number of clusters grows.
- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it; lower is better.
- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance; higher is better.
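The Dunn index is not part of `sklearn.metrics`, so the sketch below computes it by hand with SciPy distance functions. It uses one common variant of the definition: separation is the smallest distance between points of different clusters, and the diameter is the largest pairwise distance within a cluster. The helper `dunn_index` and the toy labels are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.metrics import davies_bouldin_score, silhouette_score

def dunn_index(X, labels):
    """Smallest inter-cluster distance divided by the largest cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Diameter: largest pairwise distance inside a cluster (0 for a single point)
    max_diameter = max(pdist(c).max() if len(c) > 1 else 0.0 for c in clusters)
    # Separation: smallest distance between two points in different clusters
    min_separation = min(
        cdist(a, b).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

print("Silhouette:", silhouette_score(X, labels))          # closer to 1 is better
print("Davies-Bouldin:", davies_bouldin_score(X, labels))  # closer to 0 is better
print("Dunn:", dunn_index(X, labels))                      # larger is better
```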
## Conclusion
Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem.

@ -0,0 +1,71 @@
# Grid Search
Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set.
The Grid Search method evaluates every hyperparameter combination in the grid and selects the one with the lowest error score. It is most useful when only a few hyperparameters need to be optimized. As the Machine Learning model grows in complexity and the number of hyperparameters increases, it is typically outperformed by randomized search methods.
## Implementation
Before applying Grid Search to any algorithm, the data is divided into a training set and a validation set. The model is trained with every possible combination of hyperparameters, and each trained model is evaluated on the validation set to choose the best combination.
Grid Search can be applied to any algorithm whose performance can be improved by tuning its hyperparameters. For example, we can apply it to K-Nearest Neighbors by evaluating its performance for a set of values of K. Similarly, for Logistic Regression trained with gradient descent, we can try a set of learning rates to find the one at which the model achieves the best accuracy.
Let us consider that the model accepts the below three parameters in the form of input:
1. Number of hidden layers `[2, 4]`
2. Number of neurons in every layer `[5, 10]`
3. Number of epochs `[10, 50]`
If we want to try two values for every parameter (as specified in square brackets above), there are 2 × 2 × 2 = 8 different combinations; one of them is `[2, 5, 10]`. Finding and testing such combinations manually would be a headache.
Now suppose that we had ten different parameters and wanted to try five values for each. That gives 5^10 = 9,765,625 combinations, and testing them manually would mean changing a parameter value, re-executing the code, and recording the output for every single one.
Grid Search automates that process: it accepts the candidate values for every parameter, trains and evaluates the model on every possible combination, and outputs the combination with the best score (for example, the highest accuracy).
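A minimal sketch of what grid search automates, using `itertools.product` to enumerate every combination of the three example parameters. `train_and_score` is a hypothetical stand-in for training the network and returning a validation score; a real grid search would train a model at that point.

```python
from itertools import product

# Candidate values from the example above
param_grid = {
    "hidden_layers": [2, 4],
    "neurons_per_layer": [5, 10],
    "epochs": [10, 50],
}

def train_and_score(hidden_layers, neurons_per_layer, epochs):
    """Hypothetical stand-in for training a model and returning its validation score."""
    return -abs(hidden_layers - 4) - abs(neurons_per_layer - 10) / 10 - abs(epochs - 50) / 100

names = list(param_grid)
combinations = list(product(*param_grid.values()))  # 2 * 2 * 2 = 8 tuples

best_score, best_params = float("-inf"), None
for values in combinations:
    params = dict(zip(names, values))
    score = train_and_score(**params)
    if score > best_score:  # keep the best-scoring combination seen so far
        best_score, best_params = score, params

print(len(combinations), best_params)
```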
For Logistic Regression, the hyperparameter `C` is the inverse of the regularization strength. Higher values of `C` tell the model that the training data resembles real-world data, so it places a greater weight on the training data; lower values of `C` do the opposite.
## Explanation of the Code
The code provided performs hyperparameter tuning for a Logistic Regression model using a manual grid search. It evaluates the model's performance on the Iris dataset for different values of the regularization hyperparameter `C`.
1. `datasets` is imported from `sklearn` to load the Iris dataset.
2. `LogisticRegression` is imported from `sklearn.linear_model` to create and fit the logistic regression model.
3. The Iris dataset is loaded, with `X` containing the features and `y` containing the target labels.
4. A `LogisticRegression` model is created with `max_iter=10000` to ensure convergence, since the default maximum of 100 iterations might not be sufficient.
5. A list of candidate values for `C` is defined. `C` controls the regularization strength: smaller values specify stronger regularization.
6. An empty list `scores` is initialized to store the model's score for each value of `C`.
7. A `for` loop iterates over each value in the `C` list:
   - `logit.set_params(C=choice)` sets the model's `C` parameter to the current value.
   - `logit.fit(X, y)` fits the model to the entire Iris dataset (in a real scenario this is done on training data only, not the entire dataset).
   - `logit.score(X, y)` calculates the accuracy of the fitted model on the same data, and the result is appended to `scores`.
8. After the loop, `scores` is printed, showing the accuracy for each value of `C`.
### Python Code
```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
iris = datasets.load_iris()
X = iris['data']
y = iris['target']
logit = LogisticRegression(max_iter=10000)
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
scores = []
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    scores.append(logit.score(X, y))
print(scores)
```
#### Results
```
[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
```
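We can see that the lower values of `C` (0.25 and 0.5) performed worse than the default value of `1`. As we increased `C` to `1.75`, the model's accuracy increased, and increasing `C` beyond that did not improve it further.
Keep in mind that these scores are measured on the same data the model was trained on. Weaker regularization (higher `C`) naturally fits the training data more closely, so held-out validation data is needed to tell whether it also generalizes better.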
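In practice, scikit-learn's `GridSearchCV` automates the same loop and scores each candidate with cross-validation rather than on the training data. A minimal sketch using the Iris data and the same candidate values of `C`; because the scoring method differs, the best `C` it reports may differ from the manual loop.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Same candidate values of C as in the manual loop
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# 5-fold cross-validation: each C is scored on held-out folds, not on the data it was fit on
search = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
search.fit(X, y)

print("Best C:", search.best_params_["C"])
print("Mean cross-validated accuracy:", search.best_score_)
```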

@ -6,7 +6,10 @@
- [Decision Tree Learning](Decision-Tree.md)
- [Support Vector Machine Algorithm](support-vector-machine.md)
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
- [TensorFlow.md](tensorFlow.md)
- [PyTorch.md](pytorch.md)
- [Types of optimizers](Types_of_optimizers.md)
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [Logistic Regression](logistic-regression.md)
- [Clustering](clustering.md)
- [Grid Search](grid-search.md)

@ -0,0 +1,115 @@
# Logistic Regression
Logistic Regression is a statistical method used for binary classification problems. It is a type of regression analysis where the dependent variable is categorical. This README provides an overview of logistic regression, including its fundamental concepts, assumptions, and how to implement it using Python.
## Table of Contents
1. [Introduction](#introduction)
2. [Concepts](#concepts)
3. [Assumptions](#assumptions)
4. [Implementation](#implementation)
- [Using Scikit-learn](#using-scikit-learn)
- [Code Example](#code-example)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Conclusion](#conclusion)
## Introduction
Logistic Regression is used to model the probability of a binary outcome based on one or more predictor variables (features). It is widely used in various fields such as medical research, social sciences, and machine learning for tasks such as spam detection, fraud detection, and predicting user behavior.
## Concepts
### Sigmoid Function
The logistic regression model uses the sigmoid function to map predicted values to probabilities. The sigmoid function is defined as:
$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
where $z = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$ is a linear combination of the input features.
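A minimal NumPy sketch of the sigmoid function. The helper name `sigmoid` is written here for illustration; SciPy ships the same function as `scipy.special.expit`.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) to the interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))  # approximately [0.0067 0.5 0.9933]
```

Large negative `z` maps close to 0, large positive `z` close to 1, and `z = 0` maps to exactly 0.5, which is why 0.5 is the usual decision threshold.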
### Odds and Log-Odds
- **Odds**: The odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
$$\text{Odds} = \frac{P(Y=1)}{P(Y=0)}$$
- **Log-Odds**: The log-odds is the natural logarithm of the odds.
$$\text{Log-Odds} = \log \left( \frac{P(Y=1)}{P(Y=0)} \right)$$
Logistic regression models the log-odds as a linear combination of the input features.
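A quick worked example, assuming P(Y=1) = 0.8: the odds are 0.8 / 0.2 = 4, the log-odds are ln 4 ≈ 1.386, and applying the sigmoid to the log-odds gives back 0.8, because the sigmoid is the inverse of the log-odds (logit) function.

```python
import math

p = 0.8                      # P(Y=1)
odds = p / (1 - p)           # 0.8 / 0.2, i.e. about 4
log_odds = math.log(odds)    # natural logarithm, about 1.386

# Applying the sigmoid to the log-odds recovers the original probability
p_back = 1 / (1 + math.exp(-log_odds))
print(odds, log_odds, p_back)
```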
### Model Equation
The logistic regression model equation is:
$$
\log \left( \frac{P(Y=1)}{P(Y=0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n
$$
Where:
- $\beta_0$ is the intercept.
- $\beta_i$ are the coefficients for the predictor variables $X_i$.
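The sketch below checks the model equation against scikit-learn on hypothetical synthetic data: it computes the log-odds $\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$ from the fitted `intercept_` and `coef_`, applies the sigmoid, and compares the result with the class-1 probabilities from `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data with 3 features (for illustration only)
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

model = LogisticRegression().fit(X, y)
beta_0 = model.intercept_[0]  # intercept
betas = model.coef_[0]        # one coefficient per feature

# Log-odds from the model equation, then probability via the sigmoid
log_odds = beta_0 + X @ betas
p_manual = 1 / (1 + np.exp(-log_odds))

# Compare with the probability of class 1 reported by scikit-learn
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```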
## Assumptions
1. **Linearity**: The log-odds of the response variable are a linear combination of the predictor variables.
2. **Independence**: Observations should be independent of each other.
3. **No Multicollinearity**: Predictor variables should not be highly correlated with each other.
4. **Large Sample Size**: Logistic regression requires a large sample size to provide reliable results.
## Implementation
### Using Scikit-learn
Scikit-learn is a popular machine learning library in Python that provides tools for logistic regression.
### Code Example
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load dataset
data = pd.read_csv('path/to/your/dataset.csv')
# Define features and target variable
X = data[['feature1', 'feature2', 'feature3']]
y = data['target']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
```
## Evaluation Metrics
- **Accuracy**: The proportion of correctly classified instances among all instances.
- **Confusion Matrix**: A table showing the number of true positives, true negatives, false positives, and false negatives.
- **Precision, Recall, and F1-Score**: Metrics to evaluate the performance of the classification model.
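Precision, recall, and F1-score follow directly from the confusion-matrix counts. A minimal sketch on hypothetical hand-made labels, computed by hand and then with scikit-learn's own functions for comparison:

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tp, fp, fn, tn)         # 3 1 1 5
print(precision, recall, f1)  # 0.75 0.75 0.75
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```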
## Conclusion
Logistic regression is a fundamental classification technique that is easy to implement and interpret. It is a powerful tool for binary classification problems and provides a probabilistic framework for predicting binary outcomes.

@ -0,0 +1,120 @@
# NumPy Array Iteration
Iterating over arrays in NumPy is a common task when processing data. NumPy provides several ways to iterate over elements of an array efficiently.
Understanding these methods is crucial for performing operations on array elements effectively.
## 1. Basic Iteration
- Iterating using a basic `for` loop.
### Single-dimensional array
Iterating over a single-dimensional array is straightforward using a basic `for` loop:
```python
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
for i in arr:
    print(i)
```
#### Output
```python
1
2
3
4
5
```
### Multi-dimensional array
When iterating over a multi-dimensional array, each iteration returns a sub-array along the first axis (here, one row at a time).
```python
marr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for arr in marr:
    print(arr)
```
#### Output
```python
[1 2 3]
[4 5 6]
[7 8 9]
```
## 2. Iterating with `nditer`
- `nditer` is a powerful iterator provided by NumPy for iterating over multi-dimensional arrays.
- In each iteration, it yields a single element, visiting every element regardless of the array's number of dimensions.
```python
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for i in np.nditer(arr):
    print(i)
```
#### Output
```python
1
2
3
4
5
6
```
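`nditer` can also modify elements in place. A minimal sketch using the `op_flags=['readwrite']` option: by default the iterator is read-only, and each write must go through `x[...]` so that it updates the array element instead of rebinding the loop variable.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# The context manager ensures the modified values are written back to `arr`
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x * 2  # assign through x[...] to change the element itself

print(arr)
```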
## 3. Iterating with `ndenumerate`
- `ndenumerate` allows you to iterate with both the index and the value of each element.
- In each iteration, it yields the element's index (as a tuple) together with its value.
```python
import numpy as np
arr = np.array([[1, 2], [3, 4]])
for index, value in np.ndenumerate(arr):
    print(index, value)
```
#### Output
```python
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
```
## 4. Iterating with `flat`
- The `flat` attribute returns a 1-D iterator over the array.
```python
import numpy as np
arr = np.array([[1, 2], [3, 4]])
for element in arr.flat:
    print(element)
```
#### Output
```python
1
2
3
4
```
Understanding the various ways to iterate over NumPy arrays can significantly enhance your data processing efficiency.
Whether you are working with single-dimensional or multi-dimensional arrays, NumPy provides versatile tools to iterate and manipulate array elements effectively.

@ -0,0 +1,223 @@
# Concatenation of Arrays
Concatenation of arrays in NumPy refers to combining multiple arrays into a single array, either along existing axes or by adding new axes. NumPy provides several functions for this purpose.
# Functions of Concatenation
## np.concatenate
Joins two or more arrays along an existing axis.
### Syntax
```python
numpy.concatenate((arr1, arr2, ...), axis)
```
Args:
- arr1, arr2, ...: Sequence of arrays to concatenate.
- axis: Axis along which the arrays will be joined. Default is 0.
### Example
#### Concatenate along axis 0
```python
import numpy as np
# Create two 2-D arrays of shape (2, 3)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])
result_1 = np.concatenate((arr1, arr2), axis=0)
print(result_1)
```
#### Output
```
[[ 1 2 3]
[ 7 8 9]
[ 4 5 6]
[10 11 12]]
```
#### Concatenate along axis 1
```python
result_2 = np.concatenate((arr1, arr2), axis=1)
print(result_2)
```
#### Output
```
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
```
## np.vstack
Vertical stacking of arrays (row-wise).
### Syntax
```python
numpy.vstack(arrays)
```
Args:
- arrays: Sequence of arrays to stack.
### Example
```python
import numpy as np
# Create two 2-D arrays of shape (2, 3)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])
result = np.vstack((arr1, arr2))
print(result)
```
#### Output
```
[[ 1 2 3]
[ 7 8 9]
[ 4 5 6]
[10 11 12]]
```
## np.hstack
Stacks arrays horizontally (column-wise).
### Syntax
```python
numpy.hstack(arrays)
```
Args:
- arrays: Sequence of arrays to stack.
### Example
```python
import numpy as np
# Create two 2-D arrays of shape (2, 3)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])
result = np.hstack((arr1, arr2))
print(result)
```
#### Output
```
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
```
## np.dstack
Stacks arrays along the third axis (depth-wise).
### Syntax
```python
numpy.dstack(arrays)
```
Args:
- arrays: Sequence of arrays to stack.
### Example
```python
import numpy as np
# Create two 2-D arrays of shape (2, 3)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])
result = np.dstack((arr1, arr2))
print(result)
```
#### Output
```
[[[ 1 4]
[ 2 5]
[ 3 6]]
[[ 7 10]
[ 8 11]
[ 9 12]]]
```
## np.stack
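Joins a sequence of arrays along a new axis.
### Syntax
```python
numpy.stack(arrays, axis)
```
Args:
- arrays: Sequence of arrays to stack. All arrays must have the same shape.
- axis: Index of the new axis in the result. Default is 0.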
### Example
```python
import numpy as np
# Create two 2-D arrays of shape (2, 3)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])
result = np.stack((arr1, arr2), axis=0)
print(result)
```
#### Output
```
[[[ 1 2 3]
[ 7 8 9]]
[[ 4 5 6]
[10 11 12]]]
```
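To compare these functions at a glance, the sketch below applies each one to the same two `(2, 3)` arrays and prints only the resulting shapes:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [7, 8, 9]])     # shape (2, 3)
arr2 = np.array([[4, 5, 6], [10, 11, 12]])  # shape (2, 3)

shapes = {
    "concatenate axis=0": np.concatenate((arr1, arr2), axis=0).shape,
    "concatenate axis=1": np.concatenate((arr1, arr2), axis=1).shape,
    "vstack": np.vstack((arr1, arr2)).shape,
    "hstack": np.hstack((arr1, arr2)).shape,
    "dstack": np.dstack((arr1, arr2)).shape,
    "stack axis=0": np.stack((arr1, arr2), axis=0).shape,
}
for name, shape in shapes.items():
    print(f"{name:20} {shape}")
```

For these 2-D inputs, `vstack` matches `concatenate` with `axis=0` and `hstack` matches `axis=1`, while `dstack` and `stack` add a new axis.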
# Concatenation with Mixed Dimensions
When concatenating arrays with different shapes, it's often necessary to reshape them to have compatible dimensions.
## Example
#### Concatenate along axis 0
```python
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([7, 8, 9])
result_0= np.concatenate((arr1, arr2[np.newaxis, :]), axis=0)
print(result_0)
```
#### Output
```
[[1 2 3]
[4 5 6]
[7 8 9]]
```
#### Concatenate along axis 1
```python
arr3 = np.array([7, 8])  # arr1 has 2 rows, so the new column needs 2 values (arr2 has 3)
result_1 = np.concatenate((arr1, arr3[:, np.newaxis]), axis=1)
print(result_1)
```
#### Output
```
[[1 2 3 7]
[4 5 6 8]]
```

@ -3,8 +3,11 @@
- [Installing NumPy](installing-numpy.md)
- [Introduction](introduction.md)
- [NumPy Data Types](datatypes.md)
- [Numpy Array Shape and Reshape](reshape-array.md)
- [Basic Mathematics](basic_math.md)
- [Operations on Arrays in NumPy](operations-on-arrays.md)
- [Loading Arrays from Files](loading_arrays_from_files.md)
- [Saving Numpy Arrays into Files](saving_numpy_arrays_to_files.md)
- [Sorting NumPy Arrays](sorting-array.md)
- [NumPy Array Iteration](array-iteration.md)
- [Concatenation of Arrays](concatenation-of-arrays.md)

@ -0,0 +1,57 @@
# Numpy Array Shape and Reshape
In NumPy, the primary data structure is the ndarray (N-dimensional array). An array can have one or more dimensions, and it organizes your data efficiently.
Let us create a 2D array:
``` python
import numpy as np
numbers = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(numbers)
```
#### Output:
``` python
[[1 2 3 4]
 [5 6 7 8]]
```
## Changing Array Shape using `reshape()`
The `reshape()` function allows you to rearrange the data within a NumPy array.
It takes the new shape as its argument, for example `reshape(2, 3)` for 2 rows and 3 columns; the total number of elements must stay the same. `reshape()` can add or remove dimensions: for instance, it can convert a 1D array into a 2D array or vice versa.
``` python
arr_1d = np.array([1, 2, 3, 4, 5, 6]) # 1D array
arr_2d = arr_1d.reshape(2, 3) # Reshaping with 2 rows and 3 cols
print(arr_2d)
```
#### Output:
``` python
[[1 2 3]
 [4 5 6]]
```
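`reshape()` also accepts `-1` for one dimension, meaning "infer this size from the number of elements". A short sketch with the same six values:

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy work out this dimension: 6 elements / 3 rows = 2 columns
arr_3x2 = arr_1d.reshape(3, -1)
print(arr_3x2.shape)  # (3, 2)

# reshape(-1) flattens back to 1D; arr_1d itself is left unchanged
print(arr_3x2.reshape(-1))

# The total number of elements must match: 6 elements cannot become 4 x 2
try:
    arr_1d.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```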
## Changing Array Shape using `resize()`
The `resize()` method changes the shape of a NumPy array in place, modifying the array itself instead of returning a new one.
It takes the new shape as a tuple, for example `(2, 3)` for 2 rows and 3 columns.
``` python
import numpy as np
arr_1d = np.array([1, 2, 3, 4, 5, 6])
arr_1d.resize((2, 3)) # 2 rows and 3 cols
print(arr_1d)
```
#### Output:
``` python
[[1 2 3]
 [4 5 6]]
```
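The sketch below contrasts the two approaches: `reshape()` returns a new array and leaves the original alone, while the `resize()` method changes the array in place and returns `None`. It also shows the separate `np.resize()` function, which returns a new array and, when the requested shape is larger, fills it by repeating the input. (The in-place method can also grow an array, padding it with zeros, but NumPy raises a `ValueError` if other references to the array exist, so this sketch keeps the size unchanged.)

```python
import numpy as np

# reshape() returns a new array object; the original keeps its shape
a = np.array([1, 2, 3, 4, 5, 6])
reshaped = a.reshape(2, 3)
print(a.shape, reshaped.shape)  # (6,) (2, 3)

# The resize() method changes the array itself and returns None
b = np.array([1, 2, 3, 4, 5, 6])
result = b.resize((2, 3))
print(result, b.shape)          # None (2, 3)

# The np.resize() function returns a new array, repeating elements to fill a larger shape
bigger = np.resize(np.array([1, 2, 3]), (2, 3))
print(bigger)
```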

@ -0,0 +1,158 @@
# Working with Date & Time in Pandas
While working with data, it is common to come across data containing date and time. Pandas is a very handy tool for dealing with such data and provides a wide range of date and time data processing options.
- **Parsing dates and times**: Pandas provides tools for parsing dates and times from strings, including the `to_datetime()` function and the `parse_dates` parameter of `read_csv()`. `to_datetime()` can handle a variety of date and time formats, including Unix timestamps and human-readable strings.
- **Manipulating dates and times**: Pandas provides a number of functions for manipulating dates and times, including `shift()`, `resample()`, and `to_timedelta()`. These functions can be used to add or subtract time periods, change the frequency of a time series, and calculate the difference between two dates or times.
- **Visualizing dates and times**: Pandas provides a number of methods for visualizing dates and times, including `plot()`, `hist()`, and `plot.bar()`. These can be used to create line charts, histograms, and bar charts of date and time data.
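A minimal sketch of the parsing and manipulation tools listed above, on a small hypothetical CSV read from an in-memory string:

```python
import io
import pandas as pd

# Hypothetical CSV data; parse_dates converts the 'date' column while reading
csv_data = io.StringIO("date,value\n2024-05-01,10\n2024-05-02,20\n2024-05-04,40\n")
df = pd.read_csv(csv_data, parse_dates=["date"])

# to_datetime() can also parse Unix timestamps (here, seconds since the epoch)
epoch = pd.to_datetime(0, unit="s")  # 1970-01-01 00:00:00

s = df.set_index("date")["value"]
daily = s.resample("D").sum()  # one row per day; the missing day sums to 0
shifted = s.shift(1)           # move values down by one row
week_later = df["date"] + pd.to_timedelta(7, unit="D")  # add a time period
gap = df["date"].iloc[-1] - df["date"].iloc[0]          # difference between two dates

print(df.dtypes)
print(daily)
print(gap)  # 3 days 00:00:00
```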
### `Timestamp` function
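`pd.Timestamp` is pandas' equivalent of Python's `datetime.datetime`: it represents a single point in time, and its attributes give access to components such as the year, month and day. To convert a `Timestamp` to a Unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC), use its `timestamp()` method.
Example: retrieving the year, month and day from a given date: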
```python
import pandas as pd
ts = pd.Timestamp('2024-05-05')
y = ts.year
print('Year is: ', y)
m = ts.month
print('Month is: ', m)
d = ts.day
print('Day is: ', d)
```
Output:
```python
Year is: 2024
Month is: 5
Day is: 5
```
Example: extracting time-related data from a given timestamp:
```python
import pandas as pd
ts = pd.Timestamp('2024-10-24 12:00:00')
print('Hour is: ', ts.hour)
print('Minute is: ', ts.minute)
print('Weekday is: ', ts.weekday())  # Monday is 0, Sunday is 6
print('Quarter is: ', ts.quarter)
```
Output:
```python
Hour is: 12
Minute is: 0
Weekday is: 3
Quarter is: 4
```
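A Unix timestamp counts seconds since 1970-01-01 00:00:00 UTC. A minimal sketch of converting in both directions with `Timestamp.timestamp()` and `pd.to_datetime(..., unit='s')`; note that for a timezone-naive `Timestamp`, `timestamp()` treats the value as UTC.

```python
import pandas as pd

ts = pd.Timestamp("2024-05-05")
seconds = ts.timestamp()                  # float seconds since the Unix epoch
back = pd.to_datetime(seconds, unit="s")  # convert the number back to a Timestamp

print(seconds)     # 1714867200.0
print(back == ts)  # True
```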
### `Timestamp.now()`
Example for getting current date and time:
```python
import pandas as pd
ts = pd.Timestamp.now()
print('Current date and time is: ', ts)
```
Output:
```python
Current date and time is: 2024-05-25 11:48:25.593213
```
### `date_range` function
Example: generating dates for the next five days (starting today):
```python
import pandas as pd
ts = pd.date_range(start = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())
```
Output:
```python
2024-05-25
2024-05-26
2024-05-27
2024-05-28
2024-05-29
```
Example: generating dates for the previous five days (ending today):
```python
import pandas as pd
ts = pd.date_range(end = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())
```
Output:
```python
2024-05-21
2024-05-22
2024-05-23
2024-05-24
2024-05-25
```
### Built-in vs pandas date & time operations
In `pandas`, you can add a time delta to an entire column of dates in a single operation, whereas Python's built-in `datetime` requires a loop (here, a list comprehension).
Example in Pandas:
```python
import pandas as pd
dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='min'))  # 'min' = minutes ('T' is a deprecated alias)
dates += pd.Timedelta(days=1)
print(dates)
```
Output:
```python
0
0 2023-01-02 00:00:00
1 2023-01-02 00:01:00
2 2023-01-02 00:02:00
3 2023-01-02 00:03:00
4 2023-01-02 00:04:00
... ...
99995 2023-03-12 10:35:00
99996 2023-03-12 10:36:00
99997 2023-03-12 10:37:00
99998 2023-03-12 10:38:00
99999 2023-03-12 10:39:00
```
Example using Built-in datetime library:
```python
from datetime import datetime, timedelta
dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
dates = [date + timedelta(days=1) for date in dates]
```
Why use pandas functions?
- Pandas stores datetime data using NumPy's `datetime64` dtype, which takes a fixed number of bytes (8 bytes per value), so the data is stored more compactly and efficiently.
- Each Python `datetime` object takes up extra memory, since it carries not only the date and time but also the per-object overhead of a Python object.
- Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as `pd.to_datetime()`, `date_range()`, `timedelta_range()`, and more. With the `datetime` library, many of these operations must be implemented manually, leading to longer and less efficient code.

@ -5,5 +5,6 @@
- [Pandas Descriptive Statistics](Descriptive_Statistics.md)
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)
- [Excel using Pandas DataFrame](excel_with_pandas.md)
- [Working with Date & Time in Pandas](datetime.md)
- [Importing and Exporting Data in Pandas](import-export.md)
- [Handling Missing Values in Pandas](handling-missing-values.md)

@ -0,0 +1,216 @@
# Bar Plots in Matplotlib
A bar plot or a bar chart is a type of data visualisation that represents data in the form of rectangular bars, with lengths or heights proportional to the values and data which they represent. The bar plots can be plotted both vertically and horizontally.
It is one of the most widely used types of data visualisation, as it is easy to interpret and pleasing to the eye.
Matplotlib provides a very easy and intuitive method to create highly customized bar plots.
## Prerequisites
Before creating bar plots in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system.
## Creating a simple Bar Plot with `bar()` method
A very basic bar plot can be created with the `bar()` method in `matplotlib.pyplot`:
```Python
import matplotlib.pyplot as plt
# Creating dataset
x = ["A", "B", "C", "D"]
y = [2, 7, 9, 11]
# Creating bar plot
plt.bar(x,y)
plt.show() # Shows the plot
```
When executed, this would show the following bar plot:
![Basic Bar Plot](images/basic_bar_plot.png)
The `bar()` function takes arguments that describe the layout of the bars.
Here, `plt.bar(x,y)` plots the bar chart with the categories in `x` along the X-axis and the bar heights from `y` along the Y-axis. You can customize the graph further, for example by adding labels to the axes or changing the color of the bars. These will be explored in the upcoming sections.
Additionally, you can also use `numpy` arrays for faster generation when handling large datasets.
```Python
import matplotlib.pyplot as plt
import numpy as np
# Using numpy array
x = np.array(["A", "B", "C", "D"])
y = np.array([2, 7, 9, 11])
plt.bar(x,y)
plt.show()
```
Its output would be the same as above.
## Customizing Bar Plots
For creating customized bar plots, it is **highly recommended** to create the plots using `matplotlib.pyplot.subplots()`, otherwise it is difficult to apply the customizations in the newer versions of Matplotlib.
### Adding title to the graph and labeling the axes
Let us create an imaginary graph of the number of cars sold in various years.
```Python
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
years = ['1999', '2000', '2001', '2002']
num_of_cars_sold = [300, 500, 700, 1000]
# Creating bar plot
ax.bar(years, num_of_cars_sold)
# Adding axis labels
ax.set_xlabel("Years")
ax.set_ylabel("Number of cars sold")
# Adding plot title
ax.set_title("Number of cars sold in various years")
plt.show()
```
![Title and axis labels](images/title_and_axis_labels.png)
Here, we have called `matplotlib.pyplot.subplots()`, which returns a `Figure` object `fig` and an `Axes` object `ax`, both of which can be used to customize the bar plot. `ax.set_xlabel`, `ax.set_ylabel` and `ax.set_title` add the X-axis label, the Y-axis label and the graph title, respectively.
### Adding bar colors and legends
Let us consider our previous example of number of cars sold in various years and suppose that we want to add different colors to the bars from different centuries and respective legends for better interpretation.
This can be achieved by creating two separate arrays `bar_colors` for bar colors and `bar_labels` for legend labels and passing them as arguments to parameters color and label respectively in `ax.bar` method.
```Python
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
# Creating the customized bar plot
ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
# Adding axis labels
ax.set_xlabel("Years")
ax.set_ylabel("Number of cars sold")
# Adding plot title
ax.set_title("Number of cars sold in various years")
# Adding legend title
ax.legend(title='Centuries')
plt.show()
```
![Bar colors and Legends](images/bar_colors_and_legends.png)
Note that labels with a leading underscore don't show up in the legend. A legend title can be added by passing the `title` argument to `ax.legend()`, as shown. You can also give all the bars a single color by passing one color, such as a `HEX` string like `'#1f77b4'`, to the `color` parameter.
### Adding labels to bars
We may want to label each bar with its absolute (or truncated) value for instant and accurate reading. This can be done by passing the `BarContainer` object returned by `ax.bar()`, a container holding all the bars and, optionally, their error bars, to the `ax.bar_label()` method.
```Python
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
# BarContainer object
bar_container = ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
ax.set_xlabel("Years")
ax.set_ylabel("Number of cars sold")
ax.set_title("Number of cars sold in various years")
ax.legend(title='Centuries')
# Adding bar labels
ax.bar_label(bar_container)
plt.show()
```
![Bar Labels](images/bar_labels.png)
**Note:** There are various other methods of adding bar labels in matplotlib.
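For example, `ax.bar_label()` itself accepts options such as `label_type`, `fmt`, `padding`, and a custom `labels` list (the method was added in Matplotlib 3.4). A minimal sketch reusing the car-sales data; the percentage labels are only an illustrative choice:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
bar_container = ax.bar(years, num_of_cars_sold)

# label_type='center' places the labels inside the bars; fmt sets the number format
inside = ax.bar_label(bar_container, label_type='center', fmt='%d', color='white')

# A second set of custom text labels above the bars, lifted by 3 points of padding
shares = [f"{v / sum(num_of_cars_sold):.0%}" for v in num_of_cars_sold]
above = ax.bar_label(bar_container, labels=shares, padding=3)

ax.set_xlabel("Years")
ax.set_ylabel("Number of cars sold")
plt.show()
```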
## Horizontal Bar Plot
We can create horizontal bar plots by using the `barh()` method in `matplotlib.pyplot`. All the relevant customizations are applicable here also.
```Python
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
# Creating horizontal bar plot
bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
# Adding axis labels
ax.set_xlabel("Number of cars sold")
ax.set_ylabel("Years")
# Adding Title
ax.set_title("Number of cars sold in various years")
ax.legend(title='Centuries')
# Adding bar labels
ax.bar_label(bar_container)
plt.show()
```
![Horizontal Bar Plot-1](images/horizontal_bar_plot_1.png)
We can also invert the Y-axis labels here to show the top values first.
```Python
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure
years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
# Creating horizontal bar plot
bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
# Adding axis labels
ax.set_xlabel("Number of cars sold")
ax.set_ylabel("Years")
# Adding Title
ax.set_title("Number of cars sold in various years")
ax.legend(title='Centuries')
# Adding bar labels
ax.bar_label(bar_container)
# Inverting Y-axis
ax.invert_yaxis()
plt.show()
```
![Horizontal Bar Plot-2](images/horizontal_bar_plot_2.png)
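A common use of the inverted axis is a ranked chart. The sketch below, which assumes we want the largest value on top, sorts the years by the number of cars sold before plotting; the sorting step is an illustrative addition, not part of the examples above.

```python
import matplotlib.pyplot as plt

years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]

# Pair each year with its value and sort by value, largest first
ranked = sorted(zip(years, num_of_cars_sold), key=lambda pair: pair[1], reverse=True)
sorted_years = [year for year, _ in ranked]
sorted_values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(10, 5))
bar_container = ax.barh(sorted_years, sorted_values, color='tab:blue')
ax.bar_label(bar_container)
ax.set_xlabel("Number of cars sold")
ax.set_ylabel("Years")
# barh() draws the first category at the bottom; invert so the largest value is on top
ax.invert_yaxis()
plt.show()
```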

@ -1,3 +1,5 @@
# List of sections
- [Installing Matplotlib](matplotlib_installation.md)
- [Installing Matplotlib](matplotlib-installation.md)
- [Bar Plots in Matplotlib](matplotlib-bar-plots.md)
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)

@ -0,0 +1,233 @@
# Pie Charts in Matplotlib
A pie chart is a circular graph that divides data into slices. The size of each slice shows the relative size of the value it represents, which makes it a pictorial representation of proportions. A pie chart requires a list of categories and their corresponding numerical values. Here, the "pie" represents the whole, and the "slices" represent the parts of the whole.
Pie charts are commonly used in business presentations on sales, operations, survey results, resources, etc., as they are visually appealing and provide a quick summary.
## Prerequisites
Before creating pie charts in Matplotlib, make sure that both Python and Matplotlib are installed on your system.
## Creating a simple pie chart with `pie()` method
A basic pie chart can be created with the `pie()` method in `matplotlib.pyplot`.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['A','B','C','D','E']
data = [10,20,30,40,50]
# Creating Plot
plt.pie(data, labels=labels)
# Show plot
plt.show()
```
When executed, this would show the following pie chart:
![Basic Pie Chart](images/basic_pie_chart.png)
Note that the size of each slice is proportional to its value's share of the total of `data`.
The `pie()` function takes arguments that describe the layout of the pie chart.
Here, `plt.pie(data, labels=labels)` plots the pie chart from the values in the list `data`, and the fractional area of each slice is its value divided by the total, i.e. **x/sum(data)** for each value x. The list `labels` gives the label of the slice corresponding to each value in `data`.
You can customize the graph further like specifying custom colors for slices, exploding slices, labeling wedges (slices), etc. These will be explored in the upcoming sections.
## Customizing Pie Chart in Matplotlib
For customized plots, it is recommended to create the figure and axes with `matplotlib.pyplot.subplots()` and call methods on the returned `Axes` object. This object-oriented style makes it explicit which plot each customization applies to.
### Coloring Slices
You can apply a custom set of colors to the slices by passing a list of colors to the `colors` parameter of the `pie()` method.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['A','B','C','D','E']
data = [10,20,30,40,50]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Creating plot using matplotlib.pyplot.subplots()
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors)
# Show plot
plt.show()
```
![Coloring Slices](images/coloring_slices.png)
Here, `matplotlib.pyplot.subplots()` returns a `Figure` object `fig` and an `Axes` object `ax`, both of which can be used to customize the pie chart.
**Note:** Each slice of the pie chart is a `matplotlib.patches.Wedge` object. In addition to the customizations shown here, the wedges can therefore be styled with the `wedgeprops` argument, which takes a Python dictionary of name-value pairs denoting wedge properties such as `linewidth`, `edgecolor`, etc.
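A minimal sketch of `wedgeprops`: every key in the dictionary is applied to each wedge, so here each slice gets a 2-point white border that visually separates it from its neighbours.

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
data = [10, 20, 30, 40, 50]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

fig, ax = plt.subplots()
# Each key/value pair in wedgeprops is set on every Wedge patch
wedges = ax.pie(data, labels=labels, colors=colors,
                wedgeprops={'linewidth': 2, 'edgecolor': 'white'})[0]
plt.show()
```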
### Hatching Slices
To make the slices easier to tell apart, you can pass a list of hatch patterns to the `hatch` parameter to set the pattern of each slice.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['A','B','C','D','E']
data = [10,20,30,40,50]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
hatch = ['*O', 'oO', 'OO', '.||.', '|*|'] # Hatch patterns
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, hatch=hatch)
# Show plot
plt.show()
```
![Hatch Patterns](images/hatch_patterns.png)
You can experiment with your own hatch patterns!
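Hatch strings are built from characters such as `/`, `\`, `|`, `-`, `+`, `x`, `o`, `O`, `.` and `*`, and repeating a character makes the pattern denser. Older versions of Matplotlib do not accept `hatch` in `pie()`; in that case, one workaround (sketched below, with an illustrative choice of patterns) is to set the pattern on each returned wedge with `set_hatch()`, which every `Patch` provides.

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
data = [10, 20, 30, 40, 50]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
basic_hatches = ['/', '\\', 'x', 'o', '.']  # illustrative choice of patterns

fig, ax = plt.subplots()
wedges = ax.pie(data, labels=labels, colors=colors)[0]

# One pattern per wedge; doubling the character makes the hatch denser
for wedge, pattern in zip(wedges, basic_hatches):
    wedge.set_hatch(pattern * 2)

plt.show()
```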
### Labeling Slices
You can label each slice with its value by passing a format string or a function to the `autopct` parameter.
An example is shown here:
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, autopct='%1.1f%%')
# Show plot
plt.show()
```
![Autopct Example](images/autopct.png)
Here, `autopct='%1.1f%%'` labels each wedge (slice) with the percentage of the whole that it occupies, rounded to one decimal place: `%1.1f` formats the number with one digit after the decimal point, and `%%` produces a literal `%` sign.
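`autopct` can also be a function: Matplotlib calls it with each slice's percentage (0 to 100) and uses the returned string as the label. The sketch below uses a hypothetical helper, `make_autopct`, to show both the percentage and the original count; the count is recovered from the percentage, so it assumes `data` holds whole numbers.

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

def make_autopct(values):
    """Hypothetical helper: build an autopct function showing percentage and count."""
    total = sum(values)

    def autopct(pct):
        # pct is the slice's share of the whole, in percent
        count = round(pct * total / 100)
        return f"{pct:.1f}%\n({count})"

    return autopct

fig, ax = plt.subplots()
# When autopct is given, pie() also returns the percentage Text objects
wedges, texts, autotexts = ax.pie(data, labels=labels, colors=colors,
                                  autopct=make_autopct(data))
plt.show()
```

For example, the `'Rose'` slice (11 out of 48 flowers) is labelled `22.9%` with `(11)` on the line below.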
### Exploding Slices
The `explode` parameter separates slices from the rest of the chart. Pass it a list with one value per slice, giving how far that slice is offset from the center (as a fraction of the radius); a value of 0 leaves the slice in place.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Explode only the first slice, i.e. 'Rose'
explode = [0.1, 0, 0, 0, 0]
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, explode=explode, autopct='%1.1f%%')
# Show plot
plt.show()
```
![Explode Slice](images/explode_slice.png)
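Instead of hard-coding which slice to pull out, the `explode` list can be built from the data. This sketch assumes the goal is to highlight the largest slice (here `'Marigold'`).

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

# Offset only the largest slice; every other slice stays at 0
largest = data.index(max(data))
explode = [0.1 if i == largest else 0 for i in range(len(data))]

fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, explode=explode, autopct='%1.1f%%')
plt.show()
```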
### Shading Slices
You can add a shadow beneath the pie by passing `shadow=True` to the `pie()` method.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Explode only the first slice, i.e. 'Rose'
explode = [0.1, 0, 0, 0, 0]
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, explode=explode, shadow=True, autopct='%1.1f%%')
# Show plot
plt.show()
```
![Shadow](images/shadow.png)
### Rotating Slices
You can rotate the pie by passing a custom start angle, in degrees, to the `startangle` parameter.
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%')
# Show plot
plt.show()
```
![Rotating Slices](images/rotating_slices.png)
The default `startangle` is 0, which would start the first slice ('Rose') on the positive x-axis. This example sets `startangle=90` such that all the slices are rotated counter-clockwise by 90 degrees, and the `'Rose'` slice starts on the positive y-axis.
### Controlling Size of Pie Chart
In addition to the size of the figure, you can also control the size of the pie chart itself with the `radius` parameter (default: 1).
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Creating plot
fig, ax = plt.subplots()
ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%', textprops={'size': 'smaller'}, radius=0.7)
# Show plot
plt.show()
```
![Controlling Size](images/radius.png)
Note that `textprops` is an additional argument for controlling the properties of the text in the pie chart. In this case, we have specified that the text should be smaller. It also accepts other `matplotlib.text.Text` properties, such as `color`, `fontsize` and `fontweight`.
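When the pie is shrunk with `radius`, it can help to move the text as well. The sketch below combines `textprops` with two other `pie()` parameters: `pctdistance`, the distance of the percentage labels from the center, and `labeldistance`, the distance of the slice labels, both relative to the radius. The specific values are only illustrative.

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%',
    radius=0.7,
    textprops={'fontsize': 9, 'fontweight': 'bold'},  # applied to labels and percentages
    pctdistance=0.75,    # percentages at 0.75 x radius from the center
    labeldistance=1.15,  # labels just outside the pie
)
plt.show()
```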
### Adding Legends
You can also use a legend to identify the slices instead of labelling them directly, like this:
```Python
import matplotlib.pyplot as plt
# Creating dataset
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
data = [11,9,17,4,7]
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
# Creating plot
fig, ax = plt.subplots(figsize=(7,7))
ax.pie(data, colors=colors, startangle=90, autopct='%1.1f%%', radius=0.7)
ax.legend(labels, title="Flowers")
# Show plot
plt.show()
```
![Legends](images/legends.png)
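If the legend overlaps the pie, it can be moved outside the axes. The sketch below passes the wedges explicitly as legend handles and uses `loc` together with `bbox_to_anchor` to pin the legend's left edge to the right side of the axes; the figure size and anchor point are assumptions you may need to adjust.

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

fig, ax = plt.subplots(figsize=(7, 5))
wedges = ax.pie(data, colors=colors, startangle=90, autopct='%1.1f%%')[0]

# Wedges as handles, one label per wedge; the legend sits just right of the axes
legend = ax.legend(wedges, labels, title="Flowers",
                   loc="center left", bbox_to_anchor=(1, 0.5))
plt.show()
```

When saving such a figure, `fig.savefig(..., bbox_inches='tight')` keeps a legend placed outside the axes from being cropped.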