Merge branch 'main' into main
|
@ -24,8 +24,8 @@ The list of topics for which we are looking for content are provided below along
|
|||
- Web Scrapping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
|
||||
- API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
|
||||
- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms)
|
||||
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects)
|
||||
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank)
|
||||
- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
|
||||
- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
|
||||
|
||||
You can check out some content ideas below.
|
||||
|
||||
|
|
|
@ -0,0 +1,192 @@
|
|||
# Exception Handling in Python
|
||||
|
||||
Exception handling is a way of managing errors that may occur while a program runs. Python's exception handling mechanism is designed to prevent a program from terminating unexpectedly: it lets the program either regain control after an error or display a meaningful message to the user.
|
||||
|
||||
- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
|
||||
- **Exception** - An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. It can be runtime, input/output or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing.
|
||||
|
||||
## Python Built-in Exceptions
|
||||
|
||||
There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs.
|
||||
We can list all the built-in names, including the built-in exceptions, using the built-in `locals()` function as follows:
|
||||
|
||||
```python
|
||||
print(dir(locals()['__builtins__']))
|
||||
```
|
||||
|
||||
|**S.No**|**Exception**|**Description**|
|---|---|---|
|1|SyntaxError|A syntax error occurs when the code we write violates the grammatical rules of the language, such as misspelled keywords, a missing colon, mismatched parentheses etc.|
|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types (e.g., dividing a string by an integer).|
|3|NameError|A name error occurs when we try to use a name (such as a variable, function or module, or a string written without quotes) that hasn't been defined.|
|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.|
|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.|
|6|ValueError|A value error occurs when we provide an argument of the right type but with an inappropriate value for a specific operation or function (e.g., converting the string `"abc"` to an integer with `int("abc")`).|
|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.|
|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails. It signifies that there's an issue during communication between your program and the external system. In Python 3, `IOError` is an alias of `OSError`.|
|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.|
|
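As a rough illustration of the table above, the sketch below triggers a few of these exceptions on purpose and reports which class was raised. The helper name `exception_name` is hypothetical and exists only for this example.

```python
def exception_name(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None  # No exception was raised


print(exception_name(lambda: 1 + "1"))         # TypeError
print(exception_name(lambda: [1, 2, 3][5]))    # IndexError
print(exception_name(lambda: {"a": 1}["b"]))   # KeyError
print(exception_name(lambda: int("abc")))      # ValueError
print(exception_name(lambda: "text".missing))  # AttributeError
print(exception_name(lambda: 10 / 0))          # ZeroDivisionError
```

A `SyntaxError` is not included because it is normally reported when the file is parsed, before any of its code runs.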
||||
|
||||
## Try and Except Statement - Catching Exceptions
|
||||
|
||||
The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
|
||||
|
||||
Here's an example to explain this:
|
||||
|
||||
```python
try:
    # Code that might raise an exception
    result = 10 / 0
except:
    print("An error occurred!")
```

Output

```markdown
An error occurred!
```
|
||||
|
||||
In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
|
||||
|
||||
## Specific Exception Handling
|
||||
|
||||
You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
|
||||
|
||||
Here's an example:
|
||||
|
||||
```python
try:
    # Code that might raise ZeroDivisionError or NameError
    result = 10 / 0
    name = undefined_variable
except ZeroDivisionError:
    print("Oops! You tried to divide by zero.")
except NameError:
    print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
Oops! You tried to divide by zero.
|
||||
```
|
||||
|
||||
If you comment out the line `result = 10 / 0`, then the output will be:
|
||||
|
||||
```markdown
|
||||
There's a variable named 'undefined_variable' that hasn't been defined yet.
|
||||
```
|
||||
|
||||
## Important Note
|
||||
|
||||
In this code, the `except` blocks are specific to each type of exception. If you want to catch both exceptions with a single `except` block, you can use a tuple of exceptions, like this:
|
||||
|
||||
```python
try:
    # Code that might raise ZeroDivisionError or NameError
    result = 10 / 0
    name = undefined_variable
except (ZeroDivisionError, NameError):
    print("An error occurred!")
```

Output

```markdown
An error occurred!
```
|
||||
|
||||
## Try with Else Clause
|
||||
|
||||
The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block.
|
||||
|
||||
Here's an example to understand this:
|
||||
|
||||
```python
def calculate_average(numbers):
    if len(numbers) == 0:  # Handle the empty-list case separately (optional)
        return None
    try:
        total = sum(numbers)
        average = total / len(numbers)
    except ZeroDivisionError:
        # Only reachable if the empty-list check above is removed
        print("Cannot calculate the average of an empty list.")
    else:
        print("The average is:", average)
        return average  # Optionally return the average here

# Example usage
numbers = [10, 20, 30]
result = calculate_average(numbers)

if result is not None:  # Check if a result is available (handles the empty-list case)
    print("Calculation successful!")
```

Output

```markdown
The average is: 20.0
Calculation successful!
```
|
||||
|
||||
## Finally Keyword in Python
|
||||
|
||||
The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not.
|
||||
|
||||
To understand this, let us take an example:
|
||||
|
||||
```python
try:
    a = 10 // 0
    print(a)
except ZeroDivisionError:
    print("Cannot be divided by zero.")
finally:
    print("Program executed!")
```
|
||||
|
||||
Output
|
||||
|
||||
```markdown
|
||||
Cannot be divided by zero.
|
||||
Program executed!
|
||||
```
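A common use of `finally` is releasing a resource whether or not an error occurred. The following is a minimal sketch under stated assumptions: the temporary file and the failing `int("not a number")` call are made up for illustration and are not part of the original example.

```python
import os
import tempfile

# Create a temporary file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")
try:
    f.write("hello")
    value = int("not a number")  # Raises ValueError
except ValueError:
    print("Could not convert the text to a number.")
finally:
    f.close()  # Runs whether or not the exception occurred
    print("File closed:", f.closed)

os.remove(path)  # Clean up the temporary file
```

For files specifically, a `with open(...)` block achieves the same cleanup more concisely; the explicit `finally` is shown here only to make the control flow visible.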
|
||||
|
||||
## Raise Keyword in Python
|
||||
|
||||
In Python, raising an exception allows you to signal that an error condition has occurred during your program's execution. The `raise` keyword is used to explicitly raise an exception.
|
||||
|
||||
Let us take an example:
|
||||
|
||||
```python
def divide(x, y):
    if y == 0:
        raise ZeroDivisionError("Can't divide by zero!")  # Raise an exception with a message
    result = x / y
    return result

try:
    division_result = divide(10, 0)
    print("Result:", division_result)
except ZeroDivisionError as e:
    print("An error occurred:", e)  # Handle the exception and print the message
```

Output

```markdown
An error occurred: Can't divide by zero!
```
|
||||
|
||||
## Advantages of Exception Handling
|
||||
|
||||
- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience.
|
||||
- **Code Robustness** - Exception handling helps you write more resilient programs by anticipating potential issues and providing appropriate responses.
|
||||
- **Enhanced Code Readability** - By separating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed.
|
||||
|
||||
## Disadvantages of Exception Handling
|
||||
|
||||
- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic errors in your code. It's essential to write clear and well-tested logic to minimize the need for excessive exception handling.
|
||||
- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
|
||||
- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.
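The first disadvantage can be sketched as follows. The function names and the deliberate typo are hypothetical, chosen only to show how an overly broad `except` can hide a bug that a specific `except` would expose.

```python
def average_price(prices):
    try:
        return sum(prices) / len(price)  # Bug: 'price' is a typo for 'prices'
    except:  # Bare except: catches the resulting NameError too
        return 0


def average_price_fixed(prices):
    try:
        return sum(prices) / len(prices)
    except ZeroDivisionError:  # Only the anticipated failure is handled
        return 0


print(average_price([10, 20, 30]))        # 0, the typo is silently hidden
print(average_price_fixed([10, 20, 30]))  # 20.0
print(average_price_fixed([]))            # 0
```

Had the first version caught only `ZeroDivisionError`, the `NameError` would have surfaced immediately and pointed at the typo.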
|
|
@ -7,3 +7,4 @@
|
|||
- [Regular Expressions in Python](regular_expressions.md)
|
||||
- [JSON module](json-module.md)
|
||||
- [Map Function](map-function.md)
|
||||
- [Exception Handling in Python](exception-handling.md)
|
||||
|
|
|
@ -1,3 +1,4 @@
|
|||
# List of sections
|
||||
|
||||
- [Introduction to MySQL and Queries](intro_mysql_queries.md)
|
||||
- [SQLAlchemy and Aggregation Functions](sqlalchemy-aggregation.md)
|
||||
|
|
|
@ -0,0 +1,123 @@
|
|||
# SQLAlchemy
|
||||
SQLAlchemy is a powerful and flexible SQL toolkit and Object-Relational Mapping (ORM) library for Python. It is a versatile library that bridges the gap between Python applications and relational databases.
|
||||
|
||||
SQLAlchemy allows the user to write database-agnostic code that can work with a variety of relational databases such as SQLite, MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. The ORM layer in SQLAlchemy allows developers to map Python classes to database tables. This means you can interact with your database using Python objects instead of writing raw SQL queries.
|
||||
|
||||
## Setting up the Environment
|
||||
* Python and MySQL Server must be installed and configured.
|
||||
* The libraries **mysql-connector-python** and **sqlalchemy** must be installed.
|
||||
|
||||
```bash
|
||||
pip install sqlalchemy mysql-connector-python
|
||||
```
|
||||
|
||||
* If they are not installed, you can install them by running the above command in a terminal.
|
||||
|
||||
## Establishing Connection with Database
|
||||
|
||||
* Create a connection with the database using the following code snippet:
|
||||
```python
|
||||
from sqlalchemy import create_engine
|
||||
from sqlalchemy.orm import declarative_base
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
|
||||
DATABASE_URL = 'mysql+mysqlconnector://root:12345@localhost/gssoc'
|
||||
|
||||
engine = create_engine(DATABASE_URL)
|
||||
Session = sessionmaker(bind=engine)
|
||||
session = Session()
|
||||
|
||||
Base = declarative_base()
|
||||
```
|
||||
|
||||
* The connection string **DATABASE_URL** is passed as an argument to **create_engine** function which is used to create a connection to the database. This connection string contains the database credentials such as the database type, username, password, and database name.
|
||||
* The **sessionmaker** function is used to create a session factory; calling it returns a session object, which is used to interact with the database.
|
||||
* The **declarative_base** function is used to create a base class for all the database models. This base class is used to define the structure of the database tables.
|
||||
|
||||
## Creating Tables
|
||||
|
||||
* The following code snippet creates a table named **"products"** in the database:
|
||||
```python
from sqlalchemy import Column, Integer, String, Float

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    category = Column(String(50))
    price = Column(Float)
    quantity = Column(Integer)

Base.metadata.create_all(engine)
```
|
||||
|
||||
* The **Product class** inherits from **Base**, which is a base class for all the database models.
|
||||
* The **Base.metadata.create_all(engine)** statement is used to create the table in the database. The engine object is a connection to the database that was created earlier.
|
||||
|
||||
## Inserting Data for Aggregation Functions
|
||||
|
||||
* The following code snippet inserts data into the **"products"** table:
|
||||
```python
|
||||
products = [
|
||||
Product(name='Laptop', category='Electronics', price=1000, quantity=50),
|
||||
Product(name='Smartphone', category='Electronics', price=700, quantity=150),
|
||||
Product(name='Tablet', category='Electronics', price=400, quantity=100),
|
||||
Product(name='Headphones', category='Accessories', price=100, quantity=200),
|
||||
Product(name='Charger', category='Accessories', price=20, quantity=300),
|
||||
]
|
||||
|
||||
session.add_all(products)
|
||||
session.commit()
|
||||
```
|
||||
|
||||
* A list of **Product** objects is created. Each Product object represents a row in the **products table** in the database.
|
||||
* The **add_all** method of the session object is used to add all the Product objects to the session. This method takes a **list of objects as an argument** and adds them to the session.
|
||||
* The **commit** method of the session object is used to commit the changes made to the database.
|
||||
|
||||
## Aggregation Functions
|
||||
|
||||
SQLAlchemy provides functions that correspond to SQL aggregation functions; they are available through the **sqlalchemy.func** object.
|
||||
|
||||
### COUNT
|
||||
|
||||
The **COUNT** function returns the number of rows in a result set. It can be demonstrated using the following code snippet:
|
||||
```python
|
||||
from sqlalchemy import func
|
||||
|
||||
total_products = session.query(func.count(Product.id)).scalar()
|
||||
print(f'Total products: {total_products}')
|
||||
```
|
||||
|
||||
### SUM
|
||||
|
||||
The **SUM** function returns the sum of all values in a column. It can be demonstrated using the following code snippet:
|
||||
```python
|
||||
total_price = session.query(func.sum(Product.price)).scalar()
|
||||
print(f'Total price of all products: {total_price}')
|
||||
```
|
||||
|
||||
### AVG
|
||||
|
||||
The **AVG** function returns the average of all values in a column. It can be demonstrated by the following code snippet:
|
||||
```python
|
||||
average_price = session.query(func.avg(Product.price)).scalar()
|
||||
print(f'Average price of products: {average_price}')
|
||||
```
|
||||
|
||||
### MAX
|
||||
|
||||
The **MAX** function returns the maximum value in a column. It can be demonstrated using the following code snippet:
|
||||
```python
|
||||
max_price = session.query(func.max(Product.price)).scalar()
|
||||
print(f'Maximum price of products: {max_price}')
|
||||
```
|
||||
|
||||
### MIN
|
||||
|
||||
The **MIN** function returns the minimum value in a column. It can be demonstrated using the following code snippet:
|
||||
```python
|
||||
min_price = session.query(func.min(Product.price)).scalar()
|
||||
print(f'Minimum price of products: {min_price}')
|
||||
```
|
||||
|
||||
In general, aggregation functions are applied by using the **session** object to run the desired query against a table with the **query()** method. Calling the **scalar()** method on the query object executes the query and returns a single value.
|
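To make the correspondence with SQL concrete, the sketch below runs the equivalent aggregate queries as plain SQL. It rests on assumptions that differ from the setup above: it uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL and SQLAlchemy, so that it runs without a database server. The table and column names mirror the `products` example.

```python
import sqlite3

# In-memory SQLite database (an assumption made so the example needs no server)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
    "category TEXT, price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO products (name, category, price, quantity) VALUES (?, ?, ?, ?)",
    [
        ("Laptop", "Electronics", 1000, 50),
        ("Smartphone", "Electronics", 700, 150),
        ("Tablet", "Electronics", 400, 100),
        ("Headphones", "Accessories", 100, 200),
        ("Charger", "Accessories", 20, 300),
    ],
)

# One query computing all five aggregates over the price column
row = conn.execute(
    "SELECT COUNT(id), SUM(price), AVG(price), MAX(price), MIN(price) FROM products"
).fetchone()
count, total, average, highest, lowest = row
print(count, total, average, highest, lowest)
conn.close()
```

Each value corresponds to one of the `func.count`, `func.sum`, `func.avg`, `func.max` and `func.min` calls shown earlier.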
|
@ -1,5 +1,6 @@
|
|||
# List of sections
|
||||
|
||||
- [Time & Space Complexity](time-space-complexity.md)
|
||||
- [Queues in Python](Queues.md)
|
||||
- [Graphs](graph.md)
|
||||
- [Sorting Algorithms](sorting-algorithms.md)
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
Recursion is a technique in which a function calls itself to solve smaller instances of the same problem until a specified condition is fulfilled. It is used for tasks that can be divided into smaller sub-tasks.
|
||||
|
||||
# How Recursion Works
|
||||
## How Recursion Works
|
||||
|
||||
To solve a problem using recursion we must define:
|
||||
- Base condition :- The condition under which recursion ends.
|
||||
|
@ -17,43 +17,63 @@ When a recursive function is called, the following sequence of events occurs:
|
|||
- Stack Management: Each recursive call is placed on the call stack. The stack keeps track of each function call, its argument, and the point to return to once the call completes.
|
||||
- Unwinding the Stack: When the base case is eventually met, the function returns a value, and the stack starts unwinding, returning values to previous function calls until the initial call is resolved.
|
||||
|
||||
# What is Stack Overflow in Recursion
|
||||
## Python Code: Factorial using Recursion
|
||||
|
||||
```python
def fact(n):
    if n == 0 or n == 1:
        return 1
    return n * fact(n - 1)

if __name__ == "__main__":
    n = int(input("Enter a positive number: "))
    print("Factorial of", n, "is", fact(n))
```
|
||||
|
||||
### Explanation
|
||||
|
||||
This Python script calculates the factorial of a given number using recursion.
|
||||
|
||||
- **Function `fact(n)`:**
|
||||
- The function takes an integer `n` as input and calculates its factorial.
|
||||
- It checks if `n` is 0 or 1. If so, it returns 1 (since the factorial of 0 and 1 is 1).
|
||||
- Otherwise, it returns `n * fact(n - 1)`, which means it recursively calls itself with `n - 1` until it reaches either 0 or 1.
|
||||
|
||||
- **Main Section:**
|
||||
- The main section prompts the user to enter a positive number.
|
||||
- It then calls the `fact` function with the input number and prints the result.
|
||||
|
||||
#### Example : Let n = 4
|
||||
|
||||
The recursion unfolds as follows:
|
||||
1. When `fact(4)` is called, it computes `4 * fact(3)`.
|
||||
2. Inside `fact(3)`, it computes `3 * fact(2)`.
|
||||
3. Inside `fact(2)`, it computes `2 * fact(1)`.
|
||||
4. `fact(1)` returns 1 (`if` statement executes), which is received by `fact(2)`, resulting in `2 * 1` i.e. `2`.
|
||||
5. Back to `fact(3)`, it receives the value from `fact(2)`, giving `3 * 2` i.e. `6`.
|
||||
6. `fact(4)` receives the value from `fact(3)`, resulting in `4 * 6` i.e. `24`.
|
||||
7. Finally, `fact(4)` returns 24 to the main function.
|
||||
|
||||
#### So, the result is 24.
|
||||
|
||||
#### What is Stack Overflow in Recursion?
|
||||
|
||||
Stack overflow is an error that occurs when the call stack's memory limit is exceeded. During recursion, every pending call is kept on the call stack until it returns. Without a base case, the function would call itself indefinitely and exhaust the stack. Python guards against this with a recursion limit and raises a `RecursionError` when the limit is exceeded.
|
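The effect of a missing base case can be observed safely in Python, because the interpreter stops the recursion once its recursion limit is reached. A minimal sketch, where the function name `countdown` is made up for illustration:

```python
import sys


def countdown(n):
    # No base case: the function keeps calling itself.
    return countdown(n - 1)


caught = None
try:
    countdown(5)
except RecursionError as exc:
    caught = exc
    print("Recursion stopped by Python:", exc)

print("Current recursion limit:", sys.getrecursionlimit())
```

The limit can be inspected with `sys.getrecursionlimit()`; its exact value depends on the interpreter configuration.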
||||
|
||||
# Example
|
||||
|
||||
- Factorial of a Number
|
||||
|
||||
The factorial of a natural number i is i multiplied by the factorial of (i-1). The base case is i=0, for which we return 1, since the factorial of 0 is 1.
|
||||
|
||||
```python
def factorial(i):
    # base case
    if i == 0:
        return 1
    # recursive case
    else:
        return i * factorial(i - 1)

i = 6
print("Factorial of i is :", factorial(i))  # Output- Factorial of i is : 720
```
|
||||
# What is Backtracking
|
||||
## What is Backtracking
|
||||
|
||||
Backtracking is a recursive algorithmic technique used to solve problems by exploring all possible solutions and discarding those that do not meet the problem's constraints. It is particularly useful for problems involving combinations, permutations, and finding paths in a grid.
|
||||
|
||||
# How Backtracking Works
|
||||
## How Backtracking Works
|
||||
|
||||
- Incremental Solution Building: Solutions are built one step at a time.
|
||||
- Feasibility Check: At each step, a check is made to see if the current partial solution is valid.
|
||||
- Backtracking: If a partial solution is found to be invalid, the algorithm backtracks by removing the last added part of the solution and trying the next possibility.
|
||||
- Exploration of All Possibilities: The process continues recursively, exploring all possible paths, until a solution is found or all possibilities are exhausted.
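
The four steps above can be sketched on a small problem. The example below generates all permutations of a list; it is an illustrative sketch rather than part of the original text, and the names `permutations`, `current` and `used` are assumptions chosen for this example.

```python
def permutations(items):
    """Return all orderings of items using backtracking."""
    results = []
    current = []                    # The partial solution built so far
    used = [False] * len(items)

    def backtrack():
        if len(current) == len(items):   # A complete solution was reached
            results.append(current.copy())
            return
        for i, item in enumerate(items):
            if used[i]:                  # Feasibility check: skip items already used
                continue
            used[i] = True
            current.append(item)         # Build the solution one step further
            backtrack()                  # Explore this choice recursively
            current.pop()                # Backtrack: undo the last choice
            used[i] = False

    backtrack()
    return results


print(permutations([1, 2, 3]))
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```

The `current.pop()` line is the backtracking step itself: it restores the partial solution to its previous state before the next candidate is tried.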
|
||||
|
||||
# Example
|
||||
## Example: Word Search
|
||||
|
||||
- Word Search
|
||||
|
||||
Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
|
||||
Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
|
||||
|
||||
Algorithm for Solving the Word Search Problem with Backtracking:
|
||||
- Start at each cell: Attempt to find the word starting from each cell.
|
||||
|
|
|
@ -0,0 +1,243 @@
|
|||
# Time and Space Complexity
|
||||
|
||||
We can solve a problem using one or more algorithms. It's essential to learn how to compare the performance of different algorithms and select the best one for a specific task.
|
||||
|
||||
Therefore, we need a method for comparing solutions in order to judge which one is more efficient.
|
||||
|
||||
The method must be:
|
||||
|
||||
- Independent of the system, and its settings, on which the algorithm is executing.
- Directly related to the size of the input.
- Able to discriminate between two methods with clarity and precision.
|
||||
|
||||
Two such methods used to analyze algorithms are `time complexity` and `space complexity`.
|
||||
|
||||
## What is Time Complexity?
|
||||
|
||||
Time complexity measures the _number of operations an algorithm performs in proportion to the size of the input_. It helps us study how the algorithm's performance scales as the input grows. Note that **_time complexity does not refer to the actual time a machine takes to execute a particular piece of code_**.
|
||||
|
||||
## Order of Growth and Asymptotic Notations
|
||||
|
||||
The order of growth describes how an algorithm's running time or space requirement increases as the input size increases. This growth is described with asymptotic notation, such as Big O notation, which focuses on the dominant term as the input size approaches infinity and ignores lower-order terms and machine-specific constants.
|
||||
|
||||
### Common Asymptotic Notation
|
||||
|
||||
1. `Big Oh (O)`: Describes an upper bound on an algorithm's running time; it is most often used to state the worst case.
2. `Big Omega (Ω)`: Describes a lower bound on the running time; it is often associated with the best case.
3. `Big Theta (Θ)`: Gives a tight bound on the running time by describing both the upper and lower bounds.
|
||||
|
||||
### 1. Big Oh (O) Notation
|
||||
|
||||
Big O notation describes how an algorithm behaves as the input size gets closer to infinity and provides an upper bound on the time or space complexity of the method. It helps developers and computer scientists to evaluate the effectiveness of various algorithms without regard to the software or hardware environment.
|
||||
|
||||
To denote asymptotic upper bound, we use O-notation. For a given function `g(n)`, we denote by `O(g(n))` (pronounced "big-oh of g of n") the set of functions:
|
||||
|
||||
$$
|
||||
O(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq f(n) \leq c \cdot g(n) \text{ for all } n \geq n_0 \}
|
||||
$$
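
As a short worked example of this definition (the function $f(n) = 3n + 2$ is chosen here purely for illustration), we can show that $f(n) \in O(n)$ by exhibiting suitable constants. Taking $c = 4$ and $n_0 = 2$:

$$
0 \leq 3n + 2 \leq 3n + n = 4n = c \cdot n \quad \text{for all } n \geq 2
$$

The middle inequality holds because $2 \leq n$ whenever $n \geq n_0 = 2$. Other choices, such as $c = 5$ and $n_0 = 1$, work equally well; the definition only requires that some pair of constants exists.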
|
||||
|
||||
Graphical representation of Big Oh:
|
||||
|
||||

|
||||
|
||||
### 2. Big Omega (Ω) Notation
|
||||
|
||||
Big Omega (Ω) notation is used to describe the lower bound of an algorithm's running time. It provides a way to express the minimum time complexity that an algorithm will take to complete. In other words, Big Omega gives us a guarantee that the algorithm will take at least a certain amount of time to run, regardless of other factors.
|
||||
|
||||
To denote asymptotic lower bound, we use Omega-notation. For a given function `g(n)`, we denote by `Ω(g(n))` (pronounced "big-omega of g of n") the set of functions:
|
||||
|
||||
$$
|
||||
\Omega(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq c \cdot g(n) \leq f(n) \text{ for all } n \geq n_0 \}
|
||||
$$
|
||||
|
||||
Graphical representation of Big Omega:
|
||||
|
||||

|
||||
|
||||
### 3. Big Theta (Θ) Notation
|
||||
|
||||
Big Theta (Θ) notation provides a way to describe the asymptotic tight bound of an algorithm's running time. It offers a precise measure of the time complexity by establishing both an upper and lower bound, indicating that the running time of an algorithm grows at the same rate as a given function, up to constant factors.
|
||||
|
||||
To denote asymptotic tight bound, we use Theta-notation. For a given function `g(n)`, we denote by `Θ(g(n))` (pronounced "big-theta of g of n") the set of functions:
|
||||
|
||||
$$
|
||||
\Theta(g(n)) = \{ f(n) : \exists \text{ positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } 0 \leq c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \text{ for all } n \geq n_0 \}
|
||||
$$
|
||||
|
||||
Graphical representation of Big Theta:
|
||||
|
||||

|
||||
|
||||
## Best Case, Worst Case and Average Case
|
||||
|
||||
### 1. Best-Case Scenario:
|
||||
|
||||
The best-case scenario refers to the situation where an algorithm performs optimally, achieving the lowest possible time or space complexity. It represents the most favorable conditions under which an algorithm operates.
|
||||
|
||||
#### Characteristics:
|
||||
|
||||
- Represents the minimum time or space required by an algorithm to solve a problem.
|
||||
- Occurs when the input data is structured in such a way that the algorithm can exploit its strengths fully.
|
||||
- Often used to analyze the lower bound of an algorithm's performance.
|
||||
|
||||
#### Example:
|
||||
|
||||
Consider the `linear search algorithm` where we're searching for a `target element` in an array. The best-case scenario occurs when the target element is found `at the very beginning of the array`. In this case, the algorithm would only need to make one comparison, resulting in a time complexity of `O(1)`.
|
||||
|
||||
### 2. Worst-Case Scenario:
|
||||
|
||||
The worst-case scenario refers to the situation where an algorithm performs at its poorest, achieving the highest possible time or space complexity. It represents the most unfavorable conditions under which an algorithm operates.
|
||||
|
||||
#### Characteristics:
|
||||
|
||||
- Represents the maximum time or space required by an algorithm to solve a problem.
|
||||
- Occurs when the input data is structured in such a way that the algorithm encounters the most challenging conditions.
|
||||
- Often used to analyze the upper bound of an algorithm's performance.
|
||||
|
||||
#### Example:
|
||||
|
||||
Continuing with the `linear search algorithm`, the worst-case scenario occurs when the `target element` is either not present in the array or located `at the very end`. In this case, the algorithm would need to iterate through the entire array, resulting in a time complexity of `O(n)`, where `n` is the size of the array.
|
||||
|
||||
### 3. Average-Case Scenario:
|
||||
|
||||
The average-case scenario refers to the expected performance of an algorithm over all possible inputs, typically calculated as the arithmetic mean of the time or space complexity.
|
||||
|
||||
#### Characteristics:
|
||||
|
||||
- Represents the typical performance of an algorithm across a range of input data.
|
||||
- Takes into account the distribution of inputs and their likelihood of occurrence.
|
||||
- Provides a more realistic measure of an algorithm's performance compared to the best-case or worst-case scenarios.
|
||||
|
||||
#### Example:
|
||||
|
||||
For the `linear search algorithm`, the average-case scenario considers the probability distribution of the target element's position within the array. If the `target element is equally likely to be found at any position in the array`, the algorithm needs about `(n + 1) / 2` comparisons on average, as it would typically search halfway through the array. Since constant factors are dropped in asymptotic notation, the average-case time complexity is still `O(n)`.
|
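The average number of comparisons can be checked empirically. The sketch below is an illustration under the assumption that every position is equally likely; the helper name `comparisons_needed` is hypothetical.

```python
def comparisons_needed(arr, target):
    """Count how many elements linear search inspects before finding target."""
    count = 0
    for x in arr:
        count += 1
        if x == target:
            break
    return count


arr = list(range(1, 10))  # 9 elements: 1..9
total = sum(comparisons_needed(arr, t) for t in arr)
average = total / len(arr)
print(average)  # 5.0, i.e. (n + 1) / 2 for n = 9
```

The count grows in proportion to `n`, which is why the average case is written as `O(n)` once the constant factor is dropped.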
||||
|
||||
## Space Complexity
|
||||
|
||||
Space complexity refers to the amount of memory a piece of code uses while it runs, as a function of the input size. Because actual memory usage depends on the machine, we express space complexity in Big O notation rather than in memory units such as MB or GB.
|
||||
|
||||
#### Examples of Space Complexity
|
||||
|
||||
1. `Constant Space Complexity (O(1))`: Algorithms that operate on a fixed-size array or use a constant number of variables have O(1) space complexity.
|
||||
2. `Linear Space Complexity (O(n))`: Algorithms that store each element of the input array in a separate variable or data structure have O(n) space complexity.
|
||||
3. `Quadratic Space Complexity (O(n^2))`: Algorithms that create a two-dimensional array or matrix with dimensions based on the input size have O(n^2) space complexity.
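
The three classes above can be sketched with small functions. These are illustrative examples only; the function names are made up here, and the complexities refer to extra space beyond the input itself.

```python
def total(arr):
    # O(1) extra space: a single accumulator, whatever the input size
    result = 0
    for x in arr:
        result += x
    return result


def doubled(arr):
    # O(n) extra space: a new list with one entry per input element
    return [2 * x for x in arr]


def multiplication_table(n):
    # O(n^2) extra space: an n x n grid
    return [[i * j for j in range(1, n + 1)] for i in range(1, n + 1)]


print(total([1, 2, 3]))          # 6
print(doubled([1, 2, 3]))        # [2, 4, 6]
print(multiplication_table(3))   # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```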
|
||||
|
||||
#### Analyzing Space Complexity
|
||||
|
||||
To analyze space complexity:
|
||||
|
||||
- Identify the variables, data structures, and recursive calls used by the algorithm.
|
||||
- Determine how the space requirements scale with the input size.
|
||||
- Express the space complexity using Big O notation, considering the dominant terms that contribute most to the overall space usage.
|
||||
|
||||
## Examples to calculate time and space complexity
|
||||
|
||||
#### 1. Print all elements of given array
|
||||
|
||||
Consider each line takes one unit of time to run. So, to simply iterate over an array to print all elements it will take `O(n)` time, where n is the size of array.
|
||||
|
||||
Code:
|
||||
|
||||
```python
arr = [1, 2, 3, 4]  #1
for x in arr:       #2
    print(x)        #3
```
|
||||
|
||||
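Here, the 1st statement executes only once, so it takes one unit of time. The for loop consisting of the 2nd and 3rd statements executes 4 times, once per element. In general, the loop runs n times for an array of size n, which gives a time complexity of `O(n)`.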
|
||||
Also, as the code doesn't use any additional space besides the input `arr`, its space complexity is `O(1)` (constant).
|
||||
|
||||
#### 2. Linear Search
|
||||
|
||||
Linear search is a simple algorithm for finding an element in an array by sequentially checking each element until a match is found or the end of the array is reached. Here's an example of calculating the time and space complexity of linear search:
|
||||
|
||||
```python
def linear_search(arr, target):
    for x in arr:        # n iterations in worst case
        if x == target:  # 1
            return True  # 1
    return False         # If element not found

# Example usage
arr = [1, 3, 5, 7, 9]
target = 5
print(linear_search(arr, target))
```
|
||||
|
||||
**Time Complexity Analysis**
|
||||
|
||||
The for loop iterates through the entire array, which takes O(n) time in the worst case, where n is the size of the array.
|
||||
Inside the loop, each operation takes constant time (O(1)).
|
||||
Therefore, the time complexity of linear search is `O(n)`.
|
||||
|
||||
**Space Complexity Analysis**
|
||||
|
||||
The space complexity of linear search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
|
||||
|
||||
|
||||
#### 3. Binary Search
|
||||
|
||||
Binary search is an efficient algorithm for finding an element in a sorted array by repeatedly dividing the search interval in half. Here's an example of calculating the time and space complexity of binary search:
|
||||
|
||||
```python
def binary_search(arr, target):
    left = 0                       # 1
    right = len(arr) - 1           # 1

    while left <= right:           # log(n) iterations in worst case
        mid = (left + right) // 2  # 1

        if arr[mid] == target:     # 1
            return mid             # 1
        elif arr[mid] < target:    # 1
            left = mid + 1         # 1
        else:
            right = mid - 1        # 1

    return -1                      # If element not found

# Example usage
arr = [1, 3, 5, 7, 9]
target = 5
print(binary_search(arr, target))
```
|
||||
|
||||
**Time Complexity Analysis**
|
||||
|
||||
The initialization of left and right takes constant time (O(1)).
|
||||
The while loop runs for log(n) iterations in the worst case, where n is the size of the array.
|
||||
Inside the loop, each operation takes constant time (O(1)).
|
||||
Therefore, the time complexity of binary search is `O(log n)`.
|
||||
|
||||
**Space Complexity Analysis**
|
||||
|
||||
The space complexity of binary search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
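The logarithmic growth can also be observed by counting loop iterations. In this sketch, `binary_search_steps` is a hypothetical variant of the function above that returns the number of iterations instead of an index. Searching for values that are not in the array forces the worst case.

```python
import math


def binary_search_steps(arr, target):
    """Return how many loop iterations binary search performs."""
    left, right = 0, len(arr) - 1
    steps = 0
    while left <= right:
        steps += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return steps
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return steps


results = {}
for n in (16, 1024, 1_000_000):
    arr = list(range(n))
    # -1 and n are both absent, so the loop runs until the interval is empty.
    worst = max(binary_search_steps(arr, t) for t in (-1, n))
    results[n] = worst
    print(n, worst, math.floor(math.log2(n)) + 1)
```

For each size, the observed worst case stays within `floor(log2(n)) + 1` iterations, even though n grows by several orders of magnitude.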
|
||||
|
||||
#### 4. Fibonacci Sequence
|
||||
|
||||
Let's consider an example of a function that generates Fibonacci numbers up to a given index and stores them in a list. In this case, the space complexity will not be constant because the size of the list grows with the Fibonacci sequence.
|
||||
|
||||
```python
def fibonacci_sequence(n):
    fib_list = [0, 1]  # Initial Fibonacci sequence with first two numbers

    while len(fib_list) < n:  # O(n) iterations in worst case
        next_fib = fib_list[-1] + fib_list[-2]  # Calculating next Fibonacci number
        fib_list.append(next_fib)  # Appending next Fibonacci number to list

    return fib_list[:n]  # Slice so that n < 2 does not return extra numbers

# Example usage
n = 10
fib_sequence = fibonacci_sequence(n)
print(fib_sequence)
```
|
||||
|
||||
**Time Complexity Analysis**
|
||||
|
||||
The while loop runs until the length of the Fibonacci list reaches n, so it performs about n iterations. Inside the loop, each operation takes constant time (O(1)); appending to a list is amortized O(1). Therefore, the time complexity is `O(n)`.
|
||||
|
||||
**Space Complexity Analysis**
|
||||
|
||||
The space complexity of this function is not constant because it creates and stores a list of Fibonacci numbers.
|
||||
As n grows, the size of the list also grows, so the space complexity is O(n), where n is the number of Fibonacci numbers generated.
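For contrast, if only a single Fibonacci number is needed rather than the whole list, the extra space can be reduced to O(1) by keeping just the last two values. This is a minimal sketch; `fibonacci_number` is a name introduced here, and it assumes 0-based indexing with F(0) = 0 and F(1) = 1.

```python
def fibonacci_number(n):
    """Return the n-th Fibonacci number (0-indexed: F(0) = 0, F(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):   # O(n) time
        a, b = b, a + b  # only two variables are kept: O(1) space
    return a


print(fibonacci_number(9))  # 34
```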
|
|
@ -0,0 +1,96 @@
|
|||
# Clustering
|
||||
|
||||
Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python.
|
||||
|
||||
## Introduction
|
||||
|
||||
Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics.
|
||||
|
||||
## Concepts
|
||||
|
||||
### Centroid
|
||||
|
||||
A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster.
|
||||
|
||||
### Distance Measure
|
||||
|
||||
Distance measures are used to quantify the similarity or dissimilarity between data points. Common choices include Euclidean distance, Manhattan distance, and cosine distance (derived from cosine similarity).
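As a rough illustration, these measures can be computed with NumPy for two example vectors. The vectors `a` and `b` are arbitrary values chosen for this sketch.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: straight-line distance between the points
euclidean = np.linalg.norm(a - b)

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))

# Cosine similarity: cosine of the angle between the vectors
cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_distance = 1 - cosine_similarity

print(euclidean)  # 5.0
print(manhattan)  # 7.0
print(cosine_similarity, cosine_distance)
```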
|
||||
|
||||
### Inertia
|
||||
|
||||
Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center.
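A small worked example may make the definition concrete. The points and cluster assignments below are made up for the sketch; each centroid is the mean of its cluster, and inertia is the sum of squared distances from each point to its own centroid.

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.0, 3.0], [8.0, 8.0], [10.0, 8.0]])
labels = np.array([0, 0, 1, 1])  # assumed cluster assignment

# Centroid of each cluster = mean of the points assigned to it
centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

# Inertia = sum of squared distances from each point to its centroid
inertia = np.sum((points - centroids[labels]) ** 2)

print(centroids)  # centroids are (1, 2) and (9, 8)
print(inertia)    # 4.0
```

Lower inertia means tighter clusters, although inertia always decreases as the number of clusters increases.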
|
||||
|
||||
## Types of Clustering
|
||||
|
||||
1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all.
|
||||
2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership.
|
||||
|
||||
## Clustering Algorithms
|
||||
|
||||
### K-Means Clustering
|
||||
|
||||
K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps:
|
||||
1. Initialize k centroids randomly.
|
||||
2. Assign each data point to the nearest centroid.
|
||||
3. Recalculate the centroids as the mean of all data points assigned to each cluster.
|
||||
4. Repeat steps 2 and 3 until convergence.
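The four steps above can be sketched directly in NumPy. This is an illustrative implementation under simple assumptions (initial centroids are sampled from the data points, and an empty cluster keeps its previous centroid); it is not how scikit-learn implements K-Means.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: recompute each centroid as the mean of its points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```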
|
||||
|
||||
### Hierarchical Clustering
|
||||
|
||||
Hierarchical clustering builds a tree of clusters. There are two types:
|
||||
- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively.
|
||||
- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters.
|
||||
|
||||
### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
|
||||
|
||||
DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise.
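Both hierarchical clustering and DBSCAN are available in scikit-learn. The sketch below uses a tiny made-up dataset with two tight groups and one isolated point; the `eps` and `min_samples` values are assumptions chosen to suit this toy data, not general recommendations.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
              [50.0, 50.0]])  # the last point is an isolated outlier

# DBSCAN: points with no close neighbours are labelled -1 (noise)
dbscan_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(dbscan_labels)

# Agglomerative (bottom-up) clustering: merge until 3 clusters remain
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print(agglo_labels)
```

DBSCAN does not need the number of clusters in advance and marks the outlier as noise, whereas agglomerative clustering must place every point in some cluster.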
|
||||
|
||||
## Implementation
|
||||
|
||||
### Using Scikit-learn
|
||||
|
||||
Scikit-learn is a popular machine learning library in Python that provides tools for clustering.
|
||||
|
||||
### Code Example
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.cluster import KMeans
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.metrics import silhouette_score
|
||||
|
||||
# Load dataset
|
||||
data = pd.read_csv('path/to/your/dataset.csv')
|
||||
|
||||
# Preprocess the data
|
||||
scaler = StandardScaler()
|
||||
data_scaled = scaler.fit_transform(data)
|
||||
|
||||
# Initialize and fit KMeans model
|
||||
kmeans = KMeans(n_clusters=3, random_state=42)
|
||||
kmeans.fit(data_scaled)
|
||||
|
||||
# Get cluster labels
|
||||
labels = kmeans.labels_
|
||||
|
||||
# Calculate silhouette score
|
||||
silhouette_avg = silhouette_score(data_scaled, labels)
|
||||
print("Silhouette Score:", silhouette_avg)
|
||||
|
||||
# Add cluster labels to the original data
|
||||
data['Cluster'] = labels
|
||||
|
||||
print(data.head())
|
||||
```
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters.
|
||||
- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters.
|
||||
- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it.
|
||||
- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance.
|
||||
|
||||
## Conclusion
|
||||
|
||||
Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem.
|
|
@ -0,0 +1,71 @@
|
|||
# Grid Search
|
||||
|
||||
Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set.
|
||||
|
||||
Grid Search evaluates every combination in the grid and selects the one with the best validation score (for example, the lowest error). It is most useful when only a few hyperparameters need tuning, because the number of combinations grows multiplicatively with each added hyperparameter. As the Machine Learning model grows in complexity, randomized search methods often find good settings with fewer evaluations.
|
||||
|
||||
## Implementation
|
||||
|
||||
Before applying Grid Search to any algorithm, the data is divided into a training set and a validation set. The model is trained with each combination of hyperparameters and evaluated on the validation set, and the best-performing combination is chosen.
|
||||
|
||||
Grid Search can be applied to any algorithm whose performance can be improved by tuning hyperparameters. For example, we can apply it to K-Nearest Neighbors by validating its performance over a set of values of K. We can do the same with Logistic Regression by trying a set of learning-rate values to find the one at which the model achieves the best accuracy.
|
||||
|
||||
Let us consider that the model accepts the below three parameters in the form of input:
|
||||
1. Number of hidden layers `[2, 4]`
|
||||
2. Number of neurons in every layer `[5, 10]`
|
||||
3. Number of epochs `[10, 50]`
|
||||
|
||||
If we want to try out two options for every parameter (as specified in square brackets above), there are 2 × 2 × 2 = 8 different combinations. For instance, one possible combination is `[2, 5, 10]`. Trying all of these combinations manually would be tedious.
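The full set of combinations for this small grid can be enumerated with `itertools.product` from the standard library. The dictionary keys are names invented for this sketch.

```python
from itertools import product

# Candidate values for each hyperparameter (key names are illustrative)
param_grid = {
    "hidden_layers": [2, 4],
    "neurons_per_layer": [5, 10],
    "epochs": [10, 50],
}

# Cartesian product of all candidate values
combinations = list(product(*param_grid.values()))
print(len(combinations))  # 8

for combo in combinations:
    print(dict(zip(param_grid.keys(), combo)))
```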
|
||||
|
||||
Now, suppose that we had ten different parameters and wanted to try five possible values for each. That is 5^10 (about 9.8 million) combinations. Doing this by hand would mean changing a parameter value, re-executing the code, and recording the output for every single combination.
|
||||
|
||||
Grid Search automates that process: it accepts the candidate values for every parameter, runs the code for each possible combination, records the result of each one, and reports the combination with the best accuracy.
|
||||
|
||||
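In the logistic regression example that follows, the hyperparameter being tuned is `C`. Higher values of `C` tell the model that the training data resembles real-world information, so it places greater weight on the training data, while lower values of `C` do the opposite.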
|
||||
|
||||
## Explanation of the Code
|
||||
|
||||
The code provided performs hyperparameter tuning for a Logistic Regression model using a manual grid search approach. It evaluates the model's performance for different values of the regularization strength hyperparameter C on the Iris dataset.
|
||||
1. datasets from sklearn is imported to load the Iris dataset.
|
||||
2. LogisticRegression from sklearn.linear_model is imported to create and fit the logistic regression model.
|
||||
3. The Iris dataset is loaded, with X containing the features and y containing the target labels.
|
||||
4. A LogisticRegression model is instantiated with max_iter=10000 to ensure convergence during the fitting process, as the default maximum iterations (100) might not be sufficient.
|
||||
5. A list of different values for the regularization strength C is defined. The hyperparameter C controls the regularization strength, with smaller values specifying stronger regularization.
|
||||
6. An empty list scores is initialized to store the model's performance scores for different values of C.
|
||||
7. A for loop iterates over each value in the C list:
|
||||
8. logit.set_params(C=choice) sets the C parameter of the logistic regression model to the current value in the loop.
|
||||
9. logit.fit(X, y) fits the logistic regression model to the entire Iris dataset (this is typically done on training data in a real scenario, not the entire dataset).
|
||||
10. logit.score(X, y) calculates the accuracy of the fitted model on the dataset and appends this score to the scores list.
|
||||
11. After the loop, the scores list is printed, showing the accuracy for each value of C.
|
||||
|
||||
### Python Code
|
||||
|
||||
```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data']
y = iris['target']

logit = LogisticRegression(max_iter = 10000)

C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

scores = []
for choice in C:
    logit.set_params(C=choice)
    logit.fit(X, y)
    scores.append(logit.score(X, y))
print(scores)
```
|
||||
|
||||
#### Results
|
||||
|
||||
```
|
||||
[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
|
||||
```
|
||||
|
||||
We can see that the lower values of `C` performed worse than the default value of `1`. As we increased the value of `C` to `1.75`, the accuracy improved.
|
||||
|
||||
It seems that increasing `C` beyond this amount does not help increase model accuracy.
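scikit-learn also provides `GridSearchCV`, which runs the same kind of search with cross-validation, so that each value of `C` is scored on data the model was not trained on. The following is a minimal sketch reusing the Iris data and the same list of `C` values; the choice of `cv=5` is an assumption made for the example.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

# Grid of candidate values for the single hyperparameter C
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# 5-fold cross-validation for every value in the grid
search = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best value of C found by the search
print(search.best_score_)   # mean cross-validated accuracy for that value
```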
|
|
@ -6,7 +6,10 @@
|
|||
- [Decision Tree Learning](Decision-Tree.md)
|
||||
- [Support Vector Machine Algorithm](support-vector-machine.md)
|
||||
- [Artificial Neural Network from the Ground Up](ArtificialNeuralNetwork.md)
|
||||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [TensorFlow.md](tensorFlow.md)
|
||||
- [PyTorch.md](pytorch.md)
|
||||
- [Types of optimizers](Types_of_optimizers.md)
|
||||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [Logistic Regression](logistic-regression.md)
|
||||
- [Clustering](clustering.md)
|
||||
- [Grid Search](grid-search.md)
|
||||
|
|
|
@ -0,0 +1,115 @@
|
|||
# Logistic Regression
|
||||
|
||||
Logistic Regression is a statistical method used for binary classification problems. It is a type of regression analysis where the dependent variable is categorical. This README provides an overview of logistic regression, including its fundamental concepts, assumptions, and how to implement it using Python.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Introduction](#introduction)
|
||||
2. [Concepts](#concepts)
|
||||
3. [Assumptions](#assumptions)
|
||||
4. [Implementation](#implementation)
|
||||
- [Using Scikit-learn](#using-scikit-learn)
|
||||
- [Code Example](#code-example)
|
||||
5. [Evaluation Metrics](#evaluation-metrics)
|
||||
6. [Conclusion](#conclusion)
|
||||
7. [References](#references)
|
||||
|
||||
## Introduction
|
||||
|
||||
Logistic Regression is used to model the probability of a binary outcome based on one or more predictor variables (features). It is widely used in various fields such as medical research, social sciences, and machine learning for tasks such as spam detection, fraud detection, and predicting user behavior.
|
||||
|
||||
## Concepts
|
||||
|
||||
### Sigmoid Function
|
||||
|
||||
The logistic regression model uses the sigmoid function to map predicted values to probabilities. The sigmoid function is defined as:
|
||||
|
||||
$$
|
||||
\sigma(z) = \frac{1}{1 + e^{-z}}
|
||||
$$
|
||||
|
||||
Where $z$ is a linear combination of the input features.
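A small sketch of the sigmoid function, using only the standard library, shows how it squashes any real number into the range between 0 and 1:

```python
import math


def sigmoid(z):
    """Map a real number z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5, 0, 5):
    print(z, round(sigmoid(z), 4))
# -5 0.0067
# 0 0.5
# 5 0.9933
```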
|
||||
|
||||
### Odds and Log-Odds
|
||||
|
||||
- **Odds**: The odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
|
||||
|
||||
$$\text{Odds} = \frac{P(Y=1)}{P(Y=0)}$$
|
||||
|
||||
- **Log-Odds**: The log-odds is the natural logarithm of the odds.
|
||||
|
||||
$$\text{Log-Odds} = \log \left( \frac{P(Y=1)}{P(Y=0)} \right)$$
|
||||
|
||||
Logistic regression models the log-odds as a linear combination of the input features.
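The relationship between probability, odds, and log-odds can be verified numerically. In this sketch the probability `p = 0.8` is an arbitrary example value; applying the sigmoid to the log-odds recovers the original probability.

```python
import math

p = 0.8                    # example probability that Y = 1

odds = p / (1 - p)         # about 4: the event is 4 times as likely as not
log_odds = math.log(odds)  # natural logarithm of the odds

# The sigmoid function is the inverse of the log-odds transformation
recovered = 1 / (1 + math.exp(-log_odds))

print(odds, log_odds, recovered)
```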
|
||||
|
||||
### Model Equation
|
||||
|
||||
The logistic regression model equation is:
|
||||
|
||||
$$
|
||||
\log \left( \frac{P(Y=1)}{P(Y=0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n
|
||||
$$
|
||||
|
||||
Where:
|
||||
- β₀ is the intercept.
|
||||
- β<sub>i</sub> are the coefficients for the predictor variables X<sub>i</sub>.
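To make the equation concrete, the sketch below computes a predicted probability by hand. The intercept, coefficients, and feature values are hypothetical numbers chosen for illustration, not fitted parameters.

```python
import math

intercept = -1.0           # beta_0 (hypothetical)
coefficients = [0.5, 2.0]  # beta_1, beta_2 (hypothetical)
features = [2.0, 0.25]     # X_1, X_2 for one observation

# Log-odds: beta_0 + beta_1 * X_1 + beta_2 * X_2
z = intercept + sum(b * x for b, x in zip(coefficients, features))

# Convert log-odds to a probability with the sigmoid function
probability = 1 / (1 + math.exp(-z))

print(z)                      # 0.5
print(round(probability, 4))  # 0.6225
```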
|
||||
|
||||
|
||||
## Assumptions
|
||||
|
||||
1. **Linearity**: The log-odds of the response variable are a linear combination of the predictor variables.
|
||||
2. **Independence**: Observations should be independent of each other.
|
||||
3. **No Multicollinearity**: Predictor variables should not be highly correlated with each other.
|
||||
4. **Large Sample Size**: Logistic regression requires a large sample size to provide reliable results.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Using Scikit-learn
|
||||
|
||||
Scikit-learn is a popular machine learning library in Python that provides tools for logistic regression.
|
||||
|
||||
### Code Example
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
|
||||
|
||||
# Load dataset
|
||||
data = pd.read_csv('path/to/your/dataset.csv')
|
||||
|
||||
# Define features and target variable
|
||||
X = data[['feature1', 'feature2', 'feature3']]
|
||||
y = data['target']
|
||||
|
||||
# Split data into training and testing sets
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
|
||||
# Initialize and train logistic regression model
|
||||
model = LogisticRegression()
|
||||
model.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = model.predict(X_test)
|
||||
|
||||
# Evaluate the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
conf_matrix = confusion_matrix(y_test, y_pred)
|
||||
class_report = classification_report(y_test, y_pred)
|
||||
|
||||
print("Accuracy:", accuracy)
|
||||
print("Confusion Matrix:\n", conf_matrix)
|
||||
print("Classification Report:\n", class_report)
|
||||
```
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
- **Accuracy**: The proportion of correctly classified instances among all instances.
|
||||
- **Confusion Matrix**: A table showing the number of true positives, true negatives, false positives, and false negatives.
|
||||
- **Precision, Recall, and F1-Score**: Metrics to evaluate the performance of the classification model.
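These metrics can all be derived from the four counts in a confusion matrix. The counts below are made-up values for a hypothetical binary classifier.

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 4))   # 0.85
print(round(precision, 4))  # 0.8
print(round(recall, 4))     # 0.8889
print(round(f1, 4))         # 0.8421
```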
|
||||
|
||||
## Conclusion
|
||||
|
||||
Logistic regression is a fundamental classification technique that is easy to implement and interpret. It is a powerful tool for binary classification problems and provides a probabilistic framework for predicting binary outcomes.
|
|
@ -0,0 +1,120 @@
|
|||
# NumPy Array Iteration
|
||||
|
||||
Iterating over arrays in NumPy is a common task when processing data. NumPy provides several ways to iterate over elements of an array efficiently.
|
||||
Understanding these methods is crucial for performing operations on array elements effectively.
|
||||
|
||||
## 1. Basic Iteration
|
||||
|
||||
- Iterating using basic `for` loop.
|
||||
|
||||
### Single-dimensional array
|
||||
|
||||
Iterating over a single-dimensional array is straightforward using a basic `for` loop.
|
||||
|
||||
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
for i in arr:
    print(i)
```
|
||||
|
||||
#### Output
|
||||
|
||||
```python
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
```
|
||||
|
||||
### Multi-dimensional array
|
||||
|
||||
When iterating over a multi-dimensional array, each iteration returns a sub-array along the first axis.
|
||||
|
||||
```python
marr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

for arr in marr:
    print(arr)
```
|
||||
|
||||
#### Output
|
||||
|
||||
```python
|
||||
[1 2 3]
|
||||
[4 5 6]
|
||||
[7 8 9]
|
||||
```
|
||||
|
||||
## 2. Iterating with `nditer`
|
||||
|
||||
- `nditer` is a powerful iterator provided by NumPy for iterating over multi-dimensional arrays.
|
||||
- In each iteration, it yields a single element of the array.
|
||||
|
||||
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
for i in np.nditer(arr):
    print(i)
```
|
||||
|
||||
#### Output
|
||||
|
||||
```python
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
```
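By default `nditer` treats the array as read-only. As a sketch of one further use, passing `op_flags=['readwrite']` lets the loop modify elements in place; the doubling operation here is just an example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Open the iterator in read-write mode so elements can be assigned to
with np.nditer(arr, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x  # write the new value back into the array

print(arr.tolist())  # [[2, 4, 6], [8, 10, 12]]
```

For simple arithmetic like this, the vectorized expression `arr * 2` is normally preferred; the loop form is mainly useful when the per-element logic is harder to vectorize.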
|
||||
|
||||
## 3. Iterating with `ndenumerate`
|
||||
|
||||
- `ndenumerate` allows you to iterate with both the index and the value of each element.
|
||||
- It yields an `(index, value)` pair in each iteration.
|
||||
|
||||
```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
for index, value in np.ndenumerate(arr):
    print(index, value)
```
|
||||
|
||||
#### Output
|
||||
|
||||
```python
|
||||
(0, 0) 1
|
||||
(0, 1) 2
|
||||
(1, 0) 3
|
||||
(1, 1) 4
|
||||
```
|
||||
|
||||
## 4. Iterating with `flat`
|
||||
|
||||
- The `flat` attribute returns a 1-D iterator over the array.
|
||||
|
||||
```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
for element in arr.flat:
    print(element)
```
|
||||
|
||||
#### Output
|
||||
|
||||
```python
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
```
|
||||
|
||||
Understanding the various ways to iterate over NumPy arrays can significantly enhance your data processing efficiency.
|
||||
|
||||
Whether you are working with single-dimensional or multi-dimensional arrays, NumPy provides versatile tools to iterate and manipulate array elements effectively.
|
|
@ -0,0 +1,223 @@
|
|||
# Concatenation of Arrays
|
||||
|
||||
Concatenation of arrays in NumPy refers to combining multiple arrays into a single array, either along existing axes or by adding new axes. NumPy provides several functions for this purpose.
|
||||
|
||||
# Functions of Concatenation
|
||||
|
||||
## np.concatenate
|
||||
|
||||
Joins two or more arrays along an existing axis.
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
numpy.concatenate((arr1, arr2, ...), axis)
|
||||
```
|
||||
|
||||
Args:
|
||||
- arr1, arr2, ...: Sequence of arrays to concatenate.
|
||||
- axis: Axis along which the arrays will be joined. Default is 0.
|
||||
|
||||
### Example
|
||||
|
||||
#### Concatenate along axis 0
|
||||
|
||||
```python
import numpy as np
# Create two 2-D arrays (each row is a nested list)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

result_1 = np.concatenate((arr1, arr2), axis=0)
print(result_1)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[ 1 2 3]
|
||||
[ 7 8 9]
|
||||
[ 4 5 6]
|
||||
[10 11 12]]
|
||||
```
|
||||
|
||||
#### Concatenate along axis 1
|
||||
|
||||
```python
|
||||
result_2 = np.concatenate((arr1, arr2), axis=1)
|
||||
print(result_2)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
|
||||
```
|
||||
|
||||
## np.vstack
|
||||
|
||||
Vertical stacking of arrays (row-wise).
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
numpy.vstack(arrays)
|
||||
```
|
||||
|
||||
Args:
|
||||
- arrays: Sequence of arrays to stack.
|
||||
|
||||
### Example
|
||||
|
||||
```python
import numpy as np
# Create two 2-D arrays (each row is a nested list)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

result = np.vstack((arr1, arr2))
print(result)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[ 1 2 3]
|
||||
[ 7 8 9]
|
||||
[ 4 5 6]
|
||||
[10 11 12]]
|
||||
```
|
||||
|
||||
## np.hstack
|
||||
|
||||
Stacks arrays horizontally (column-wise).
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
numpy.hstack(arrays)
|
||||
```
|
||||
|
||||
Args:
|
||||
- arrays: Sequence of arrays to stack.
|
||||
|
||||
### Example
|
||||
|
||||
```python
import numpy as np
# Create two 2-D arrays (each row is a nested list)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

result = np.hstack((arr1, arr2))
print(result)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
|
||||
```
|
||||
|
||||
## np.dstack
|
||||
|
||||
Stacks arrays along the third axis (depth-wise).
|
||||
|
||||
### Syntax
|
||||
|
||||
```python
|
||||
numpy.dstack(arrays)
|
||||
```
|
||||
|
||||
Args:
- arrays: Sequence of arrays to stack.
|
||||
|
||||
### Example
|
||||
|
||||
```python
import numpy as np
# Create two 2-D arrays (each row is a nested list)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

result = np.dstack((arr1, arr2))
print(result)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[[ 1 4]
|
||||
[ 2 5]
|
||||
[ 3 6]]
|
||||
|
||||
[[ 7 10]
|
||||
[ 8 11]
|
||||
[ 9 12]]]
|
||||
```
|
||||
|
||||
## np.stack
|
||||
|
||||
Joins a sequence of arrays along a new axis.
|
||||
|
||||
```python
|
||||
numpy.stack(arrays, axis)
|
||||
```
|
||||
|
||||
Args:
|
||||
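- arrays: Sequence of arrays to stack. All arrays must have the same shape.
- axis: Axis in the result array along which the input arrays are stacked. Default is 0.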
|
||||
|
||||
### Example
|
||||
|
||||
```python
import numpy as np
# Create two 2-D arrays (each row is a nested list)
arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

result = np.stack((arr1, arr2), axis=0)
print(result)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[[ 1 2 3]
|
||||
[ 7 8 9]]
|
||||
|
||||
[[ 4 5 6]
|
||||
[10 11 12]]]
|
||||
```
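The position of the new axis depends on the `axis` argument. As a quick sketch with the same two arrays, printing the result shape for each axis shows where the new dimension is inserted:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [7, 8, 9]])
arr2 = np.array([[4, 5, 6], [10, 11, 12]])

shapes = {}
for axis in (0, 1, 2):
    stacked = np.stack((arr1, arr2), axis=axis)
    shapes[axis] = stacked.shape
    print(axis, stacked.shape)
# 0 (2, 2, 3)
# 1 (2, 2, 3)
# 2 (2, 3, 2)

# For 2-D inputs, stacking along the last axis matches np.dstack
print(np.array_equal(np.stack((arr1, arr2), axis=2), np.dstack((arr1, arr2))))  # True
```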
|
||||
|
||||
# Concatenation with Mixed Dimensions
|
||||
|
||||
When concatenating arrays with different shapes, it's often necessary to reshape them to have compatible dimensions.
|
||||
|
||||
## Example
|
||||
|
||||
#### Concatenate along axis 0
|
||||
|
||||
```python
|
||||
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
|
||||
arr2 = np.array([7, 8, 9])
|
||||
|
||||
result_0= np.concatenate((arr1, arr2[np.newaxis, :]), axis=0)
|
||||
print(result_0)
|
||||
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[1 2 3]
|
||||
[4 5 6]
|
||||
[7 8 9]]
|
||||
```
|
||||
|
||||
#### Concatenate along axis 1
|
||||
|
||||
```python
# arr1 has 2 rows, so the column being added must have 2 elements
arr3 = np.array([7, 8])

result_1 = np.concatenate((arr1, arr3[:, np.newaxis]), axis=1)
print(result_1)
```
|
||||
|
||||
#### Output
|
||||
```
|
||||
[[1 2 3 7]
|
||||
[4 5 6 8]]
|
||||
```
|
||||
|
||||
|
|
@ -3,8 +3,11 @@
|
|||
- [Installing NumPy](installing-numpy.md)
|
||||
- [Introduction](introduction.md)
|
||||
- [NumPy Data Types](datatypes.md)
|
||||
- [Numpy Array Shape and Reshape](reshape-array.md)
|
||||
- [Basic Mathematics](basic_math.md)
|
||||
- [Operations on Arrays in NumPy](operations-on-arrays.md)
|
||||
- [Loading Arrays from Files](loading_arrays_from_files.md)
|
||||
- [Saving Numpy Arrays into FIles](saving_numpy_arrays_to_files.md)
|
||||
- [Sorting NumPy Arrays](sorting-array.md)
|
||||
- [NumPy Array Iteration](array-iteration.md)
|
||||
- [Concatenation of Arrays](concatenation-of-arrays.md)
|
||||
|
|
|
@ -0,0 +1,57 @@
|
|||
# Numpy Array Shape and Reshape
|
||||
|
||||
In NumPy, the primary data structure is the ndarray (N-dimensional array). An array can have one or more dimensions, and it organizes your data efficiently.
|
||||
|
||||
Let us create a 2D array
|
||||
|
||||
``` python
|
||||
import numpy as np
|
||||
|
||||
numbers = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
|
||||
print(numbers)
|
||||
```
|
||||
|
||||
#### Output:
|
||||
|
||||
``` python
|
||||
[[1 2 3 4]
 [5 6 7 8]]
|
||||
```
|
||||
|
||||
## Changing Array Shape using `reshape()`
|
||||
|
||||
The `reshape()` function allows you to rearrange the data within a NumPy array.
|
||||
|
||||
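It takes the new shape as its argument, for example the number of rows and columns for a 2D result. `reshape()` can add or remove dimensions, for instance converting a 1D array into a 2D array or vice versa, as long as the total number of elements stays the same.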
|
||||
|
||||
``` python
|
||||
arr_1d = np.array([1, 2, 3, 4, 5, 6]) # 1D array
|
||||
arr_2d = arr_1d.reshape(2, 3) # Reshaping with 2 rows and 3 cols
|
||||
|
||||
print(arr_2d)
|
||||
```
|
||||
|
||||
#### Output:
|
||||
|
||||
``` python
|
||||
[[1 2 3]
 [4 5 6]]
|
||||
```
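`reshape()` is not limited to two dimensions. The sketch below, using an arbitrary 12-element array, shows a 3D reshape, the use of `-1` to let NumPy infer one dimension, and the error raised when the element count does not match.

``` python
import numpy as np

arr = np.arange(1, 13)         # 12 elements: 1..12

a = arr.reshape(3, -1)         # -1 lets NumPy infer the column count (4)
b = arr.reshape(2, 3, 2)       # three dimensions
c = b.reshape(-1)              # flatten back to 1D

print(a.shape)  # (3, 4)
print(b.shape)  # (2, 3, 2)
print(c.shape)  # (12,)

try:
    arr.reshape(5, 3)          # 15 slots for 12 elements
except ValueError as err:
    error_message = str(err)
    print("Cannot reshape:", error_message)
```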
|
||||
|
||||
## Changing Array Shape using `resize()`
|
||||
|
||||
The `resize()` method modifies the shape of a NumPy array in place instead of returning a new array.
|
||||
|
||||
It takes the new shape as its argument, here a tuple of rows and columns.
|
||||
|
||||
``` python
|
||||
import numpy as np
|
||||
arr_1d = np.array([1, 2, 3, 4, 5, 6])
|
||||
|
||||
arr_1d.resize((2, 3)) # 2 rows and 3 cols
|
||||
print(arr_1d)
|
||||
```
|
||||
|
||||
#### Output:
|
||||
|
||||
``` python
|
||||
[[1 2 3]
 [4 5 6]]
|
||||
```
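NumPy also has a separate top-level function, `np.resize()`, which behaves differently from the method: it returns a new array and leaves the original untouched, and if the requested shape needs more elements than are available, it repeats the data. A minimal sketch with an arbitrary 3-element array:

``` python
import numpy as np

arr = np.array([1, 2, 3])

# The new shape needs 6 elements, so the 3 values are repeated
bigger = np.resize(arr, (2, 3))

print(bigger.tolist())  # [[1, 2, 3], [1, 2, 3]]
print(arr.tolist())     # [1, 2, 3] (the original is unchanged)
```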
|
|
@ -0,0 +1,158 @@
|
|||
# Working with Date & Time in Pandas
|
||||
|
||||
While working with data, it is common to come across data containing date and time. Pandas is a very handy tool for dealing with such data and provides a wide range of date and time data processing options.
|
||||
|
||||
- **Parsing dates and times**: Pandas provides tools for parsing dates and times from strings, including the `to_datetime()` function and the `parse_dates` parameter of `read_csv()`. These can handle a variety of date and time formats, including Unix timestamps and human-readable formats.
|
||||
|
||||
- **Manipulating dates and times**: Pandas provides a number of functions for manipulating dates and times, including `shift()`, `resample()`, and `to_timedelta()`. These functions can be used to add or subtract time periods, change the frequency of a time series, and calculate the difference between two dates or times.
|
||||
|
||||
- **Visualizing dates and times**: Pandas provides a number of methods for visualizing date and time data, including `plot()`, `hist()`, and `plot.bar()`. These can be used to create line charts, histograms, and bar charts of date and time data.
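As a brief sketch of the parsing and manipulation features listed above, the example below parses strings with `to_datetime()`, shifts the dates with a `Timedelta`, and computes the gap between two dates. The date strings are arbitrary example values.

```python
import pandas as pd

# Parse ISO-formatted strings into a DatetimeIndex
dates = pd.to_datetime(['2024-05-05', '2024-10-24'])

# Parse a day-first string by giving an explicit format
custom = pd.to_datetime('25/05/2024', format='%d/%m/%Y')

# Add one week to every date at once
shifted = dates + pd.Timedelta(days=7)

# Difference between two dates is a Timedelta
gap = dates[1] - dates[0]

print(custom)    # 2024-05-25 00:00:00
print(shifted)
print(gap.days)  # 172
```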
|
||||
|
||||
### `Timestamp` function
|
||||
|
||||
`Timestamp` is the pandas equivalent of Python's `datetime.datetime`. It represents a single point in time and exposes its components, such as the year, month, and day, as attributes.
|
||||
|
||||
Example for retrieving day, month and year from given date:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
ts = pd.Timestamp('2024-05-05')
|
||||
y = ts.year
|
||||
print('Year is: ', y)
|
||||
m = ts.month
|
||||
print('Month is: ', m)
|
||||
d = ts.day
|
||||
print('Day is: ', d)
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```python
|
||||
Year is: 2024
|
||||
Month is: 5
|
||||
Day is: 5
|
||||
```
|
||||
|
||||
Example for extracting time related data from given date:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
ts = pd.Timestamp('2024-10-24 12:00:00')
|
||||
print('Hour is: ', ts.hour)
|
||||
print('Minute is: ', ts.minute)
|
||||
print('Weekday is: ', ts.weekday())
|
||||
print('Quarter is: ', ts.quarter)
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```python
|
||||
Hour is: 12
|
||||
Minute is: 0
|
||||
Weekday is: 3
|
||||
Quarter is: 4
|
||||
```
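`weekday()` counts from Monday as 0, so it can be easier to check the result against `day_name()`. A small sketch using the same date, which also shows that a `Timestamp` supports arithmetic with `Timedelta`:

```python
import pandas as pd

ts = pd.Timestamp('2024-10-24 12:00:00')

print(ts.weekday())    # 3 (Monday is 0, so 3 is Thursday)
print(ts.day_name())   # Thursday

# Timestamps support arithmetic with Timedelta objects
next_week = ts + pd.Timedelta(days=7)
print(next_week)       # 2024-10-31 12:00:00
```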
|
||||
|
||||
### `Timestamp.now()`
|
||||
|
||||
Example for getting current date and time:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
ts = pd.Timestamp.now()
|
||||
print('Current date and time is: ', ts)
|
||||
```
|
||||
|
||||
Output:
|
||||
```python
|
||||
Current date and time is: 2024-05-25 11:48:25.593213
|
||||
```
|
||||
|
||||
### `date_range` function
|
||||
|
||||
Example for generating dates for the next five days:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
ts = pd.date_range(start = pd.Timestamp.now(), periods = 5)
|
||||
for i in ts:
|
||||
print(i.date())
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```python
|
||||
2024-05-25
|
||||
2024-05-26
|
||||
2024-05-27
|
||||
2024-05-28
|
||||
2024-05-29
|
||||
```
|
||||
|
||||
Example for generating dates for the previous five days:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
ts = pd.date_range(end = pd.Timestamp.now(), periods = 5)
|
||||
for i in ts:
|
||||
print(i.date())
|
||||
```
|
||||
|
||||
Output:
|
||||
```python
|
||||
2024-05-21
|
||||
2024-05-22
|
||||
2024-05-23
|
||||
2024-05-24
|
||||
2024-05-25
|
||||
```
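`date_range` also accepts a `freq` argument when daily steps are not what you need. A minimal sketch with a fixed start date (chosen only so that the output is reproducible), using the business-day frequency `'B'` to skip the weekend:

```python
import pandas as pd

# 2024-05-24 is a Friday; 'B' steps over Saturday and Sunday
ts = pd.date_range(start='2024-05-24', periods=3, freq='B')
for i in ts:
    print(i.date())
# 2024-05-24
# 2024-05-27
# 2024-05-28
```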
|
||||
|
||||
### Built-in vs pandas date & time operations
|
||||
|
||||
In `pandas`, you may add a time delta to a full column of dates in a single action, but Python's datetime requires a loop.
|
||||
|
||||
Example in Pandas:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='min'))  # 'min' = one-minute steps ('T' is deprecated in newer pandas)
|
||||
dates += pd.Timedelta(days=1)
|
||||
print(dates)
|
||||
```
|
||||
|
||||
Output:
|
||||
```python
|
||||
0
|
||||
0 2023-01-02 00:00:00
|
||||
1 2023-01-02 00:01:00
|
||||
2 2023-01-02 00:02:00
|
||||
3 2023-01-02 00:03:00
|
||||
4 2023-01-02 00:04:00
|
||||
... ...
|
||||
99995 2023-03-12 10:35:00
|
||||
99996 2023-03-12 10:36:00
|
||||
99997 2023-03-12 10:37:00
|
||||
99998 2023-03-12 10:38:00
|
||||
99999 2023-03-12 10:39:00
|
||||
```
|
||||
|
||||
Example using Built-in datetime library:
|
||||
|
||||
```python
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
|
||||
dates = [date + timedelta(days=1) for date in dates]
|
||||
```
|
||||
|
||||
Why use pandas functions?
|
||||
|
||||
- Pandas stores datetime data using NumPy's `datetime64` dtype, which takes a fixed number of bytes (usually 8 bytes per date), so the data is held compactly and efficiently.
|
||||
- Each datetime object in Python takes up extra memory since it contains not only the date and time but also the additional metadata and overhead associated with Python objects.
|
||||
- Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as `pd.to_datetime()`, `date_range()`, `timedelta_range()`, and more. The built-in `datetime` library requires manual implementation for many of these operations, leading to longer and less efficient code.
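The memory difference described above can be checked with a rough sketch like the one below. The sample size `n` is arbitrary, and the size reported for a Python `datetime` object depends on the interpreter, so no fixed number is claimed for it.

```python
import sys
from datetime import datetime, timedelta

import pandas as pd

n = 1000

# pandas: the values live in one NumPy datetime64 array
idx = pd.date_range('2023-01-01', periods=n, freq='min')
bytes_per_value = idx.values.nbytes / n
print(bytes_per_value)  # 8.0

# Built-in: every element is a separate Python object
py_dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(n)]
bytes_per_object = sys.getsizeof(py_dates[0])
print(bytes_per_object)  # larger than 8; exact value varies by interpreter
```

The list itself also stores one pointer per element, so the total footprint of `py_dates` is higher than `n * bytes_per_object`.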
|
|
@ -5,5 +5,6 @@
|
|||
- [Pandas Descriptive Statistics](Descriptive_Statistics.md)
|
||||
- [Group By Functions with Pandas](GroupBy_Functions_Pandas.md)
|
||||
- [Excel using Pandas DataFrame](excel_with_pandas.md)
|
||||
- [Working with Date & Time in Pandas](datetime.md)
|
||||
- [Importing and Exporting Data in Pandas](import-export.md)
|
||||
- [Handling Missing Values in Pandas](handling-missing-values.md)
|
||||
|
|
|
@ -0,0 +1,216 @@
|
|||
# Bar Plots in Matplotlib
|
||||
A bar plot or a bar chart is a type of data visualisation that represents data in the form of rectangular bars, with lengths or heights proportional to the values and data which they represent. The bar plots can be plotted both vertically and horizontally.
|
||||
|
||||
It is one of the most widely used types of data visualisation as it is easy to interpret and is pleasing to the eyes.
|
||||
|
||||
Matplotlib provides a very easy and intuitive method to create highly customized bar plots.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before creating bar plots in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system.
|
||||
|
||||
## Creating a simple Bar Plot with `bar()` method
|
||||
|
||||
A very basic bar plot can be created with the `bar()` method in `matplotlib.pyplot`.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
x = ["A", "B", "C", "D"]
|
||||
y = [2, 7, 9, 11]
|
||||
|
||||
# Creating bar plot
|
||||
plt.bar(x,y)
|
||||
plt.show() # Shows the plot
|
||||
```
|
||||
When executed, this would show the following bar plot:
|
||||
|
||||

|
||||
|
||||
The `bar()` function takes arguments that describe the layout of the bars.
|
||||
|
||||
Here, `plt.bar(x,y)` is used to specify that the bar chart is to be plotted by taking the `x` array as X-axis and `y` array as Y-axis. You can customize the graph further like adding labels to the axes, color of the bars, etc. These will be explored in the upcoming sections.
|
||||
|
||||
Additionally, you can also use `numpy` arrays for faster generation when handling large datasets.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
|
||||
# Using numpy array
|
||||
x = np.array(["A", "B", "C", "D"])
|
||||
y = np.array([2, 7, 9, 11])
|
||||
|
||||
plt.bar(x,y)
|
||||
plt.show()
|
||||
```
|
||||
Its output would be the same as above.
|
||||
|
||||
## Customizing Bar Plots
|
||||
|
||||
For creating customized bar plots, it is **highly recommended** to create the plots using `matplotlib.pyplot.subplots()`, otherwise it is difficult to apply the customizations in the newer versions of Matplotlib.
|
||||
|
||||
### Adding title to the graph and labeling the axes
|
||||
|
||||
Let us create an imaginary graph of the number of cars sold in various years.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
years = ['1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [300, 500, 700, 1000]
|
||||
|
||||
# Creating bar plot
|
||||
ax.bar(years, num_of_cars_sold)
|
||||
|
||||
# Adding axis labels
|
||||
ax.set_xlabel("Years")
|
||||
ax.set_ylabel("Number of cars sold")
|
||||
|
||||
# Adding plot title
|
||||
ax.set_title("Number of cars sold in various years")
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
Here, `matplotlib.pyplot.subplots()` returns a `Figure` object `fig` and an `Axes` object `ax`, both of which are used for customizing the bar plot. `ax.set_xlabel`, `ax.set_ylabel` and `ax.set_title` add the X-axis label, the Y-axis label and the title of the graph, respectively.
|
||||
|
||||
### Adding bar colors and legends
|
||||
|
||||
Let us consider our previous example of number of cars sold in various years and suppose that we want to add different colors to the bars from different centuries and respective legends for better interpretation.
|
||||
|
||||
This can be achieved by creating two separate arrays `bar_colors` for bar colors and `bar_labels` for legend labels and passing them as arguments to parameters color and label respectively in `ax.bar` method.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
|
||||
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
|
||||
|
||||
# Creating the customized bar plot
|
||||
ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
|
||||
|
||||
# Adding axis labels
|
||||
ax.set_xlabel("Years")
|
||||
ax.set_ylabel("Number of cars sold")
|
||||
|
||||
# Adding plot title
|
||||
ax.set_title("Number of cars sold in various years")
|
||||
|
||||
# Adding legend title
|
||||
ax.legend(title='Centuries')
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
Note that labels with a preceding underscore won't show up in the legend. A legend title can be added by passing the `title` argument to `ax.legend()`, as shown. You can also give all the bars the same color by passing a single color, such as a `HEX` value, to the `color` parameter.
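A minimal sketch of the single-color case, reusing the car sales data from above. The HEX value `#ff7f0e` is just an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]

# A single HEX string is applied to every bar
bar_container = ax.bar(years, num_of_cars_sold, color='#ff7f0e')

ax.set_xlabel("Years")
ax.set_ylabel("Number of cars sold")
ax.set_title("Number of cars sold in various years")

# Call plt.show() to display the figure
```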
|
||||
|
||||
### Adding labels to bars
|
||||
|
||||
We may want to add labels to the bars showing their exact (or truncated) values so that they can be read quickly and accurately. This can be done by passing the `BarContainer` object returned by `ax.bar()`, which is a container holding all the bars and, optionally, the error bars, to the `ax.bar_label` method.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots()
|
||||
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
|
||||
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
|
||||
|
||||
# BarContainer object
|
||||
bar_container = ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
|
||||
|
||||
ax.set_xlabel("Years")
|
||||
ax.set_ylabel("Number of cars sold")
|
||||
ax.set_title("Number of cars sold in various years")
|
||||
ax.legend(title='Centuries')
|
||||
|
||||
# Adding bar labels
|
||||
ax.bar_label(bar_container)
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
**Note:** There are various other methods of adding bar labels in matplotlib.
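One such alternative, sketched here under the assumption that you want full control over the position of each label, is to loop over the bars and place the text yourself with `ax.text()`:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

years = ['1998', '1999', '2000', '2001', '2002']
num_of_cars_sold = [200, 300, 500, 700, 1000]

bar_container = ax.bar(years, num_of_cars_sold)

# Each bar is a Rectangle; write its height just above its top edge
for bar in bar_container:
    height = float(bar.get_height())
    x_center = bar.get_x() + bar.get_width() / 2
    ax.text(x_center, height, f"{height:.0f}", ha='center', va='bottom')

# Call plt.show() to display the figure
```

Compared with `ax.bar_label`, this takes more code but lets you decide the coordinates and alignment of every label individually.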
|
||||
|
||||
## Horizontal Bar Plot
|
||||
|
||||
We can create horizontal bar plots by using the `barh()` method in `matplotlib.pyplot`. All the relevant customizations are applicable here also.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure
|
||||
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
|
||||
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
|
||||
|
||||
# Creating horizontal bar plot
|
||||
bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
|
||||
|
||||
# Adding axis labels
|
||||
ax.set_xlabel("Number of cars sold")  # values run along the X-axis in a horizontal bar plot
|
||||
ax.set_ylabel("Years")
|
||||
|
||||
# Adding Title
|
||||
ax.set_title("Number of cars sold in various years")
|
||||
ax.legend(title='Centuries')
|
||||
|
||||
# Adding bar labels
|
||||
ax.bar_label(bar_container)
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
||||
|
||||
We can also invert the Y-axis labels here to show the top values first.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure
|
||||
|
||||
years = ['1998', '1999', '2000', '2001', '2002']
|
||||
num_of_cars_sold = [200, 300, 500, 700, 1000]
|
||||
bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue']
|
||||
bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s']
|
||||
|
||||
# Creating horizontal bar plot
|
||||
bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels)
|
||||
|
||||
# Adding axis labels
|
||||
ax.set_xlabel("Number of cars sold")  # values run along the X-axis in a horizontal bar plot
|
||||
ax.set_ylabel("Years")
|
||||
|
||||
# Adding Title
|
||||
ax.set_title("Number of cars sold in various years")
|
||||
ax.legend(title='Centuries')
|
||||
|
||||
# Adding bar labels
|
||||
ax.bar_label(bar_container)
|
||||
|
||||
# Inverting Y-axis
|
||||
ax.invert_yaxis()
|
||||
|
||||
plt.show()
|
||||
```
|
||||
|
||||

|
|
@ -1,3 +1,5 @@
|
|||
# List of sections
|
||||
|
||||
- [Installing Matplotlib](matplotlib_installation.md)
|
||||
- [Installing Matplotlib](matplotlib-installation.md)
|
||||
- [Bar Plots in Matplotlib](matplotlib-bar-plots.md)
|
||||
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
|
||||
|
|
|
@ -0,0 +1,233 @@
|
|||
# Pie Charts in Matplotlib
|
||||
|
||||
A pie chart is a circular graph divided into slices, where the size of each slice shows the relative size of the data it represents. A pie chart requires a list of categories and a matching list of numerical values. Here, the term "pie" represents the whole, and the "slices" represent the parts of the whole.
|
||||
|
||||
Pie charts are commonly used in business presentations like sales, operations, survey results, resources, etc. as they are pleasing to the eye and provide a quick summary.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before creating pie charts in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system.
|
||||
|
||||
## Creating a simple pie chart with `pie()` method
|
||||
|
||||
A basic pie chart can be created with `pie()` method in `matplotlib.pyplot`.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['A','B','C','D','E']
|
||||
data = [10,20,30,40,50]
|
||||
|
||||
# Creating Plot
|
||||
plt.pie(data, labels=labels)
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||
|
||||
When executed, this would show the following pie chart:
|
||||
|
||||

|
||||
|
||||
Note that the slices of the pie are sized according to their proportion of the `data` as a whole.
|
||||
|
||||
The `pie()` function takes arguments that describe the layout of the pie chart.
|
||||
|
||||
Here, `plt.pie(data, labels=labels)` is used to specify that the pie chart is to be plotted by taking the values from array `data` and the fractional area of each slice is represented by **data/sum(data)**. The array `labels` represents the labels of slices corresponding to each value in `data`.
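To make the **data/sum(data)** rule concrete, the sketch below computes the fractions by hand and compares them with the angle covered by each wedge that Matplotlib draws. Reading the wedges from `ax.patches` is a choice made for this illustration.

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
data = [10, 20, 30, 40, 50]

# Fraction of the whole represented by each value
fractions = [value / sum(data) for value in data]

fig, ax = plt.subplots()
ax.pie(data, labels=labels)

# Each wedge spans 360 degrees multiplied by its fraction
for label, fraction, wedge in zip(labels, fractions, ax.patches):
    angle = wedge.theta2 - wedge.theta1
    print(f"{label}: fraction={fraction:.3f}, angle={angle:.1f} degrees")
```

For instance, slice `A` accounts for 10/150 of the total, so it covers about 24 of the 360 degrees.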
|
||||
|
||||
You can customize the graph further like specifying custom colors for slices, exploding slices, labeling wedges (slices), etc. These will be explored in the upcoming sections.
|
||||
|
||||
## Customizing Pie Chart in Matplotlib
|
||||
|
||||
For creating customized plots, it is highly recommended to create the plots using `matplotlib.pyplot.subplots()`, otherwise it is difficult to apply the customizations in the newer versions of Matplotlib.
|
||||
|
||||
### Coloring Slices
|
||||
|
||||
You can add a custom set of colors to the slices by passing an array of colors to the `colors` parameter of the `pie()` method.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['A','B','C','D','E']
|
||||
data = [10,20,30,40,50]
|
||||
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Creating plot using matplotlib.pyplot.subplots()
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors)
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
Here, `matplotlib.pyplot.subplots()` returns a `Figure` object `fig` and an `Axes` object `ax`, both of which are used for customizing the pie chart.
|
||||
|
||||
**Note:** Each slice of the pie chart is a `patches.Wedge` object; therefore, in addition to the customizations shown here, each wedge can be customized using the `wedgeprops` argument, which takes a Python dictionary of name-value pairs denoting wedge properties like `linewidth`, `edgecolor`, etc.
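A short sketch of `wedgeprops` in use, assuming you want a white outline that visually separates the slices (the line width and edge color are arbitrary choices):

```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
data = [10, 20, 30, 40, 50]
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']

fig, ax = plt.subplots()

# The dictionary is applied to every wedge in the chart
ax.pie(data, labels=labels, colors=colors,
       wedgeprops={'linewidth': 2, 'edgecolor': 'white'})

# Call plt.show() to display the figure
```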
|
||||
|
||||
### Hatching Slices
|
||||
|
||||
To make the pie chart more visually distinctive, you can pass a list of hatch patterns to the `hatch` parameter to set the pattern of each slice. The `hatch` parameter of `pie()` requires Matplotlib 3.7 or later.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['A','B','C','D','E']
|
||||
data = [10,20,30,40,50]
|
||||
colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
hatch = ['*O', 'oO', 'OO', '.||.', '|*|'] # Hatch patterns
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, hatch=hatch)
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
You can try and test your own beautiful hatch patterns!
|
||||
|
||||
### Labeling Slices
|
||||
|
||||
You can pass a function or format string to `autopct` parameter to label slices.
|
||||
|
||||
An example is shown here:
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, autopct='%1.1f%%')
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
Here, `autopct='%1.1f%%'` specifies that each wedge (slice) is labelled with the percentage of the whole that it occupies, to a precision of 1 decimal place.
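`autopct` can also be given a function, which receives the percentage of each slice and returns the label text. The sketch below uses a hypothetical helper, `percent_and_count`, to show the percentage together with the original value behind it.

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]
total = sum(data)

def percent_and_count(pct):
    """Label a slice with its percentage and the value it came from."""
    pct = float(pct)
    count = int(round(pct / 100 * total))
    return f"{pct:.1f}%\n({count})"

fig, ax = plt.subplots()
ax.pie(data, labels=labels, autopct=percent_and_count)

# Call plt.show() to display the figure
```

Recovering the count from the percentage relies on rounding, which works for whole-number data like this but may not suit fractional values.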
|
||||
|
||||
### Exploding Slices
|
||||
|
||||
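The `explode` parameter separates one or more slices from the rest of the chart. You can explode slices by passing an array of numbers to the `explode` parameter, where each number is the fraction of the radius by which the corresponding slice is offset.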
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Explode only the first slice, i.e 'Rose'
|
||||
explode = [0.1, 0, 0, 0, 0]
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, explode=explode, autopct='%1.1f%%')
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
### Shading Slices
|
||||
|
||||
You can add shadow to slices by passing `shadow=True` in `pie()` method.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Explode only the first slice, i.e 'Rose'
|
||||
explode = [0.1, 0, 0, 0, 0]
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, explode=explode, shadow=True, autopct='%1.1f%%')
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
### Rotating Slices
|
||||
|
||||
You can rotate slices by passing a custom start angle value to the `startangle` parameter.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%')
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
The default `startangle` is 0, which would start the first slice ('Rose') on the positive x-axis. This example sets `startangle=90` such that all the slices are rotated counter-clockwise by 90 degrees, and the `'Rose'` slice starts on the positive y-axis.
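If you would rather have the slices run clockwise from the top, `pie()` also has a `counterclock` parameter, which is not used in the examples above. A minimal sketch combining it with `startangle=90`:

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]

fig, ax = plt.subplots()

# Start at the positive y-axis and lay the slices out clockwise
ax.pie(data, labels=labels, startangle=90, counterclock=False, autopct='%1.1f%%')

# Call plt.show() to display the figure
```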
|
||||
|
||||
### Controlling Size of Pie Chart
|
||||
|
||||
In addition to the size of figure, you can also control the size of pie chart using the `radius` parameter.
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots()
|
||||
ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%', textprops={'size': 'smaller'}, radius=0.7)
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|
||||
|
||||
Note that `textprops` is an additional argument which can be used for controlling the properties of any text in the pie chart. In this case, we have specified that the size of the text should be smaller. There are many more such properties available in `textprops`.
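As a small sketch of a few other text properties, the dictionary below sets the color and weight of all the text in the chart. The specific values are arbitrary and only meant as an illustration.

```python
import matplotlib.pyplot as plt

labels = ['Rose', 'Tulip', 'Marigold', 'Sunflower', 'Daffodil']
data = [11, 9, 17, 4, 7]

fig, ax = plt.subplots()

# textprops is applied to the slice labels and the percentage labels alike
ax.pie(data, labels=labels, autopct='%1.1f%%',
       textprops={'color': 'darkblue', 'weight': 'bold'})

# Call plt.show() to display the figure
```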
|
||||
|
||||
### Adding Legends
|
||||
|
||||
You can also use a legend in place of labels on the slices, like this:
|
||||
|
||||
```Python
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Creating dataset
|
||||
labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
|
||||
data = [11,9,17,4,7]
|
||||
colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink']
|
||||
|
||||
# Creating plot
|
||||
fig, ax = plt.subplots(figsize=(7,7))
|
||||
ax.pie(data, colors=colors, startangle=90, autopct='%1.1f%%', radius=0.7)
|
||||
plt.legend(labels, title="Flowers")
|
||||
|
||||
# Show plot
|
||||
plt.show()
|
||||
```
|
||||

|