Merge branch 'main' into main
|
@ -0,0 +1,110 @@
|
|||
## Asynchronous Context Managers and Generators in Python
|
||||
Asynchronous programming in Python allows for more efficient use of resources by enabling tasks to run concurrently. Python provides support for asynchronous
|
||||
context managers and generators, which help manage resources and perform operations asynchronously.
|
||||
|
||||
### Asynchronous Context Managers
|
||||
Asynchronous context managers are similar to regular context managers but are designed to work with asynchronous code. They use the `async with` statement and
|
||||
typically include the `__aenter__` and `__aexit__` methods.
|
||||
|
||||
### Creating an Asynchronous Context Manager
|
||||
Here's a simple example of an asynchronous context manager:
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
|
||||
class AsyncContextManager:
|
||||
async def __aenter__(self):
|
||||
print("Entering context")
|
||||
await asyncio.sleep(1) # Simulate an async operation
|
||||
return self
|
||||
|
||||
async def __aexit__(self, exc_type, exc, tb):
|
||||
print("Exiting context")
|
||||
await asyncio.sleep(1) # Simulate cleanup
|
||||
|
||||
async def main():
|
||||
async with AsyncContextManager() as acm:
|
||||
print("Inside context")
|
||||
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
Entering context
|
||||
Inside context
|
||||
Exiting context
|
||||
```
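For simpler cases, the standard library's `contextlib.asynccontextmanager` decorator can express the same idea more concisely. A minimal sketch of the example above:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def async_context():
    print("Entering context")
    await asyncio.sleep(1)  # Simulate async setup
    try:
        yield
    finally:
        print("Exiting context")
        await asyncio.sleep(1)  # Simulate async cleanup

async def main():
    async with async_context():
        print("Inside context")

asyncio.run(main())
```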
|
||||
|
||||
### Asynchronous Generators
|
||||
Asynchronous generators allow you to yield values within an asynchronous function. They use the `async def` syntax along with the `yield` statement and are
|
||||
iterated using the `async for` loop.
|
||||
|
||||
### Creating an Asynchronous Generator
|
||||
Here's a basic example of an asynchronous generator:
|
||||
|
||||
```python
|
||||
import asyncio
|
||||
|
||||
async def async_generator():
|
||||
for i in range(5):
|
||||
await asyncio.sleep(1) # Simulate an async operation
|
||||
yield i
|
||||
|
||||
async def main():
|
||||
async for value in async_generator():
|
||||
print(value)
|
||||
|
||||
asyncio.run(main())
|
||||
```
|
||||
Output:
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
```
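Async generators also work with asynchronous comprehensions (PEP 530), which collect their values without an explicit loop; for example:

```python
import asyncio

async def async_generator():
    for i in range(5):
        await asyncio.sleep(0.1)  # Simulate an async operation
        yield i

async def main():
    squares = [i * i async for i in async_generator()]
    print(squares)  # Output: [0, 1, 4, 9, 16]

asyncio.run(main())
```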
|
||||
### Combining Asynchronous Context Managers and Generators
|
||||
You can combine asynchronous context managers and generators to create more complex and efficient asynchronous workflows.
|
||||
**Example: Fetching Data with an Async Context Manager and Generator**
|
||||
Consider a scenario where you need to fetch data from an API asynchronously and manage the connection using an asynchronous context manager:
|
||||
```python
|
||||
import aiohttp
|
||||
import asyncio
|
||||
|
||||
class AsyncHTTPClient:
|
||||
def __init__(self, url):
|
||||
self.url = url
|
||||
|
||||
async def __aenter__(self):
|
||||
self.session = aiohttp.ClientSession()
|
||||
self.response = await self.session.get(self.url)
|
||||
return self.response
|
||||
|
||||
async def __aexit__(self, exc_type, exc, tb):
|
||||
await self.response.release()
|
||||
await self.session.close()
|
||||
|
||||
async def async_fetch(urls):
|
||||
for url in urls:
|
||||
async with AsyncHTTPClient(url) as response:
|
||||
data = await response.text()
|
||||
yield data
|
||||
|
||||
async def main():
|
||||
urls = ["http://example.com", "http://example.org", "http://example.net"]
|
||||
async for data in async_fetch(urls):
|
||||
print(data)
|
||||
|
||||
asyncio.run(main())
|
||||
```
|
||||
### Benefits of Asynchronous Context Managers and Generators
|
||||
1. Efficient Resource Management: They help manage resources like network connections or file handles more efficiently by releasing them as soon as they are no longer needed.
|
||||
2. Concurrency: They enable concurrent operations, improving performance in I/O-bound tasks such as network requests or file I/O.
|
||||
3. Readability and Maintainability: They provide a clear and structured way to handle asynchronous operations, making the code easier to read and maintain.
|
||||
### Summary
|
||||
Asynchronous context managers and generators are powerful tools in Python that enhance the efficiency and readability
|
||||
of asynchronous code. By using `async with` for resource management and `async for` for iteration, you can write more performant and maintainable asynchronous
|
||||
programs.
|
|
@ -0,0 +1,101 @@
|
|||
# Closures
|
||||
To fully understand this topic in Python, one needs to be crystal clear about the concept of functions and two particular kinds of them: first-class functions and nested functions.
|
||||
|
||||
### First Class Functions
|
||||
These are ordinary functions that can be treated like any other value: assigned to variables, passed as arguments, and returned from other functions.
|
||||
### Nested Functions
|
||||
These are functions defined within other functions and are closely tied to **closures**. Some books also refer to them as **inner functions**. There are times when you need to prevent a function, or the data it has access to, from being accessed by other parts of the code, and this is where nested functions come into play. Their use allows the encapsulation of that particular data or function within another function, effectively hiding it from the global scope.
|
||||
|
||||
## Defining Closures
|
||||
In nested functions, when the outer function ends up returning the inner function, the concept of closures comes into play.
|
||||
|
||||
A closure is a function object that remembers values in enclosing scopes even if they are no longer present in memory. There are certain necessary conditions required to create a closure in Python:
|
||||
1. The inner function must be defined inside the outer function.
|
||||
2. The inner function must refer to a value defined in the outer function.
|
||||
3. The outer function must return the inner function.
|
||||
|
||||
## Advantages of Closures
|
||||
* Closures make it possible for inner functions to access data from enclosing scopes without resorting to global variables
|
||||
* Closures can be used to create private variables and functions
|
||||
* They also make it possible to invoke the inner function from outside of the encapsulating outer function.
|
||||
* They improve code readability and maintainability
|
||||
|
||||
## Examples implementing Closures
|
||||
### Example 1 : Basic Implementation
|
||||
```python
|
||||
def make_multiplier_of(n):
|
||||
def multiplier(x):
|
||||
return x * n
|
||||
return multiplier
|
||||
|
||||
times3 = make_multiplier_of(3)
|
||||
times5 = make_multiplier_of(5)
|
||||
|
||||
print(times3(9))
|
||||
print(times5(3))
|
||||
```
|
||||
#### Output:
|
||||
```
|
||||
27
|
||||
15
|
||||
```
|
||||
The `multiplier` function is defined inside the `make_multiplier_of` function. It has access to the `n` variable from the outer scope, even after the `make_multiplier_of` function has returned. This is an example of a closure.
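Closures can also hold mutable state across calls. The sketch below uses the `nonlocal` keyword so the inner function can rebind a variable in the enclosing scope:

```python
def make_counter():
    count = 0

    def counter():
        nonlocal count  # Rebind the enclosing variable instead of creating a local one
        count += 1
        return count

    return counter

c = make_counter()
print(c())  # Output: 1
print(c())  # Output: 2
```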
|
||||
|
||||
### Example 2 : Implementation with Decorators
|
||||
```python
|
||||
def decorator_function(original_function):
|
||||
def wrapper_function(*args, **kwargs):
|
||||
print(f"Wrapper executed before {original_function.__name__}")
|
||||
return original_function(*args, **kwargs)
|
||||
return wrapper_function
|
||||
|
||||
@decorator_function
|
||||
def display():
|
||||
print("Display function executed")
|
||||
|
||||
display()
|
||||
```
|
||||
#### Output:
|
||||
```
|
||||
Wrapper executed before display
|
||||
Display function executed
|
||||
```
|
||||
The code in the example defines a decorator, ***decorator_function***, that takes a function as an argument and returns a new function, **wrapper_function**. Before calling the original function, **wrapper_function** prints a message to the console that includes the name of the decorated function.
|
||||
|
||||
The **@decorator_function** syntax is used to apply the decorator_function decorator to the display function. This means that the display function is replaced with the result of calling **decorator_function(display)**.
|
||||
|
||||
When **display()** is called, **wrapper_function** is executed instead: it prints a message to the console and then calls the original display function.
|
||||
### Example 3 : Implementation with for loop
|
||||
```python
|
||||
def create_closures():
|
||||
closures = []
|
||||
for i in range(5):
|
||||
def closure(i=i): # Capture current value of i by default argument
|
||||
return i
|
||||
closures.append(closure)
|
||||
return closures
|
||||
|
||||
my_closures = create_closures()
|
||||
for closure in my_closures:
|
||||
print(closure())
|
||||
|
||||
```
|
||||
#### Output:
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
```
|
||||
The code in the example defines a function **create_closures** that creates a list of closure functions. Each closure function returns the current value of the loop variable i.
|
||||
|
||||
The closure function is defined inside the **create_closures function**. It has access to the i variable from the **outer scope**, even after the create_closures function has returned. This is an example of a closure.
|
||||
|
||||
The `i=i` default argument in the closure captures the current value of *i* at the moment each closure is defined. This is necessary because *i* in the outer scope is a loop variable whose value changes in each iteration. By capturing the current value of *i* in the default argument, we ensure that each closure function returns the correct value of *i*, which produces the output 0, 1, 2, 3, 4.
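You can verify what a closure has captured by inspecting its `__closure__` attribute, which holds one cell per captured variable; for instance, with the `make_multiplier_of` function from Example 1:

```python
def make_multiplier_of(n):
    def multiplier(x):
        return x * n
    return multiplier

times3 = make_multiplier_of(3)
print(times3.__closure__[0].cell_contents)  # Output: 3
```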
|
||||
|
||||
|
||||
For more examples related to closures, [click here](https://dev.to/bshadmehr/understanding-closures-in-python-a-comprehensive-tutorial-11ld).
|
||||
|
||||
## Summary
|
||||
Closures in Python provide a powerful mechanism for encapsulating state and behavior, enabling more flexible and modular code. Understanding and effectively using closures enables the creation of function factories, allows functions to have state, and facilitates functional programming techniques.
|
|
@ -0,0 +1,75 @@
|
|||
# Understanding the `eval` Function in Python
|
||||
## Introduction
|
||||
|
||||
The `eval` function in Python allows you to execute a string-based Python expression dynamically. This can be useful in various scenarios where you need to evaluate expressions that are not known until runtime.
|
||||
|
||||
## Syntax
|
||||
```python
|
||||
eval(expression, globals=None, locals=None)
|
||||
```
|
||||
|
||||
### Parameters:
|
||||
|
||||
* expression: A string that is parsed and evaluated as a Python expression.
|
||||
* globals [optional]: Dictionary to specify the available global methods and variables.
|
||||
* locals [optional]: Another dictionary to specify the available local methods and variables.
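Because `eval` executes arbitrary expressions, it is common to restrict what it can see through the globals dictionary. A minimal sketch (note that this limits casual access but is not a real security boundary):

```python
# Evaluate with a restricted environment: no builtins, one allowed variable
allowed = {"__builtins__": {}, "x": 4}
print(eval("x * 3", allowed))  # Output: 12

# eval("open('data.txt')", allowed) would now raise a NameError
```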
|
||||
|
||||
## Examples
|
||||
Example 1:
|
||||
```python
|
||||
result = eval('2 + 3 * 4')
|
||||
print(result) # Output: 14
|
||||
```
|
||||
Example 2:
|
||||
|
||||
```python
|
||||
x = 10
|
||||
expression = 'x * 2'
|
||||
result = eval(expression, {'x': x})
|
||||
print(result) # Output: 20
|
||||
```
|
||||
Example 3:
|
||||
```python
|
||||
x = 10
|
||||
def multiply(a, b):
|
||||
return a * b
|
||||
expression = 'multiply(x, 5) + 2'
|
||||
result = eval(expression)
|
||||
print("Result:",result) # Output: Result:52
|
||||
```
|
||||
Example 4:
|
||||
```python
|
||||
expression = input("Enter a Python expression: ")
|
||||
result = eval(expression)
|
||||
print("Result:", result)
|
||||
# input = "3+2"
|
||||
# Output: Result: 5
|
||||
```
|
||||
|
||||
Example 5:
|
||||
```python
|
||||
import numpy as np
|
||||
a = np.random.randint(1, 9)
|
||||
b = np.random.randint(1, 9)
|
||||
operations = ["*", "-", "+"]
|
||||
op = np.random.choice(operations)
|
||||
|
||||
expression = str(a) + op + str(b)
|
||||
correct_answer = eval(expression)
|
||||
given_answer = int(input(str(a) + " " + op + " " + str(b) + " = "))
|
||||
|
||||
if given_answer==correct_answer:
|
||||
print("Correct")
|
||||
else:
|
||||
print("Incorrect")
|
||||
print("correct answer is :" ,correct_answer)
|
||||
|
||||
#2 * 1 = 8
|
||||
#Incorrect
|
||||
#correct answer is : 2
|
||||
#or
|
||||
#3 * 2 = 6
|
||||
#Correct
|
||||
```
|
||||
## Conclusion
|
||||
The eval function is a powerful tool in Python that allows for dynamic evaluation of expressions. Because it executes arbitrary code, never pass untrusted input to it directly; restrict the `globals` and `locals` dictionaries, or prefer `ast.literal_eval` when you only need to parse literals.
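For the common case of parsing literal values, the standard library's `ast.literal_eval` is the safer choice; a minimal sketch:

```python
from ast import literal_eval

# Accepts only Python literals: numbers, strings, tuples, lists, dicts, sets, booleans, None
print(literal_eval("[1, 2, 3]"))  # Output: [1, 2, 3]

# literal_eval("__import__('os')") raises ValueError instead of executing code
```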
|
|
@ -0,0 +1,86 @@
|
|||
# Filter Function
|
||||
|
||||
## Definition
|
||||
The filter function is a built-in Python function used for constructing an iterator from elements of an iterable for which a function returns true.
|
||||
|
||||
**Syntax**:
|
||||
```python
|
||||
filter(function, iterable)
|
||||
```
|
||||
**Parameters**:<br>
|
||||
*function*: A function that tests each element of the iterable, returning True or False.<br>
|
||||
*iterable*: An iterable like sets, lists, tuples, etc., whose elements are to be filtered.<br>
|
||||
*Returns*: An iterator over the elements for which the function returned True.
|
||||
|
||||
## Basic Usage
|
||||
**Example 1: Filtering a List of Numbers**:
|
||||
```python
|
||||
# Define a function that returns True for even numbers
|
||||
def is_even(n):
|
||||
return n % 2 == 0
|
||||
|
||||
numbers = [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
|
||||
even_numbers = filter(is_even, numbers)
|
||||
|
||||
# Convert the filter object to a list
|
||||
print(list(even_numbers)) # Output: [2, 4, 6, 8, 10]
|
||||
```
|
||||
|
||||
**Example 2: Filtering with a Lambda Function**:
|
||||
```python
|
||||
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
|
||||
odd_numbers = filter(lambda x: x % 2 != 0, numbers)
|
||||
|
||||
print(list(odd_numbers)) # Output: [1, 3, 5, 7, 9]
|
||||
```
|
||||
|
||||
**Example 3: Filtering Strings**:
|
||||
```python
|
||||
words = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape" , "python"]
|
||||
long_words = filter(lambda word: len(word) > 5, words)
|
||||
|
||||
print(list(long_words)) # Output: ['banana', 'cherry', 'elderberry', 'python']
|
||||
```
|
||||
|
||||
## Advanced Usage
|
||||
**Example 4: Filtering Objects with Attributes**:
|
||||
```python
|
||||
class Person:
|
||||
def __init__(self, name, age):
|
||||
self.name = name
|
||||
self.age = age
|
||||
|
||||
people = [
|
||||
Person("Alice", 30),
|
||||
Person("Bob", 15),
|
||||
Person("Charlie", 25),
|
||||
Person("David", 35)
|
||||
]
|
||||
|
||||
adults = filter(lambda person: person.age >= 18, people)
|
||||
adult_names = map(lambda person: person.name, adults)
|
||||
|
||||
print(list(adult_names)) # Output: ['Alice', 'Charlie', 'David']
|
||||
```
|
||||
|
||||
**Example 5: Using None as the Function**:
|
||||
```python
|
||||
numbers = [0, 1, 2, 3, 0, 4, 0, 5]
|
||||
non_zero_numbers = filter(None, numbers)
|
||||
|
||||
print(list(non_zero_numbers)) # Output: [1, 2, 3, 4, 5]
|
||||
```
|
||||
**NOTE**: When None is passed as the function, filter removes all items that are falsy (e.g., 0, '', None).
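Every `filter()` call also has an equivalent generator expression, which is likewise lazy; which form to use is largely a matter of style:

```python
numbers = [1, 2, 3, 4, 5, 6]

evens_filter = filter(lambda x: x % 2 == 0, numbers)
evens_genexp = (x for x in numbers if x % 2 == 0)  # Equivalent lazy iterator

print(list(evens_filter))  # Output: [2, 4, 6]
print(list(evens_genexp))  # Output: [2, 4, 6]
```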
|
||||
|
||||
## Time Complexity:
|
||||
- The time complexity of filter() depends on two factors:
|
||||
1. The time complexity of the filtering function (the one you provide as an argument).
|
||||
2. The size of the iterable being filtered.
|
||||
- If the filtering function has a constant time complexity (e.g., O(1)), the overall time complexity of filter() is linear (O(n)), where ‘n’ is the number of elements in the iterable.
|
||||
|
||||
## Space Complexity:
|
||||
- The space complexity of filter() is also influenced by the filtering function and the size of the iterable.
|
||||
- Since filter() returns an iterator, it doesn’t create a new list in memory. Instead, it generates filtered elements on-the-fly as you iterate over it. Therefore, the space complexity is O(1).
|
||||
|
||||
## Conclusion:
|
||||
Python’s filter() allows you to perform filtering operations on iterables. This kind of operation consists of applying a Boolean function to the items in an iterable and keeping only those values for which the function returns a true result. In general, you can use filter() to process existing iterables and produce new iterables containing the values that you currently need. Python 3’s filter() returns a lazy iterator rather than a list (as Python 2’s did), which makes it more memory-efficient.
|
|
@ -2,6 +2,8 @@
|
|||
|
||||
- [OOPs](oops.md)
|
||||
- [Decorators/\*args/**kwargs](decorator-kwargs-args.md)
|
||||
- ['itertools' module](itertools.md)
|
||||
- [Type Hinting](type-hinting.md)
|
||||
- [Lambda Function](lambda-function.md)
|
||||
- [Working with Dates & Times in Python](dates_and_times.md)
|
||||
- [Regular Expressions in Python](regular_expressions.md)
|
||||
|
@ -10,4 +12,12 @@
|
|||
- [Protocols](protocols.md)
|
||||
- [Exception Handling in Python](exception-handling.md)
|
||||
- [Generators](generators.md)
|
||||
- [Match Case Statement](match-case.md)
|
||||
- [Closures](closures.md)
|
||||
- [Filter](filter-function.md)
|
||||
- [Reduce](reduce-function.md)
|
||||
- [List Comprehension](list-comprehension.md)
|
||||
- [Eval Function](eval_function.md)
|
||||
- [Magic Methods](magic-methods.md)
|
||||
- [Asynchronous Context Managers & Generators](asynchronous-context-managers-generators.md)
|
||||
- [Threading](threading.md)
|
||||
|
|
|
@ -0,0 +1,144 @@
|
|||
# The 'itertools' Module in Python
|
||||
The itertools module in Python provides a collection of fast, memory-efficient tools that are useful for creating and working with iterators. These functions
|
||||
allow you to iterate over data in various ways, often combining, filtering, or extending iterators to generate complex sequences efficiently.
|
||||
|
||||
## Benefits of itertools
|
||||
1. Efficiency: Functions in itertools are designed to be memory-efficient, often generating elements on the fly and avoiding the need to store large intermediate results.
|
||||
2. Conciseness: Using itertools can lead to more readable and concise code, reducing the need for complex loops and temporary variables.
|
||||
3. Composability: Functions from itertools can be easily combined, allowing you to build complex iterator pipelines from simple building blocks.
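As a quick illustration of that composability, the sketch below chains three itertools functions into a small pipeline:

```python
import itertools

evens = itertools.count(start=0, step=2)             # Infinite: 0, 2, 4, ...
first_five = itertools.islice(evens, 5)              # Lazily take the first 5
with_sentinel = itertools.chain(first_five, [100])   # Append one extra value

print(list(with_sentinel))  # Output: [0, 2, 4, 6, 8, 100]
```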
|
||||
|
||||
## Useful Functions in itertools
|
||||
Here are some of the most useful functions in the itertools module, along with examples of how to use them:
|
||||
|
||||
1. `count`: Generates an infinite sequence of numbers, starting from a specified value.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
counter = itertools.count(start=10, step=2)
|
||||
for _ in range(5):
|
||||
print(next(counter))
|
||||
# Output: 10, 12, 14, 16, 18
|
||||
```
|
||||
|
||||
2. `cycle`: Cycles through an iterable indefinitely.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
cycler = itertools.cycle(['A', 'B', 'C'])
|
||||
for _ in range(6):
|
||||
print(next(cycler))
|
||||
# Output: A, B, C, A, B, C
|
||||
```
|
||||
|
||||
3. `repeat`: Repeats an object a specified number of times or indefinitely.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
repeater = itertools.repeat('Hello', 3)
|
||||
for item in repeater:
|
||||
print(item)
|
||||
# Output: Hello, Hello, Hello
|
||||
```
|
||||
|
||||
4. `chain`: Combines multiple iterables into a single iterable.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
combined = itertools.chain([1, 2, 3], ['a', 'b', 'c'])
|
||||
for item in combined:
|
||||
print(item)
|
||||
# Output: 1, 2, 3, a, b, c
|
||||
```
|
||||
|
||||
5. `islice`: Slices an iterator, similar to slicing a list.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
sliced = itertools.islice(range(10), 2, 8, 2)
|
||||
for item in sliced:
|
||||
print(item)
|
||||
# Output: 2, 4, 6
|
||||
```
|
||||
|
||||
6. `compress`: Filters elements in an iterable based on a corresponding selector iterable.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
data = ['A', 'B', 'C', 'D']
|
||||
selectors = [1, 0, 1, 0]
|
||||
result = itertools.compress(data, selectors)
|
||||
for item in result:
|
||||
print(item)
|
||||
# Output: A, C
|
||||
```
|
||||
|
||||
7. `permutations`: Generates all possible permutations of an iterable.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
perms = itertools.permutations('ABC', 2)
|
||||
for item in perms:
|
||||
print(item)
|
||||
# Output: ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')
|
||||
```
|
||||
|
||||
8. `combinations`: Generates all possible combinations of a specified length from an iterable.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
combs = itertools.combinations('ABC', 2)
|
||||
for item in combs:
|
||||
print(item)
|
||||
# Output: ('A', 'B'), ('A', 'C'), ('B', 'C')
|
||||
```
|
||||
|
||||
9. `product`: Computes the Cartesian product of input iterables.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
prod = itertools.product('AB', '12')
|
||||
for item in prod:
|
||||
print(item)
|
||||
# Output: ('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')
|
||||
```
|
||||
|
||||
10. `groupby`: Groups elements of an iterable by a specified key function.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
|
||||
data = [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 25}, {'name': 'Charlie', 'age': 30}]
|
||||
sorted_data = sorted(data, key=lambda x: x['age'])
|
||||
grouped = itertools.groupby(sorted_data, key=lambda x: x['age'])
|
||||
for key, group in grouped:
|
||||
print(key, list(group))
|
||||
# Output:
|
||||
# 25 [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 25}]
|
||||
# 30 [{'name': 'Charlie', 'age': 30}]
|
||||
```
|
||||
|
||||
11. `accumulate`: Makes an iterator that returns accumulated sums, or accumulated results of other binary functions specified via the optional `func` argument.
|
||||
|
||||
```python
|
||||
import itertools
|
||||
import operator
|
||||
|
||||
data = [1, 2, 3, 4, 5]
|
||||
acc = itertools.accumulate(data, operator.mul)
|
||||
for item in acc:
|
||||
print(item)
|
||||
# Output: 1, 2, 6, 24, 120
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
The itertools module is a powerful toolkit for working with iterators in Python. Its functions enable efficient and concise handling of iterable data, allowing you to create complex data processing pipelines with minimal memory overhead.
|
||||
By leveraging itertools, you can improve the readability and performance of your code, making it a valuable addition to your Python programming arsenal.
|
|
@ -0,0 +1,73 @@
|
|||
# List Comprehension
|
||||
|
||||
List comprehension in Python is a way to create lists concisely and expressively. You can generate lists from existing iterables like lists, tuples, or strings in a short form.
|
||||
This boosts the readability of code and reduces the need for explicit looping constructs.
|
||||
|
||||
## Syntax :
|
||||
|
||||
### Basic syntax
|
||||
|
||||
```python
|
||||
new_list = [expression for item in iterable]
|
||||
```
|
||||
- **new_list**: This is the name given to the list that will be created using the list comprehension.
|
||||
- **expression**: This is the expression that defines how each element of the new list will be generated or transformed.
|
||||
- **item**: This variable represents each individual element from the iterable. It takes on the value of each element in the iterable during each iteration.
|
||||
- **iterable**: This is the sequence-like object over which the iteration will take place. It provides the elements that will be processed by the expression.
|
||||
|
||||
This list comprehension syntax `[expression for item in iterable]` allows you to generate a new list by applying a specific expression to each element in an iterable.
|
||||
|
||||
### Syntax including condition
|
||||
|
||||
```python
|
||||
new_list = [expression for item in iterable if condition]
|
||||
```
|
||||
- **new_list**, **expression**, **item**, **iterable**: Same as in the basic syntax above.
|
||||
- **if condition**: This is an optional part of the syntax. It allows for conditional filtering of elements from the iterable. Only items that satisfy the condition
|
||||
will be included in the new list.
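Putting the expression and the condition together, the following example squares only the even numbers in a range:

```python
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
print(even_squares)  # Output: [4, 16, 36, 64, 100]
```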
|
||||
|
||||
|
||||
## Examples:
|
||||
|
||||
1. Generating a list of squares of numbers from 1 to 5:
|
||||
|
||||
```python
|
||||
squares = [x ** 2 for x in range(1, 6)]
|
||||
print(squares)
|
||||
```
|
||||
|
||||
- **Output** :
|
||||
```python
|
||||
[1, 4, 9, 16, 25]
|
||||
```
|
||||
|
||||
2. Filtering even numbers from a list:
|
||||
|
||||
```python
|
||||
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
|
||||
even = [x for x in nums if x % 2 == 0]
|
||||
print(even)
|
||||
```
|
||||
|
||||
- **Output** :
|
||||
```python
|
||||
[2, 4, 6, 8, 10]
|
||||
```
|
||||
|
||||
3. Flattening a list of lists:
|
||||
```python
|
||||
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
|
||||
flat = [x for sublist in matrix for x in sublist]
|
||||
print(flat)
|
||||
```
|
||||
|
||||
- **Output** :
|
||||
```python
|
||||
[1, 2, 3, 4, 5, 6, 7, 8, 9]
|
||||
```
|
||||
|
||||
List comprehension is a powerful feature in Python for creating lists based on existing iterables with a concise syntax.
|
||||
By mastering list comprehension, developers can write cleaner, more expressive code and leverage Python's functional programming capabilities effectively.
|
|
@ -0,0 +1,151 @@
|
|||
# Magic Methods
|
||||
|
||||
Magic methods, also known as dunder (double underscore) methods, are special methods in Python that start and end with double underscores (`__`).
|
||||
These methods allow you to define the behavior of objects for built-in operations and functions, enabling you to customize how your objects interact with the
|
||||
language's syntax and built-in features. Magic methods make your custom classes integrate seamlessly with Python’s built-in data types and operations.
|
||||
|
||||
**Commonly Used Magic Methods**
|
||||
|
||||
1. **Initialization and Representation**
|
||||
- `__init__(self, ...)`: Called when an instance of the class is created. Used for initializing the object's attributes.
|
||||
- `__repr__(self)`: Returns a string representation of the object, useful for debugging and logging.
|
||||
- `__str__(self)`: Returns a human-readable string representation of the object.
|
||||
|
||||
**Example** :
|
||||
|
||||
```python
|
||||
class Person:
|
||||
def __init__(self, name, age):
|
||||
self.name = name
|
||||
self.age = age
|
||||
|
||||
def __repr__(self):
|
||||
return f"Person({self.name}, {self.age})"
|
||||
|
||||
def __str__(self):
|
||||
return f"{self.name}, {self.age} years old"
|
||||
|
||||
p = Person("Alice", 30)
|
||||
print(repr(p))
|
||||
print(str(p))
|
||||
```
|
||||
|
||||
**Output** :
|
||||
```python
|
||||
Person("Alice",30)
|
||||
Alice, 30 years old
|
||||
```
|
||||
|
||||
2. **Arithmetic Operations**
|
||||
- `__add__(self, other)`: Defines behavior for the `+` operator.
|
||||
- `__sub__(self, other)`: Defines behavior for the `-` operator.
|
||||
- `__mul__(self, other)`: Defines behavior for the `*` operator.
|
||||
- `__truediv__(self, other)`: Defines behavior for the `/` operator.
|
||||
|
||||
|
||||
**Example** :
|
||||
|
||||
```python
|
||||
class Vector:
|
||||
def __init__(self, x, y):
|
||||
self.x = x
|
||||
self.y = y
|
||||
|
||||
def __add__(self, other):
|
||||
return Vector(self.x + other.x, self.y + other.y)
|
||||
|
||||
def __repr__(self):
|
||||
return f"Vector({self.x}, {self.y})"
|
||||
|
||||
v1 = Vector(2, 3)
|
||||
v2 = Vector(1, 1)
|
||||
v3 = v1 + v2
|
||||
print(v3)
|
||||
```
|
||||
|
||||
**Output** :
|
||||
|
||||
```python
|
||||
Vector(3, 4)
|
||||
```
|
||||
|
||||
3. **Comparison Operations**
|
||||
- `__eq__(self, other)`: Defines behavior for the `==` operator.
|
||||
- `__lt__(self, other)`: Defines behavior for the `<` operator.
|
||||
- `__le__(self, other)`: Defines behavior for the `<=` operator.
|
||||
|
||||
**Example** :
|
||||
|
||||
```python
|
||||
class Person:
|
||||
def __init__(self, name, age):
|
||||
self.name = name
|
||||
self.age = age
|
||||
|
||||
def __eq__(self, other):
|
||||
return self.age == other.age
|
||||
|
||||
def __lt__(self, other):
|
||||
return self.age < other.age
|
||||
|
||||
p1 = Person("Alice", 30)
|
||||
p2 = Person("Bob", 25)
|
||||
print(p1 == p2)
|
||||
print(p1 < p2)
|
||||
```
|
||||
|
||||
**Output** :
|
||||
|
||||
```python
|
||||
False
|
||||
False
|
||||
```
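If you define `__eq__` and one ordering method, `functools.total_ordering` from the standard library can derive the remaining comparison operators; a minimal sketch:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

print(Version(1, 2) <= Version(1, 3))  # Output: True (derived from __lt__ and __eq__)
print(Version(2, 0) > Version(1, 9))   # Output: True
```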
|
||||
|
||||
4. **Container and Sequence Methods**
|
||||
|
||||
- `__len__(self)`: Defines behavior for the `len()` function.
|
||||
- `__getitem__(self, key)`: Defines behavior for indexing (`self[key]`).
|
||||
- `__setitem__(self, key, value)`: Defines behavior for item assignment (`self[key] = value`).
|
||||
- `__delitem__(self, key)`: Defines behavior for item deletion (`del self[key]`).
|
||||
|
||||
**Example** :
|
||||
|
||||
```python
|
||||
class CustomList:
|
||||
def __init__(self, *args):
|
||||
self.items = list(args)
|
||||
|
||||
def __len__(self):
|
||||
return len(self.items)
|
||||
|
||||
def __getitem__(self, index):
|
||||
return self.items[index]
|
||||
|
||||
def __setitem__(self, index, value):
|
||||
self.items[index] = value
|
||||
|
||||
def __delitem__(self, index):
|
||||
del self.items[index]
|
||||
|
||||
def __repr__(self):
|
||||
return f"CustomList({self.items})"
|
||||
|
||||
cl = CustomList(1, 2, 3)
|
||||
print(len(cl))
|
||||
print(cl[1])
|
||||
cl[1] = 5
|
||||
print(cl)
|
||||
del cl[1]
|
||||
print(cl)
|
||||
```
|
||||
|
||||
**Output** :
|
||||
```python
|
||||
3
|
||||
2
|
||||
CustomList([1, 5, 3])
|
||||
CustomList([1, 3])
|
||||
```
|
||||
|
||||
Magic methods provide powerful ways to customize the behavior of your objects and make them work seamlessly with Python's syntax and built-in functions.
|
||||
Use them judiciously to enhance the functionality and readability of your classes.
|
|
@ -0,0 +1,251 @@
|
|||
# Match Case Statements
|
||||
## Introduction
|
||||
Match and case statements were introduced in Python 3.10 for structural pattern matching of patterns with associated actions. They make code more readable and
|
||||
cleaner as opposed to the traditional `if-else` statements. They also offer destructuring, pattern matching, and checks for specific properties in
|
||||
addition to the traditional `switch-case` statements in other languages, which makes them more versatile.
|
||||
|
||||
## Syntax
|
||||
```
|
||||
match <statement>:
|
||||
case <pattern_1>:
|
||||
<do_task_1>
|
||||
case <pattern_2>:
|
||||
<do_task_2>
|
||||
case _:
|
||||
<do_task_wildcard>
|
||||
```
|
||||
A match statement takes an expression and compares it to the various cases and their patterns. If any of the patterns matches successfully, the corresponding task is performed. If no exact match is found, the last case, a wildcard `_`, if provided, will be used as the matching case.
|
||||
|
||||
## Pattern Matching
|
||||
As discussed earlier, match case statements use pattern matching where the patterns consist of sequences, mappings, primitive data types as well as class instances. Structural pattern matching uses a declarative approach and explicitly states the conditions for the patterns to match the data.
|
||||
|
||||
### Patterns with a Literal
|
||||
#### Generic Case
|
||||
`sample text` is passed as a literal in the `match` block. There are two cases and a wildcard case mentioned.
|
||||
```python
|
||||
match 'sample text':
|
||||
case 'sample text':
|
||||
print('sample text')
|
||||
case 'sample':
|
||||
print('sample')
|
||||
case _:
|
||||
print('None found')
|
||||
```
|
||||
The `sample text` case is satisfied as it matches with the literal `sample text` described in the `match` block.
|
||||
|
||||
O/P:
|
||||
```
|
||||
sample text
|
||||
```
|
||||
|
||||
#### Using OR
|
||||
Taking another example, `|` can be used as OR to include multiple patterns in a single case statement where the multiple patterns all lead to a similar task.
|
||||
|
||||
The below code snippets can be used interchangeably and generate similar output. The latter is more concise and readable.
|
||||
```python
|
||||
match 'e':
|
||||
case 'a':
|
||||
print('vowel')
|
||||
case 'e':
|
||||
print('vowel')
|
||||
case 'i':
|
||||
print('vowel')
|
||||
case 'o':
|
||||
print('vowel')
|
||||
case 'u':
|
||||
print('vowel')
|
||||
case _:
|
||||
print('consonant')
|
||||
```
|
||||
```python
|
||||
match 'e':
|
||||
case 'a' | 'e' | 'i' | 'o' | 'u':
|
||||
print('vowel')
|
||||
case _:
|
||||
print('consonant')
|
||||
```
|
||||
O/P:
|
||||
```
|
||||
vowel
|
||||
```
|
||||
|
||||
#### Without wildcard
|
||||
When there is no wildcard case present in a `match` block, there are two possibilities: a match exists or it doesn't. If the match doesn't exist, the behaviour is a no-op.
|
||||
```python
|
||||
match 'c':
|
||||
case 'a' | 'e' | 'i' | 'o' | 'u':
|
||||
print('vowel')
|
||||
```
|
||||
The output will be blank as a no-op occurs.
|
||||
|
||||
### Patterns with a Literal and a Variable
|
||||
Pattern matching can be done by unpacking the assignments and also bind variables with it.
|
||||
```python
|
||||
def get_names(names: str) -> None:
|
||||
match names:
|
||||
case ('Bob', y):
|
||||
print(f'Hello {y}')
|
||||
case (x, 'John'):
|
||||
print(f'Hello {x}')
|
||||
case (x, y):
|
||||
print(f'Hello {x} and {y}')
|
||||
case _:
|
||||
print('Invalid')
|
||||
```
|
||||
Here, `names` is a tuple that contains two names. The `match` block unpacks the tuple and binds `x` and `y` based on the patterns. A wildcard case prints `Invalid` if no other pattern is satisfied.
|
||||
|
||||
O/P:
|
||||
|
||||
In this example, the above code snippet is called with the parameter `names` as shown below, producing the respective output.
|
||||
```
|
||||
>>> get_names(('Bob', 'Max'))
|
||||
Hello Max
|
||||
|
||||
>>> get_names(('Rob', 'John'))
|
||||
Hello Rob
|
||||
|
||||
>>> get_names(('Rob', 'Max'))
|
||||
Hello Rob and Max
|
||||
|
||||
>>> get_names(('Rob', 'Max', 'Bob'))
|
||||
Invalid
|
||||
```
|
||||
|
||||
### Patterns with Classes
|
||||
Class structures can be used in a `match` block for pattern matching. The class members can also be bound to variables to perform certain operations. Consider the class structure:
|
||||
```python
|
||||
class Person:
|
||||
def __init__(self, name, age):
|
||||
self.name = name
|
||||
self.age = age
|
||||
```
|
||||
The match case example illustrates the generic working as well as the binding of variables with the class members.
|
||||
```python
|
||||
def get_class(cls: Person) -> None:
|
||||
match cls:
|
||||
case Person(name='Bob', age=18):
|
||||
print('Hello Bob with age 18')
|
||||
case Person(name='Max', age=y):
|
||||
print(f'Age is {y}')
|
||||
case Person(name=x, age=18):
|
||||
print(f'Name is {x}')
|
||||
case Person(name=x, age=y):
|
||||
print(f'Name and age is {x} and {y}')
|
||||
case _:
|
||||
print('Invalid')
|
||||
```
|
||||
O/P:
|
||||
```
|
||||
>>> get_class(Person('Bob', 18))
|
||||
Hello Bob with age 18
|
||||
|
||||
>>> get_class(Person('Max', 21))
|
||||
Age is 21
|
||||
|
||||
>>> get_class(Person('Rob', 18))
|
||||
Name is Rob
|
||||
|
||||
>>> get_class(Person('Rob', 21))
|
||||
Name and age is Rob and 21
|
||||
```
|
||||
Now, if a new class is introduced in the above code snippet like below.
|
||||
```python
|
||||
class Pet:
|
||||
def __init__(self, name, animal):
|
||||
self.name = name
|
||||
self.animal = animal
|
||||
```
|
||||
An instance of this new class will not match any of the `Person` patterns in the `get_class` function above and will trigger the wildcard case.
|
||||
```
|
||||
>>> get_class(Pet('Tommy', 'Dog'))
|
||||
Invalid
|
||||
```
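Positional class patterns such as `Person('Bob', 18)` are also possible if the class defines `__match_args__`; a minimal sketch:

```python
class Point:
    __match_args__ = ('x', 'y')  # Enables positional patterns

    def __init__(self, x, y):
        self.x = x
        self.y = y

def locate(p: Point) -> None:
    match p:
        case Point(0, 0):
            print('Origin')
        case Point(x, 0):
            print(f'On the x-axis at {x}')
        case Point(x, y):
            print(f'At ({x}, {y})')

locate(Point(3, 0))  # Output: On the x-axis at 3
```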
|
||||
|
||||
### Nested Patterns
|
||||
The patterns can be nested in various ways. They can mix the patterns mentioned earlier or be symmetrical across cases. Below, a basic nested pattern of a list combined with literals and variables is shown; classes and iterables can also be included.
|
||||
```python
|
||||
def get_points(points: list) -> None:
|
||||
match points:
|
||||
case []:
|
||||
print('Empty')
|
||||
case [x]:
|
||||
print(f'One point {x}')
|
||||
case [x, y]:
|
||||
print(f'Two points {x} and {y}')
|
||||
case _:
|
||||
print('More than two points')
|
||||
```
|
||||
O/P:
|
||||
```
|
||||
>>> get_points([])
|
||||
Empty
|
||||
|
||||
>>> get_points([1])
|
||||
One point 1
|
||||
|
||||
>>> get_points([1, 2])
|
||||
Two points 1 and 2
|
||||
|
||||
>>> get_points([1, 2, 3])
|
||||
More than two points
|
||||
```
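Sequence patterns can also capture "the rest" of a list with a `*` name, similar to iterable unpacking:

```python
def first_and_rest(items: list) -> None:
    match items:
        case []:
            print('Empty')
        case [first, *rest]:
            print(f'First is {first}, rest is {rest}')

first_and_rest([1, 2, 3])  # Output: First is 1, rest is [2, 3]
```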
|
||||
|
||||
### Complex Patterns
|
||||
Complex patterns are also supported in the pattern matching sequence. "Complex" here does not mean complex numbers, but rather patterns whose structure makes them appear harder to read.
|
||||
|
||||
#### Wildcard
|
||||
The wildcards used so far have been of the form `case _`, where the wildcard case is used if no match is found. Furthermore, the wildcard `_` can also be used as a placeholder in complex patterns.
|
||||
|
||||
```python
|
||||
def wildcard(value: tuple) -> None:
|
||||
match value:
|
||||
case ('Bob', age, 'Mechanic'):
|
||||
print(f'Bob is mechanic of age {age}')
|
||||
case ('Bob', age, _):
|
||||
print(f'Bob is not a mechanic of age {age}')
|
||||
```
|
||||
O/P:
|
||||
|
||||
The value in the above snippet is a tuple with `(Name, Age, Job)`. If the job is Mechanic and the name is Bob, the first case is triggered. But if the job is different and not a mechanic, then the other case is triggered with the wildcard.
|
||||
```
|
||||
>>> wildcard(('Bob', 18, 'Mechanic'))
|
||||
Bob is mechanic of age 18
|
||||
|
||||
>>> wildcard(('Bob', 21, 'Engineer'))
|
||||
Bob is not a mechanic of age 21
|
||||
```
|
||||
|
||||
#### Guard
|
||||
A `guard` is when an `if` is added to a pattern. The evaluation depends on the truth value of the guard.
|
||||
|
||||
`nums` is the tuple which contains two integers. A guard is the first case where it checks whether the first number is greater or equal to the second number in the tuple. If it is false, then it moves to the second case, where it concludes that the first number is smaller than the second number.
|
||||
```python
|
||||
def guard(nums: tuple) -> None:
|
||||
match nums:
|
||||
case (x, y) if x >= y:
|
||||
print(f'{x} is greater or equal than {y}')
|
||||
case (x, y):
|
||||
print(f'{x} is smaller than {y}')
|
||||
case _:
|
||||
print('Invalid')
|
||||
```
|
||||
O/P:
|
||||
```
|
||||
>>> guard((1, 2))
|
||||
1 is smaller than 2
|
||||
|
||||
>>> guard((2, 1))
|
||||
2 is greater or equal than 1
|
||||
|
||||
>>> guard((1, 1))
|
||||
1 is greater or equal than 1
|
||||
```
|
||||
|
||||
## Summary
|
||||
Match case statements provide an elegant and readable way to perform pattern matching operations, as compared to `if-else` statements. They are also more versatile, providing additional functionality such as unpacking, class matching, and handling iterables and iterators. They can also use positional arguments for checking patterns. Altogether, they provide a powerful and concise way to handle multiple conditions and perform pattern matching.
|
||||
|
||||
## Further Reading
|
||||
This article provides a brief introduction to the match case statements and the overview on the pattern matching operations. To know more, the below articles can be used for in-depth understanding of the topic.
|
||||
|
||||
- [PEP 634 – Structural Pattern Matching: Specification](https://peps.python.org/pep-0634/)
|
||||
- [PEP 636 – Structural Pattern Matching: Tutorial](https://peps.python.org/pep-0636/)
|
|
@ -0,0 +1,72 @@
|
|||
# Reduce Function
|
||||
|
||||
## Definition:
|
||||
The reduce() function is part of the functools module and is used to apply a binary function (a function that takes two arguments) cumulatively to the items of an iterable (e.g., a list, tuple, or string). It reduces the iterable to a single value by successively combining elements.
|
||||
|
||||
**Syntax**:
|
||||
```python
|
||||
from functools import reduce
|
||||
reduce(function, iterable[, initializer])
|
||||
```
|
||||
**Parameters**:<br>
|
||||
*function* : The binary function to apply. It takes two arguments and returns a single value.<br>
|
||||
*iterable* : The sequence of elements to process.<br>
|
||||
*initializer (optional)*: An initial value. If provided, the function is first applied to the initializer and the first element of the iterable. Otherwise, the first two elements are used as the starting values.
|
||||
|
||||
## Working:
|
||||
- Initially, the first two elements of the iterable are picked and the result is obtained.
|
||||
- The same function is then applied to the previously obtained result and the element just succeeding the second one, and the result is stored again.
|
||||
- This process continues till no more elements are left in the container.
|
||||
- The final result is returned and printed on the console.
|
||||
|
||||
## Examples:
|
||||
|
||||
**Example 1:**
|
||||
```python
|
||||
from functools import reduce

numbers = [1, 2, 3, 4, 10]
|
||||
total = reduce(lambda x, y: x + y, numbers)
|
||||
print(total) # Output: 20
|
||||
```
|
||||
**Example 2:**
|
||||
```python
|
||||
from functools import reduce

numbers = [11, 7, 8, 20, 1]
|
||||
max_value = reduce(lambda x, y: x if x > y else y, numbers)
|
||||
print(max_value) # Output: 20
|
||||
```
|
||||
**Example 3:**
|
||||
```python
|
||||
# Importing reduce function from functools
|
||||
from functools import reduce
|
||||
|
||||
# Creating a list
|
||||
my_list = [10, 20, 30, 40, 50]
|
||||
|
||||
# Calculating the product of the numbers in my_list
|
||||
# using reduce and lambda functions together
|
||||
product = reduce(lambda x, y: x * y, my_list)
|
||||
|
||||
# Printing output
|
||||
print(f"Product = {product}") # Output : Product = 12000000
|
||||
```
|
||||
|
||||
## Difference Between reduce() and accumulate():
|
||||
- **Behavior:**
|
||||
- reduce() keeps only a running result and returns just the final accumulated value.
|
||||
- accumulate() returns an iterator containing all intermediate results. The last value in the iterator is the final accumulated value.
|
||||
|
||||
- **Use Cases:**
|
||||
- Use reduce() when you need a single result (e.g., total sum, product) from the iterable.
|
||||
- Use accumulate() when you want to access intermediate results during the reduction process.
|
||||
|
||||
- **Initial Value:**
|
||||
- reduce() allows an optional initial value.
|
||||
- accumulate() also accepts an optional initial value since Python 3.8.
|
||||
|
||||
- **Order of Arguments:**
|
||||
- reduce() takes the function first, followed by the iterable.
|
||||
- accumulate() takes the iterable first, followed by the function.
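To make the contrast concrete, here is a short sketch running both on the same data:

```python
from functools import reduce
from itertools import accumulate
import operator

data = [1, 2, 3, 4]

print(reduce(operator.add, data))            # Output: 10 (only the final value)
print(list(accumulate(data, operator.add)))  # Output: [1, 3, 6, 10] (every step)
```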
|
||||
|
||||
## Conclusion:
|
||||
Python's reduce function enables us to apply reduction operations to iterables using lambdas and other callable functions: it
|
||||
reduces the elements of an iterable to a single cumulative value. The reduce function in
|
||||
Python solves various straightforward problems, such as summing or multiplying iterables of numbers.
|
|
@ -0,0 +1,106 @@
|
|||
# Introduction to Type Hinting in Python
|
||||
Type hinting is a feature in Python that allows you to specify the expected data types of variables, function arguments, and return values. It was introduced
|
||||
in Python 3.5 via PEP 484 and has since become a standard practice to improve code readability and facilitate static analysis tools.
|
||||
|
||||
**Benefits of Type Hinting**
|
||||
|
||||
1. Improved Readability: Type hints make it clear what type of data is expected, making the code easier to understand for others and your future self.
|
||||
2. Error Detection: Static analysis tools like MyPy can use type hints to detect type errors before runtime, reducing bugs and improving code quality.
|
||||
3. Better Tooling Support: Modern IDEs and editors can leverage type hints to provide better autocompletion, refactoring, and error checking features.
|
||||
4. Documentation: Type hints serve as a form of documentation, indicating the intended usage of functions and classes.
|
||||
|
||||
**Syntax of Type Hinting** <br>
|
||||
Type hints can be added to variables, function arguments, and return values using annotations.
|
||||
|
||||
1. Variable Annotations:
|
||||
|
||||
```python
|
||||
age: int = 25
|
||||
name: str = "Alice"
|
||||
is_student: bool = True
|
||||
```
|
||||
|
||||
2. Function Annotations:
|
||||
|
||||
```python
|
||||
def greet(name: str) -> str:
|
||||
return f"Hello, {name}!"
|
||||
```
|
||||
|
||||
3. Multiple Arguments and Return Types:
|
||||
|
||||
```python
|
||||
def add(a: int, b: int) -> int:
|
||||
return a + b
|
||||
```
|
||||
|
||||
4. Optional Types: Use the Optional type from the typing module for values that could be None.
|
||||
|
||||
```python
|
||||
from typing import Optional
|
||||
|
||||
def get_user_name(user_id: int) -> Optional[str]:
|
||||
# Function logic here
|
||||
return None # Example return value
|
||||
```
|
||||
|
||||
5. Union Types: Use the Union type when a variable can be of multiple types.
|
||||
|
||||
```python
|
||||
from typing import Union
|
||||
|
||||
def get_value(key: str) -> Union[int, str]:
|
||||
# Function logic here
|
||||
return "value" # Example return value
|
||||
```
|
||||
|
||||
6. List and Dictionary Types: Use the List and Dict types from the typing module for collections.
|
||||
|
||||
```python
|
||||
from typing import List, Dict
|
||||
|
||||
def process_data(data: List[int]) -> Dict[str, int]:
|
||||
# Function logic here
|
||||
return {"sum": sum(data)} # Example return value
|
||||
```
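Since Python 3.9 the built-in `list` and `dict` can be used directly in annotations, and since 3.10 the `|` operator can replace `Union` and `Optional`; the earlier examples could be written as:

```python
def process_data(data: list[int]) -> dict[str, int]:
    return {"sum": sum(data)}  # Example return value

def get_user_name(user_id: int) -> str | None:  # Same as Optional[str]
    return None  # Example return value
```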
|
||||
|
||||
7. Type Aliases: Create type aliases for complex types to make the code more readable.
|
||||
|
||||
```python
|
||||
from typing import List, Tuple
|
||||
|
||||
Coordinates = List[Tuple[int, int]]
|
||||
|
||||
def draw_shape(points: Coordinates) -> None:
|
||||
# Function logic here
|
||||
pass
|
||||
```
|
||||
|
||||
**Example of Type Hinting in a Class** <br>
|
||||
Here is a more comprehensive example using type hints in a class:
|
||||
|
||||
```python
|
||||
from typing import List
|
||||
|
||||
class Student:
|
||||
def __init__(self, name: str, age: int, grades: List[int]) -> None:
|
||||
self.name = name
|
||||
self.age = age
|
||||
self.grades = grades
|
||||
|
||||
def average_grade(self) -> float:
|
||||
return sum(self.grades) / len(self.grades)
|
||||
|
||||
def add_grade(self, grade: int) -> None:
|
||||
self.grades.append(grade)
|
||||
|
||||
# Example usage
|
||||
student = Student("Alice", 20, [90, 85, 88])
|
||||
print(student.average_grade()) # Output: 87.66666666666667
|
||||
student.add_grade(92)
|
||||
print(student.average_grade()) # Output: 88.75
|
||||
```
|
||||
|
||||
### Conclusion
|
||||
Type hinting in Python enhances code readability, facilitates error detection through static analysis, and improves tooling support. By adopting
|
||||
type hinting, you can write clearer and more maintainable code, reducing the likelihood of bugs and making your codebase easier to navigate for yourself and others.
|
|
@ -0,0 +1,185 @@
|
|||
# AVL Tree
|
||||
|
||||
In Data Structures and Algorithms, an **AVL Tree** is a self-balancing binary search tree (BST) where the difference between heights of left and right subtrees cannot be more than one for all nodes. It ensures that the tree remains balanced, providing efficient search, insertion, and deletion operations.
|
||||
|
||||
## Points to be Remembered
|
||||
|
||||
- **Balance Factor**: The difference in heights between the left and right subtrees of a node. It should be -1, 0, or +1 for all nodes in an AVL tree.
|
||||
- **Rotations**: Tree rotations (left, right, left-right, right-left) are used to maintain the balance factor within the allowed range.
|
||||
|
||||
## Real Life Examples of AVL Trees
|
||||
|
||||
- **Databases**: AVL trees can be used to maintain large indexes for database tables, ensuring quick data retrieval.
|
||||
- **File Systems**: Some file systems use AVL trees to keep track of free and used memory blocks.
|
||||
|
||||
## Applications of AVL Trees
|
||||
|
||||
AVL trees are used in various applications in Computer Science:
|
||||
|
||||
- **Database Indexing**
|
||||
- **Memory Allocation**
|
||||
- **Network Routing Algorithms**
|
||||
|
||||
Understanding these applications is essential for Software Development.
|
||||
|
||||
## Operations in AVL Tree
|
||||
|
||||
Key operations include:
|
||||
|
||||
- **INSERT**: Insert a new element into the AVL tree.
|
||||
- **SEARCH**: Find the position of an element in the AVL tree.
|
||||
- **DELETE**: Remove an element from the AVL tree.
|
||||
|
||||
## Implementing AVL Tree in Python
|
||||
|
||||
```python
|
||||
class AVLTreeNode:
|
||||
def __init__(self, key):
|
||||
self.key = key
|
||||
self.left = None
|
||||
self.right = None
|
||||
self.height = 1
|
||||
|
||||
class AVLTree:
|
||||
def insert(self, root, key):
|
||||
if not root:
|
||||
return AVLTreeNode(key)
|
||||
|
||||
if key < root.key:
|
||||
root.left = self.insert(root.left, key)
|
||||
else:
|
||||
root.right = self.insert(root.right, key)
|
||||
|
||||
root.height = 1 + max(self.getHeight(root.left), self.getHeight(root.right))
|
||||
balance = self.getBalance(root)
|
||||
|
||||
if balance > 1 and key < root.left.key:
|
||||
return self.rotateRight(root)
|
||||
if balance < -1 and key > root.right.key:
|
||||
return self.rotateLeft(root)
|
||||
if balance > 1 and key > root.left.key:
|
||||
root.left = self.rotateLeft(root.left)
|
||||
return self.rotateRight(root)
|
||||
if balance < -1 and key < root.right.key:
|
||||
root.right = self.rotateRight(root.right)
|
||||
return self.rotateLeft(root)
|
||||
|
||||
return root
|
||||
|
||||
def search(self, root, key):
|
||||
if not root or root.key == key:
|
||||
return root
|
||||
|
||||
if key < root.key:
|
||||
return self.search(root.left, key)
|
||||
|
||||
return self.search(root.right, key)
|
||||
|
||||
def delete(self, root, key):
|
||||
if not root:
|
||||
return root
|
||||
|
||||
if key < root.key:
|
||||
root.left = self.delete(root.left, key)
|
||||
elif key > root.key:
|
||||
root.right = self.delete(root.right, key)
|
||||
else:
|
||||
if root.left is None:
|
||||
temp = root.right
|
||||
root = None
|
||||
return temp
|
||||
elif root.right is None:
|
||||
temp = root.left
|
||||
root = None
|
||||
return temp
|
||||
|
||||
temp = self.getMinValueNode(root.right)
|
||||
root.key = temp.key
|
||||
root.right = self.delete(root.right, temp.key)
|
||||
|
||||
if root is None:
|
||||
return root
|
||||
|
||||
root.height = 1 + max(self.getHeight(root.left), self.getHeight(root.right))
|
||||
balance = self.getBalance(root)
|
||||
|
||||
if balance > 1 and self.getBalance(root.left) >= 0:
|
||||
return self.rotateRight(root)
|
||||
if balance < -1 and self.getBalance(root.right) <= 0:
|
||||
return self.rotateLeft(root)
|
||||
if balance > 1 and self.getBalance(root.left) < 0:
|
||||
root.left = self.rotateLeft(root.left)
|
||||
return self.rotateRight(root)
|
||||
if balance < -1 and self.getBalance(root.right) > 0:
|
||||
root.right = self.rotateRight(root.right)
|
||||
return self.rotateLeft(root)
|
||||
|
||||
return root
|
||||
|
||||
def rotateLeft(self, z):
|
||||
y = z.right
|
||||
T2 = y.left
|
||||
y.left = z
|
||||
z.right = T2
|
||||
z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right))
|
||||
y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right))
|
||||
return y
|
||||
|
||||
def rotateRight(self, z):
|
||||
y = z.left
|
||||
T3 = y.right
|
||||
y.right = z
|
||||
z.left = T3
|
||||
z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right))
|
||||
y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right))
|
||||
return y
|
||||
|
||||
def getHeight(self, root):
|
||||
if not root:
|
||||
return 0
|
||||
return root.height
|
||||
|
||||
def getBalance(self, root):
|
||||
if not root:
|
||||
return 0
|
||||
return self.getHeight(root.left) - self.getHeight(root.right)
|
||||
|
||||
def getMinValueNode(self, root):
|
||||
if root is None or root.left is None:
|
||||
return root
|
||||
return self.getMinValueNode(root.left)
|
||||
|
||||
def preOrder(self, root):
|
||||
if not root:
|
||||
return
|
||||
print(root.key, end=' ')
|
||||
self.preOrder(root.left)
|
||||
self.preOrder(root.right)
|
||||
|
||||
# Example usage
|
||||
avl_tree = AVLTree()
|
||||
root = None
|
||||
|
||||
root = avl_tree.insert(root, 10)
|
||||
root = avl_tree.insert(root, 20)
|
||||
root = avl_tree.insert(root, 30)
|
||||
root = avl_tree.insert(root, 40)
|
||||
root = avl_tree.insert(root, 50)
|
||||
root = avl_tree.insert(root, 25)
|
||||
|
||||
print("Preorder traversal of the AVL tree is:")
|
||||
avl_tree.preOrder(root)
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
```markdown
|
||||
Preorder traversal of the AVL tree is:
|
||||
30 20 10 25 40 50
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
- **Insertion**: O(log n). Inserting a node involves traversing the height of the tree, which is logarithmic due to the balancing property.
|
||||
- **Search**: O(log n). Searching for a node involves traversing the height of the tree.
|
||||
- **Deletion**: O(log n). Deleting a node involves traversing and potentially rebalancing the tree, maintaining the logarithmic height.
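These logarithmic bounds follow from the balance condition. The height $h$ of an AVL tree with $n$ nodes is known to satisfy approximately

$$h < 1.44 \log_2(n + 2)$$

so the tree can never degenerate into a long chain the way an unbalanced BST can.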
|
|
@ -0,0 +1,231 @@
|
|||
# Binary Tree
|
||||
|
||||
A binary tree is a non-linear data structure in which each node can have at most two children, known as the left and the right child. It is a hierarchical data structure represented in the following way:
|
||||
|
||||
```
|
||||
A...................Level 0
|
||||
/ \
|
||||
B C.................Level 1
|
||||
/ \ \
|
||||
D E G...............Level 2
|
||||
```
|
||||
|
||||
## Basic Terminologies
|
||||
|
||||
- **Root node:** The topmost node in a tree is the root node. The root node does not have any parent. In the above example, **A** is the root node.
|
||||
- **Parent node:** The predecessor of a node is called the parent of that node. **A** is the parent of **B** and **C**, **B** is the parent of **D** and **E** and **C** is the parent of **G**.
|
||||
- **Child node:** The successor of a node is called the child of that node. **B** and **C** are children of **A**, **D** and **E** are children of **B** and **G** is the right child of **C**.
|
||||
- **Leaf node:** Nodes without any children are called the leaf nodes. **D**, **E** and **G** are the leaf nodes.
|
||||
- **Ancestor node:** Predecessor nodes on the path from the root to that node are called ancestor nodes. **A** and **B** are the ancestors of **E**.
|
||||
- **Descendant node:** Successor nodes on the path from the root to that node are called descendant nodes. **B** and **E** are descendants of **A**.
|
||||
- **Sibling node:** Nodes having the same parent are called sibling nodes. **B** and **C** are sibling nodes and so are **D** and **E**.
|
||||
- **Level (Depth) of a node:** Number of edges in the path from the root to that node is the level of that node. The root node is always at level 0. The depth of the deepest node is the depth of the tree.
|
||||
- **Height of a node:** Number of edges in the path from that node to the deepest leaf is the height of that node. The height of the root is the height of a tree. Height of node **A** is 2, nodes **B** and **C** is 1 and nodes **D**, **E** and **G** is 0.
|
||||
|
||||
## Types Of Binary Trees
|
||||
|
||||
- **Full Binary Tree:** A binary tree where each node has 0 or 2 children is a full binary tree.
|
||||
```
|
||||
A
|
||||
/ \
|
||||
B C
|
||||
/ \
|
||||
D E
|
||||
```
|
||||
- **Complete Binary Tree:** A binary tree in which all levels are completely filled except the last level is a complete binary tree. Whenever new nodes are inserted, they are inserted from the left side.
|
||||
```
|
||||
A
|
||||
/ \
|
||||
/ \
|
||||
B C
|
||||
/ \ /
|
||||
D E F
|
||||
```
|
||||
- **Perfect Binary Tree:** A binary tree in which all nodes are completely filled, i.e., each node has two children is called a perfect binary tree.
|
||||
```
|
||||
A
|
||||
/ \
|
||||
/ \
|
||||
B C
|
||||
/ \ / \
|
||||
D E F G
|
||||
```
|
||||
- **Skewed Binary Tree:** A binary tree in which each node has either 0 or 1 child is called a skewed binary tree. It is of two types - left skewed binary tree and right skewed binary tree.
|
||||
```
|
||||
A A
|
||||
\ /
|
||||
B B
|
||||
\ /
|
||||
C C
|
||||
Right skewed binary tree Left skewed binary tree
|
||||
```
|
||||
- **Balanced Binary Tree:** A binary tree in which the height difference between the left and right subtree is not more than one and the subtrees are also balanced is a balanced binary tree.
|
||||
```
|
||||
A
|
||||
/ \
|
||||
B C
|
||||
/ \
|
||||
D E
|
||||
```
|
||||
|
||||
## Real Life Applications Of Binary Tree
|
||||
|
||||
- **File Systems:** File systems organize folders and files hierarchically as trees (in practice often B-tree variants rather than plain binary trees), facilitating efficient search and access of files.
|
||||
- **Decision Trees:** Decision tree, a supervised learning algorithm, utilizes binary trees, with each node representing a decision and its edges showing the possible outcomes.
|
||||
- **Routing Algorithms:** In routing algorithms, binary trees are used to efficiently transfer data packets from the source to destination through a network of nodes.
|
||||
- **Searching and sorting Algorithms:** Searching algorithms like binary search and sorting algorithms like heapsort heavily rely on binary trees.
|
||||
|
||||
## Implementation of Binary Tree
|
||||
|
||||
```python
|
||||
from collections import deque
|
||||
|
||||
class Node:
|
||||
def __init__(self, data):
|
||||
self.data = data
|
||||
self.left = None
|
||||
self.right = None
|
||||
|
||||
class Binary_tree:
|
||||
@staticmethod
|
||||
def insert(root, data):
|
||||
if root is None:
|
||||
return Node(data)
|
||||
q = deque()
|
||||
q.append(root)
|
||||
while q:
|
||||
temp = q.popleft()
|
||||
if temp.left is None:
|
||||
temp.left = Node(data)
|
||||
break
|
||||
else:
|
||||
q.append(temp.left)
|
||||
if temp.right is None:
|
||||
temp.right = Node(data)
|
||||
break
|
||||
else:
|
||||
q.append(temp.right)
|
||||
return root
|
||||
|
||||
@staticmethod
|
||||
def inorder(root):
|
||||
if not root:
|
||||
return
|
||||
        Binary_tree.inorder(root.left)
|
||||
print(root.data, end=" ")
|
||||
        Binary_tree.inorder(root.right)
|
||||
|
||||
@staticmethod
|
||||
def preorder(root):
|
||||
if not root:
|
||||
return
|
||||
print(root.data, end=" ")
|
||||
        Binary_tree.preorder(root.left)
|
||||
        Binary_tree.preorder(root.right)
|
||||
|
||||
@staticmethod
|
||||
def postorder(root):
|
||||
if not root:
|
||||
return
|
||||
        Binary_tree.postorder(root.left)
|
||||
        Binary_tree.postorder(root.right)
|
||||
print(root.data, end=" ")
|
||||
|
||||
@staticmethod
|
||||
def levelorder(root):
|
||||
if not root:
|
||||
return
|
||||
q = deque()
|
||||
q.append(root)
|
||||
while q:
|
||||
temp = q.popleft()
|
||||
print(temp.data, end=" ")
|
||||
if temp.left is not None:
|
||||
q.append(temp.left)
|
||||
if temp.right is not None:
|
||||
q.append(temp.right)
|
||||
|
||||
@staticmethod
|
||||
def delete(root, value):
|
||||
q = deque()
|
||||
q.append(root)
|
||||
while q:
|
||||
temp = q.popleft()
|
||||
                if temp is value:
                    # value is the root itself; it has no parent link to clear,
                    # and delete_value() handles that case before calling here
                    return
|
||||
if temp.right:
|
||||
if temp.right is value:
|
||||
temp.right = None
|
||||
return
|
||||
else:
|
||||
q.append(temp.right)
|
||||
if temp.left:
|
||||
if temp.left is value:
|
||||
temp.left = None
|
||||
return
|
||||
else:
|
||||
q.append(temp.left)
|
||||
|
||||
@staticmethod
|
||||
def delete_value(root, value):
|
||||
if root is None:
|
||||
return None
|
||||
if root.left is None and root.right is None:
|
||||
if root.data == value:
|
||||
return None
|
||||
else:
|
||||
return root
|
||||
x = None
|
||||
q = deque()
|
||||
q.append(root)
|
||||
temp = None
|
||||
while q:
|
||||
temp = q.popleft()
|
||||
if temp.data == value:
|
||||
x = temp
|
||||
if temp.left:
|
||||
q.append(temp.left)
|
||||
if temp.right:
|
||||
q.append(temp.right)
|
||||
if x:
|
||||
y = temp.data
|
||||
x.data = y
|
||||
            Binary_tree.delete(root, temp)
|
||||
return root
|
||||
|
||||
b = Binary_tree()
|
||||
root = None
|
||||
root = b.insert(root, 10)
|
||||
root = b.insert(root, 20)
|
||||
root = b.insert(root, 30)
|
||||
root = b.insert(root, 40)
|
||||
root = b.insert(root, 50)
|
||||
root = b.insert(root, 60)
|
||||
|
||||
print("Preorder traversal:", end=" ")
|
||||
b.preorder(root)
|
||||
|
||||
print("\nInorder traversal:", end=" ")
|
||||
b.inorder(root)
|
||||
|
||||
print("\nPostorder traversal:", end=" ")
|
||||
b.postorder(root)
|
||||
|
||||
print("\nLevel order traversal:", end=" ")
|
||||
b.levelorder(root)
|
||||
|
||||
root = b.delete_value(root, 20)
|
||||
print("\nLevel order traversal after deletion:", end=" ")
|
||||
b.levelorder(root)
|
||||
```
|
||||
|
||||
#### OUTPUT
|
||||
|
||||
```
|
||||
Preorder traversal: 10 20 40 50 30 60
|
||||
Inorder traversal: 40 20 50 10 60 30
|
||||
Postorder traversal: 40 50 20 60 30 10
|
||||
Level order traversal: 10 20 30 40 50 60
|
||||
Level order traversal after deletion: 10 60 30 40 50
|
||||
```
|
|
@ -0,0 +1,90 @@
|
|||
|
||||
# Dijkstra's Algorithm
|
||||
Dijkstra's algorithm is a graph algorithm that gives the shortest distance of each node from a given node in a weighted, undirected graph with non-negative edge weights. It operates by repeatedly choosing the closest unvisited node and determining the distance to all its unvisited neighboring nodes. This algorithm is similar to BFS on graphs, the difference being that it gives priority to nodes with shorter distances by using a priority queue (min-heap) instead of a FIFO queue. The data structures required are a distance list (to store the minimum distance of each node) and a priority queue or a set; we assume the adjacency list will be provided.
|
||||
|
||||
## Working
|
||||
- We will store the minimum distance of each node in the distance list, which has a length equal to the number of nodes in the graph. Thus, the minimum distance of the 2nd node will be stored in the 2nd index of the distance list. We initialize the list with the maximum number possible, say infinity.
|
||||
|
||||
- We now start the traversal from the starting node given and mark its distance as 0. We push this node to the priority queue along with its minimum distance, which is 0, so the structure pushed will be (0, node), a tuple.
|
||||
|
||||
- Now, with the help of the adjacency list, we add each neighboring node to the priority queue with a distance equal to (edge weight + current node's distance), but only if this new distance is less than the value currently stored in the distance list. We also update the distance list in the process.
|
||||
|
||||
- When all the neighbors are processed, we pop the next node with the shortest distance from the priority queue and repeat the process until the queue is empty.
|
||||
|
||||
## Dry Run
|
||||
We will now do a manual simulation using the example graph shown below. First, (0, a) is pushed to the priority queue (pq).
|
||||

|
||||
|
||||
- **Step 1:** The lowest element is popped from the pq, which is (0, a), and all its neighboring nodes are added to the pq while simultaneously checking the distance list. Thus (3, b), (7, c), (1, d) are added to the pq.
|
||||

|
||||
|
||||
- **Step 2:** Again, the lowest element is popped from the pq, which is (1, d). It has two neighboring nodes, a and e. The new distance to a is 1 + 1 = 2, which is more than dist[a] = 0, so a is not added; e is added as (6, e), since 1 + 5 = 6 is less than dist[e] = infinity.
|
||||

|
||||
|
||||
- **Step 3:** Now, the lowest element is popped from the pq, which is (3, b). It has two neighboring nodes, a and c. The new distance to a is 3 + 3 = 6, which is more than dist[a] = 0, so a is not added. But the new distance to reach c is 5 (3 + 2), which is less than dist[c] = 7, so (5, c) is added to the pq.
|
||||

|
||||
|
||||
- **Step 4:** The next smallest element is (5, c). It has neighbors a and e. The new distance to reach a will be 5 + 7 = 12, which is more than dist[a], so it will not be considered. Similarly, the new distance for e is 5 + 3 = 8, which again will not be considered. So, no new tuple has been added to the pq.
|
||||

|
||||
|
||||
- **Step 5:** Similarly, the remaining elements of the pq are popped one by one without any new additions.
|
||||

|
||||

|
||||
|
||||
- The distance list we get at the end will be our answer.
|
||||
- `Output` `dist = [0, 3, 5, 1, 6]` (for nodes a, b, c, d, e)
|
||||
|
||||
## Python Code
|
||||
```python
|
||||
import heapq
|
||||
|
||||
def dijkstra(graph, start):
|
||||
# Create a priority queue
|
||||
pq = []
|
||||
heapq.heappush(pq, (0, start))
|
||||
|
||||
# Create a dictionary to store distances to each node
|
||||
dist = {node: float('inf') for node in graph}
|
||||
dist[start] = 0
|
||||
|
||||
while pq:
|
||||
# Get the node with the smallest distance
|
||||
current_distance, current_node = heapq.heappop(pq)
|
||||
|
||||
# If the current distance is greater than the recorded distance, skip it
|
||||
if current_distance > dist[current_node]:
|
||||
continue
|
||||
|
||||
# Update the distances to the neighboring nodes
|
||||
for neighbor, weight in graph[current_node].items():
|
||||
distance = current_distance + weight
|
||||
# Only consider this new path if it's better
|
||||
if distance < dist[neighbor]:
|
||||
dist[neighbor] = distance
|
||||
heapq.heappush(pq, (distance, neighbor))
|
||||
|
||||
return dist
|
||||
|
||||
# Example usage:
|
||||
graph = {
|
||||
'A': {'B': 1, 'C': 4},
|
||||
'B': {'A': 1, 'C': 2, 'D': 5},
|
||||
'C': {'A': 4, 'B': 2, 'D': 1},
|
||||
'D': {'B': 5, 'C': 1}
|
||||
}
|
||||
|
||||
start_node = 'A'
|
||||
dist = dijkstra(graph, start_node)
|
||||
print(dist)  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
- **Time Complexity**: \(O((V + E) \log V)\)
|
||||
- **Space Complexity**: \(O(V + E)\)
|
||||
|
||||
|
||||
|
||||
|
|
@ -51,10 +51,6 @@ print(f"The {n}th Fibonacci number is: {fibonacci(n)}.")
|
|||
- **Time Complexity**: O(n) for both approaches
|
||||
- **Space Complexity**: O(n) for the top-down approach (due to memoization), O(1) for the bottom-up approach
|
||||
|
||||
</br>
|
||||
<hr>
|
||||
</br>
|
||||
|
||||
# 2. Longest Common Subsequence
|
||||
|
||||
The longest common subsequence (LCS) problem is to find the longest subsequence common to two sequences. A subsequence is a sequence that appears in the same relative order but is not necessarily contiguous.
|
||||
|
@ -84,13 +80,33 @@ Y = "GXTXAYB"
|
|||
print("Length of Longest Common Subsequence:", longest_common_subsequence(X, Y, len(X), len(Y)))
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
- **Time Complexity**: O(m * n) for the top-down approach, where m and n are the lengths of the input sequences
|
||||
- **Space Complexity**: O(m * n) for the memoization table
|
||||
## Longest Common Subsequence Code in Python (Bottom-Up Approach)
|
||||
|
||||
|
||||
```python
|
||||
|
||||
def longestCommonSubsequence(X, Y, m, n):
|
||||
L = [[None]*(n+1) for i in range(m+1)]
|
||||
for i in range(m+1):
|
||||
for j in range(n+1):
|
||||
if i == 0 or j == 0:
|
||||
L[i][j] = 0
|
||||
elif X[i-1] == Y[j-1]:
|
||||
L[i][j] = L[i-1][j-1]+1
|
||||
else:
|
||||
L[i][j] = max(L[i-1][j], L[i][j-1])
|
||||
return L[m][n]
|
||||
|
||||
|
||||
S1 = "AGGTAB"
|
||||
S2 = "GXTXAYB"
|
||||
m = len(S1)
|
||||
n = len(S2)
|
||||
print("Length of LCS is", longestCommonSubsequence(S1, S2, m, n))
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
- **Time Complexity**: O(m * n) for both approaches, where m and n are the lengths of the input sequences
|
||||
- **Space Complexity**: O(m * n) for the memoization table
|
||||
|
||||
</br>
<hr>
</br>

# 3. 0-1 Knapsack Problem
|
||||
|
||||
|
@ -123,10 +139,98 @@ n = len(weights)
|
|||
print("Maximum value that can be obtained:", knapsack(weights, values, capacity, n))
|
||||
```
|
||||
|
||||
## 0-1 Knapsack Problem Code in Python (Bottom-up Approach)
|
||||
|
||||
```python
|
||||
def knapSack(capacity, weights, values, n):
|
||||
K = [[0 for x in range(capacity + 1)] for x in range(n + 1)]
|
||||
for i in range(n + 1):
|
||||
for w in range(capacity + 1):
|
||||
if i == 0 or w == 0:
|
||||
K[i][w] = 0
|
||||
elif weights[i-1] <= w:
|
||||
K[i][w] = max(values[i-1]
|
||||
+ K[i-1][w-weights[i-1]],
|
||||
K[i-1][w])
|
||||
else:
|
||||
K[i][w] = K[i-1][w]
|
||||
|
||||
return K[n][capacity]
|
||||
|
||||
values = [60, 100, 120]
|
||||
weights = [10, 20, 30]
|
||||
capacity = 50
|
||||
n = len(weights)
|
||||
print(knapSack(capacity, weights, values, n))  # 220
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
- **Time Complexity**: O(n * W) for both approaches, where n is the number of items and W is the capacity of the knapsack
|
||||
- **Space Complexity**: O(n * W) for the memoization table
|
||||
|
||||
</br>
|
||||
<hr>
|
||||
</br>
|
||||
# 4. Longest Increasing Subsequence
|
||||
|
||||
The Longest Increasing Subsequence (LIS) problem is to find the longest subsequence of a given sequence that is strictly increasing, meaning each element in the subsequence is greater than the one before it. The subsequence must maintain the order of elements as they appear in the original sequence but need not be contiguous. The goal is to identify such a subsequence with the maximum possible length.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- **Base cases:** If the sequence is empty, the LIS length is 0.
|
||||
- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations.
|
||||
- **Recurrence relation:** Compute the LIS length by comparing elements of the array and deciding, for each element, whether to extend the subsequence based on the previously chosen element.
|
||||
|
||||
## Longest Increasing Subsequence Code in Python (Top-Down Approach using Memoization)
|
||||
|
||||
```python
|
||||
import sys
|
||||
|
||||
def f(idx, prev_idx, n, a, dp):
|
||||
if (idx == n):
|
||||
return 0
|
||||
|
||||
if (dp[idx][prev_idx + 1] != -1):
|
||||
return dp[idx][prev_idx + 1]
|
||||
|
||||
notTake = 0 + f(idx + 1, prev_idx, n, a, dp)
|
||||
take = -sys.maxsize - 1
|
||||
if (prev_idx == -1 or a[idx] > a[prev_idx]):
|
||||
take = 1 + f(idx + 1, idx, n, a, dp)
|
||||
|
||||
dp[idx][prev_idx + 1] = max(take, notTake)
|
||||
return dp[idx][prev_idx + 1]
|
||||
|
||||
def longestSubsequence(n, a):
|
||||
|
||||
dp = [[-1 for i in range(n + 1)]for j in range(n + 1)]
|
||||
return f(0, -1, n, a, dp)
|
||||
|
||||
a = [3, 10, 2, 1, 20]
|
||||
n = len(a)
|
||||
|
||||
print("Length of lis is", longestSubsequence(n, a))
|
||||
|
||||
```
|
||||
|
||||
## Longest Increasing Subsequence Code in Python (Bottom-Up Approach)
|
||||
|
||||
```python
|
||||
def lis(arr):
|
||||
n = len(arr)
|
||||
lis = [1]*n
|
||||
|
||||
for i in range(1, n):
|
||||
for j in range(0, i):
|
||||
if arr[i] > arr[j] and lis[i] < lis[j] + 1:
|
||||
lis[i] = lis[j]+1
|
||||
|
||||
maximum = 0
|
||||
for i in range(n):
|
||||
maximum = max(maximum, lis[i])
|
||||
|
||||
return maximum
|
||||
|
||||
arr = [10, 22, 9, 33, 21, 50, 41, 60]
|
||||
print("Length of lis is", lis(arr))
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
- **Time Complexity**: O(n * n) for both approaches, where n is the length of the array.
|
||||
- **Space Complexity**: O(n * n) for the memoization table in Top-Down Approach, O(n) in Bottom-Up Approach.
|
||||
|
|
|
@ -0,0 +1,212 @@
|
|||
# Data Structures: Hash Tables, Hash Sets, and Hash Maps
|
||||
|
||||
## Table of Contents
|
||||
- [Introduction](#introduction)
|
||||
- [Hash Tables](#hash-tables)
|
||||
- [Overview](#overview)
|
||||
- [Operations](#operations)
|
||||
- [Hash Sets](#hash-sets)
|
||||
- [Overview](#overview-1)
|
||||
- [Operations](#operations-1)
|
||||
- [Hash Maps](#hash-maps)
|
||||
- [Overview](#overview-2)
|
||||
- [Operations](#operations-2)
|
||||
- [Conclusion](#conclusion)
|
||||
|
||||
## Introduction
|
||||
This document provides an overview of three fundamental data structures in computer science: hash tables, hash sets, and hash maps. These structures are widely used for efficient data storage and retrieval operations.
|
||||
|
||||
## Hash Tables
|
||||
|
||||
### Overview
|
||||
A **hash table** is a data structure that stores key-value pairs. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
|
||||
|
||||
### Operations
|
||||
1. **Insertion**: Add a new key-value pair to the hash table.
|
||||
2. **Deletion**: Remove a key-value pair from the hash table.
|
||||
3. **Search**: Find the value associated with a given key.
|
||||
4. **Update**: Modify the value associated with a given key.
|
||||
|
||||
**Example Code (Python):**
|
||||
```python
|
||||
class Node:
|
||||
def __init__(self, key, value):
|
||||
self.key = key
|
||||
self.value = value
|
||||
self.next = None
|
||||
|
||||
|
||||
class HashTable:
|
||||
def __init__(self, capacity):
|
||||
self.capacity = capacity
|
||||
self.size = 0
|
||||
self.table = [None] * capacity
|
||||
|
||||
def _hash(self, key):
|
||||
return hash(key) % self.capacity
|
||||
|
||||
def insert(self, key, value):
|
||||
index = self._hash(key)
|
||||
|
||||
if self.table[index] is None:
|
||||
self.table[index] = Node(key, value)
|
||||
self.size += 1
|
||||
else:
|
||||
current = self.table[index]
|
||||
while current:
|
||||
if current.key == key:
|
||||
current.value = value
|
||||
return
|
||||
current = current.next
|
||||
new_node = Node(key, value)
|
||||
new_node.next = self.table[index]
|
||||
self.table[index] = new_node
|
||||
self.size += 1
|
||||
|
||||
def search(self, key):
|
||||
index = self._hash(key)
|
||||
|
||||
current = self.table[index]
|
||||
while current:
|
||||
if current.key == key:
|
||||
return current.value
|
||||
current = current.next
|
||||
|
||||
raise KeyError(key)
|
||||
|
||||
def remove(self, key):
|
||||
index = self._hash(key)
|
||||
|
||||
previous = None
|
||||
current = self.table[index]
|
||||
|
||||
while current:
|
||||
if current.key == key:
|
||||
if previous:
|
||||
previous.next = current.next
|
||||
else:
|
||||
self.table[index] = current.next
|
||||
self.size -= 1
|
||||
return
|
||||
previous = current
|
||||
current = current.next
|
||||
|
||||
raise KeyError(key)
|
||||
|
||||
def __len__(self):
|
||||
return self.size
|
||||
|
||||
def __contains__(self, key):
|
||||
try:
|
||||
self.search(key)
|
||||
return True
|
||||
except KeyError:
|
||||
return False
|
||||
|
||||
|
||||
# Driver code
|
||||
if __name__ == '__main__':
|
||||
|
||||
ht = HashTable(5)
|
||||
|
||||
ht.insert("apple", 3)
|
||||
ht.insert("banana", 2)
|
||||
ht.insert("cherry", 5)
|
||||
|
||||
|
||||
print("apple" in ht)
|
||||
print("durian" in ht)
|
||||
|
||||
print(ht.search("banana"))
|
||||
|
||||
ht.insert("banana", 4)
|
||||
print(ht.search("banana")) # 4
|
||||
|
||||
ht.remove("apple")
|
||||
|
||||
print(len(ht))  # 2 (apple was removed; the second "banana" insert was an update)
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Hash Sets
|
||||
|
||||
### Overview
|
||||
A **hash set** is a collection of unique elements. It is typically implemented on top of a hash table that stores only keys, with no associated values.
|
||||
|
||||
### Operations
|
||||
1. **Insertion**: Add a new element to the set.
|
||||
2. **Deletion**: Remove an element from the set.
|
||||
3. **Search**: Check if an element exists in the set.
|
||||
4. **Union**: Combine two sets to form a new set with elements from both.
|
||||
5. **Intersection**: Find common elements between two sets.
|
||||
6. **Difference**: Find elements present in one set but not in the other.
|
||||
|
||||
**Example Code (Python):**
|
||||
```python
|
||||
# Create a hash set
|
||||
hash_set = set()
|
||||
|
||||
# Insert elements
|
||||
hash_set.add("element1")
|
||||
hash_set.add("element2")
|
||||
|
||||
# Search for an element
|
||||
exists = "element1" in hash_set
|
||||
|
||||
# Delete an element
|
||||
hash_set.remove("element2")
|
||||
|
||||
# Union of sets
|
||||
another_set = {"element3", "element4"}
|
||||
union_set = hash_set.union(another_set)
|
||||
|
||||
# Intersection of sets
|
||||
intersection_set = hash_set.intersection(another_set)
|
||||
|
||||
# Difference of sets
|
||||
difference_set = hash_set.difference(another_set)
|
||||
```
|
||||
## Hash Maps
|
||||
|
||||
### Overview
|
||||
A **hash map** is similar to a hash table but often provides additional functionalities and more user-friendly interfaces for developers. It is a collection of key-value pairs where each key is unique.
|
||||
|
||||
### Operations
|
||||
1. **Insertion**: Add a new key-value pair to the hash map.
|
||||
2. **Deletion**: Remove a key-value pair from the hash map.
|
||||
3. **Search**: Retrieve the value associated with a given key.
|
||||
4. **Update**: Change the value associated with a given key.
|
||||
|
||||
**Example Code (Python):**
|
||||
```python
|
||||
# Create a hash map
|
||||
hash_map = {}
|
||||
|
||||
# Insert elements
|
||||
hash_map["key1"] = "value1"
|
||||
hash_map["key2"] = "value2"
|
||||
|
||||
# Search for an element
|
||||
value = hash_map.get("key1")
|
||||
|
||||
# Delete an element
|
||||
del hash_map["key2"]
|
||||
|
||||
# Update an element
|
||||
hash_map["key1"] = "new_value1"
|
||||
|
||||
```
|
||||
## Conclusion
|
||||
Hash tables, hash sets, and hash maps are powerful data structures that provide efficient means of storing and retrieving data. Understanding these structures and their operations is crucial for developing optimized algorithms and applications.
|
|
@ -0,0 +1,153 @@
|
|||
# Hashing with Chaining
|
||||
|
||||
In Data Structures and Algorithms, hashing is used to map data of arbitrary size to fixed-size values. A common approach to handle collisions in hashing is **chaining**. In chaining, each slot of the hash table contains a linked list, and all elements that hash to the same slot are stored in that list.
|
||||
|
||||
## Points to be Remembered
|
||||
|
||||
- **Hash Function**: A function that converts an input (or 'key') into an index in a hash table.
|
||||
- **Collision**: When two keys hash to the same index.
|
||||
- **Chaining**: A method to resolve collisions by maintaining a linked list for each hash table slot.
|
||||
|
||||
## Real Life Examples of Hashing with Chaining
|
||||
|
||||
- **Phone Directory**: Contacts are stored in a hash table where the contact's name is hashed to an index. If multiple names hash to the same index, they are stored in a linked list at that index.
|
||||
- **Library Catalog**: Books are indexed by their titles. If multiple books have titles that hash to the same index, they are stored in a linked list at that index.
|
||||
|
||||
## Applications of Hashing
|
||||
|
||||
Hashing is widely used in Computer Science:
|
||||
|
||||
- **Database Indexing**
|
||||
- **Caches** (like CPU caches, web caches)
|
||||
- **Associative Arrays** (or dictionaries in Python)
|
||||
- **Sets** (unordered collections of unique elements)
|
||||
|
||||
Understanding these applications is essential for Software Development.
|
||||
|
||||
## Operations in Hash Table with Chaining
|
||||
|
||||
Key operations include:
|
||||
|
||||
- **INSERT**: Insert a new element into the hash table.
|
||||
- **SEARCH**: Find the position of an element in the hash table.
|
||||
- **DELETE**: Remove an element from the hash table.
|
||||
|
||||
## Implementing Hash Table with Chaining in Python
|
||||
|
||||
```python
|
||||
class Node:
|
||||
def __init__(self, key, value):
|
||||
self.key = key
|
||||
self.value = value
|
||||
self.next = None
|
||||
|
||||
class HashTable:
|
||||
def __init__(self, size):
|
||||
self.size = size
|
||||
self.table = [None] * size
|
||||
|
||||
def hash_function(self, key):
|
||||
return key % self.size
|
||||
|
||||
def insert(self, key, value):
|
||||
hash_index = self.hash_function(key)
|
||||
new_node = Node(key, value)
|
||||
|
||||
if self.table[hash_index] is None:
|
||||
self.table[hash_index] = new_node
|
||||
else:
|
||||
current = self.table[hash_index]
|
||||
while current.next is not None:
|
||||
current = current.next
|
||||
current.next = new_node
|
||||
|
||||
def search(self, key):
|
||||
hash_index = self.hash_function(key)
|
||||
current = self.table[hash_index]
|
||||
|
||||
while current is not None:
|
||||
if current.key == key:
|
||||
return current.value
|
||||
current = current.next
|
||||
|
||||
return None
|
||||
|
||||
def delete(self, key):
|
||||
hash_index = self.hash_function(key)
|
||||
current = self.table[hash_index]
|
||||
prev = None
|
||||
|
||||
while current is not None:
|
||||
if current.key == key:
|
||||
if prev is None:
|
||||
self.table[hash_index] = current.next
|
||||
else:
|
||||
prev.next = current.next
|
||||
return True
|
||||
prev = current
|
||||
current = current.next
|
||||
|
||||
return False
|
||||
|
||||
def display(self):
|
||||
for index, item in enumerate(self.table):
|
||||
print(f"Index {index}:", end=" ")
|
||||
current = item
|
||||
while current is not None:
|
||||
print(f"({current.key}, {current.value})", end=" -> ")
|
||||
current = current.next
|
||||
print("None")
|
||||
|
||||
# Example usage
|
||||
hash_table = HashTable(10)
|
||||
|
||||
hash_table.insert(1, 'A')
|
||||
hash_table.insert(11, 'B')
|
||||
hash_table.insert(21, 'C')
|
||||
|
||||
print("Hash Table after Insert operations:")
|
||||
hash_table.display()
|
||||
|
||||
print("Search operation for key 11:", hash_table.search(11))
|
||||
|
||||
hash_table.delete(11)
|
||||
|
||||
print("Hash Table after Delete operation:")
|
||||
hash_table.display()
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
```markdown
|
||||
Hash Table after Insert operations:
|
||||
Index 0: None
|
||||
Index 1: (1, 'A') -> (11, 'B') -> (21, 'C') -> None
|
||||
Index 2: None
|
||||
Index 3: None
|
||||
Index 4: None
|
||||
Index 5: None
|
||||
Index 6: None
|
||||
Index 7: None
|
||||
Index 8: None
|
||||
Index 9: None
|
||||
|
||||
Search operation for key 11: B
|
||||
|
||||
Hash Table after Delete operation:
|
||||
Index 0: None
|
||||
Index 1: (1, 'A') -> (21, 'C') -> None
|
||||
Index 2: None
|
||||
Index 3: None
|
||||
Index 4: None
|
||||
Index 5: None
|
||||
Index 6: None
|
||||
Index 7: None
|
||||
Index 8: None
|
||||
Index 9: None
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
- **Insertion**: Average case O(1), Worst case O(n) when many elements hash to the same slot.
|
||||
- **Search**: Average case O(1), Worst case O(n) when many elements hash to the same slot.
|
||||
- **Deletion**: Average case O(1), Worst case O(n) when many elements hash to the same slot.
|
|
@ -0,0 +1,139 @@
|
|||
# Hashing with Linear Probing
|
||||
|
||||
In Data Structures and Algorithms, hashing is used to map data of arbitrary size to fixed-size values. A common approach to handle collisions in hashing is **linear probing**. In linear probing, if a collision occurs (i.e., the hash value points to an already occupied slot), we linearly probe through the table to find the next available slot. This method ensures that every element can be inserted or found in the hash table.
|
||||
|
||||
## Points to be Remembered
|
||||
|
||||
- **Hash Function**: A function that converts an input (or 'key') into an index in a hash table.
|
||||
- **Collision**: When two keys hash to the same index.
|
||||
- **Linear Probing**: A method to resolve collisions by checking the next slot (i.e., index + 1) until an empty slot is found.
|
||||
|
||||
## Real Life Examples of Hashing with Linear Probing
|
||||
|
||||
- **Student Record System**: Each student record is stored in a table where the student's ID number is hashed to an index. If two students have the same hash index, linear probing finds the next available slot.
|
||||
- **Library System**: Books are indexed by their ISBN numbers. If two books hash to the same slot, linear probing helps find another spot for the book in the catalog.
|
||||
|
||||
## Applications of Hashing
|
||||
|
||||
Hashing is widely used in Computer Science:
|
||||
|
||||
- **Database Indexing**
|
||||
- **Caches** (like CPU caches, web caches)
|
||||
- **Associative Arrays** (or dictionaries in Python)
|
||||
- **Sets** (unordered collections of unique elements)
|
||||
|
||||
Understanding these applications is essential for Software Development.
|
||||
|
||||
## Operations in Hash Table with Linear Probing
|
||||
|
||||
Key operations include:
|
||||
|
||||
- **INSERT**: Insert a new element into the hash table.
|
||||
- **SEARCH**: Find the position of an element in the hash table.
|
||||
- **DELETE**: Remove an element from the hash table.
|
||||
|
||||
## Implementing Hash Table with Linear Probing in Python
|
||||
|
||||
```python
|
||||
class HashTable:
|
||||
def __init__(self, size):
|
||||
self.size = size
|
||||
self.table = [None] * size
|
||||
|
||||
def hash_function(self, key):
|
||||
return key % self.size
|
||||
|
||||
def insert(self, key, value):
|
||||
hash_index = self.hash_function(key)
|
||||
|
||||
if self.table[hash_index] is None:
|
||||
self.table[hash_index] = (key, value)
|
||||
else:
|
||||
            while self.table[hash_index] is not None:  # caution: loops forever if the table is full
|
||||
hash_index = (hash_index + 1) % self.size
|
||||
self.table[hash_index] = (key, value)
|
||||
|
||||
def search(self, key):
|
||||
hash_index = self.hash_function(key)
|
||||
|
||||
while self.table[hash_index] is not None:
|
||||
if self.table[hash_index][0] == key:
|
||||
return self.table[hash_index][1]
|
||||
hash_index = (hash_index + 1) % self.size
|
||||
|
||||
return None
|
||||
|
||||
def delete(self, key):
|
||||
hash_index = self.hash_function(key)
|
||||
|
||||
while self.table[hash_index] is not None:
|
||||
if self.table[hash_index][0] == key:
|
||||
self.table[hash_index] = None
|
||||
return True
|
||||
hash_index = (hash_index + 1) % self.size
|
||||
|
||||
return False
|
||||
|
||||
def display(self):
|
||||
for index, item in enumerate(self.table):
|
||||
print(f"Index {index}: {item}")
|
||||
|
||||
# Example usage
|
||||
hash_table = HashTable(10)
|
||||
|
||||
hash_table.insert(1, 'A')
|
||||
hash_table.insert(11, 'B')
|
||||
hash_table.insert(21, 'C')
|
||||
|
||||
print("Hash Table after Insert operations:")
|
||||
hash_table.display()
|
||||
|
||||
print("Search operation for key 11:", hash_table.search(11))
|
||||
|
||||
hash_table.delete(11)
|
||||
|
||||
print("Hash Table after Delete operation:")
|
||||
hash_table.display()
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
```markdown
|
||||
Hash Table after Insert operations:
|
||||
Index 0: None
Index 1: (1, 'A')
Index 2: (11, 'B')
Index 3: (21, 'C')
Index 4: None
Index 5: None
Index 6: None
Index 7: None
Index 8: None
Index 9: None
|
||||
|
||||
Search operation for key 11: B
|
||||
|
||||
Hash Table after Delete operation:
|
||||
Index 0: None
Index 1: (1, 'A')
Index 2: None
Index 3: (21, 'C')
Index 4: None
Index 5: None
Index 6: None
Index 7: None
Index 8: None
Index 9: None
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
- **Insertion**: Average case O(1), Worst case O(n) when many collisions occur.
|
||||
- **Search**: Average case O(1), Worst case O(n) when many collisions occur.
|
||||
- **Deletion**: Average case O(1), Worst case O(n) when many collisions occur.
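One caveat with the `delete` above: clearing a slot back to `None` can break the probe chain for keys stored past that slot. In the example, after `delete(11)` empties index 2, `search(21)` would stop at that empty slot and return `None` even though key 21 is still at index 3. A common remedy is a tombstone marker; the methods below are an illustrative sketch of that fix (the `_TOMBSTONE` sentinel is not part of the original class), and the bounded loops also avoid probing forever on a full table:

```python
_TOMBSTONE = object()  # sentinel for "deleted" slots (illustrative)

def search(self, key):
    hash_index = self.hash_function(key)
    for _ in range(self.size):  # visit each slot at most once
        slot = self.table[hash_index]
        if slot is None:
            return None  # a never-used slot ends the probe chain
        if slot is not _TOMBSTONE and slot[0] == key:
            return slot[1]
        hash_index = (hash_index + 1) % self.size  # step past tombstones and other keys
    return None

def delete(self, key):
    hash_index = self.hash_function(key)
    for _ in range(self.size):
        slot = self.table[hash_index]
        if slot is None:
            return False
        if slot is not _TOMBSTONE and slot[0] == key:
            self.table[hash_index] = _TOMBSTONE  # keep the chain intact instead of None
            return True
        hash_index = (hash_index + 1) % self.size
    return False
```

With this scheme, `insert` may treat tombstone slots as free when looking for a place to store a new pair.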
|
|
@ -0,0 +1,169 @@
|
|||
# Heaps
|
||||
|
||||
## Definition:
|
||||
Heaps are a crucial data structure that support efficient priority queue operations. They come in two main types: min heaps and max heaps. Python's heapq module provides a robust implementation for min heaps, and with some minor adjustments, it can also be used to implement max heaps.
|
||||
|
||||
## Overview:
|
||||
A heap is a specialized binary tree-based data structure that satisfies the heap property:
|
||||
|
||||
- **Min Heap:** The key at the root must be the minimum among all keys present in the Binary Heap. This property must be recursively true for all nodes in the Binary Tree.
|
||||
|
||||
- **Max Heap:** The key at the root must be the maximum among all keys present in the Binary Heap. This property must be recursively true for all nodes in the Binary Tree.
|
||||
|
||||
## Python heapq Module:
|
||||
The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.
|
||||
|
||||
- **Min Heap:** In a min heap, the smallest element is always at the root. Here's how to use heapq to create and manipulate a min heap:
|
||||
|
||||
```python
|
||||
import heapq
|
||||
|
||||
# Create an empty heap
|
||||
min_heap = []
|
||||
|
||||
# Adding elements to the heap
|
||||
|
||||
heapq.heappush(min_heap, 10)
|
||||
heapq.heappush(min_heap, 5)
|
||||
heapq.heappush(min_heap, 3)
|
||||
heapq.heappush(min_heap, 12)
|
||||
print("Min Heap:", min_heap)
|
||||
|
||||
# Pop the smallest element
|
||||
smallest = heapq.heappop(min_heap)
|
||||
print("Smallest element:", smallest)
|
||||
print("Min Heap after pop:", min_heap)
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```
|
||||
Min Heap: [3, 10, 5, 12]
|
||||
Smallest element: 3
|
||||
Min Heap after pop: [5, 10, 12]
|
||||
```
|
||||
|
||||
- **Max Heap:** To create a max heap, we can store negative values.
|
||||
|
||||
```python
|
||||
import heapq
|
||||
|
||||
# Create an empty heap
|
||||
max_heap = []
|
||||
|
||||
# Adding elements to the heap by pushing negative values
|
||||
heapq.heappush(max_heap, -10)
|
||||
heapq.heappush(max_heap, -5)
|
||||
heapq.heappush(max_heap, -3)
|
||||
heapq.heappush(max_heap, -12)
|
||||
|
||||
# Convert back to positive values for display
|
||||
print("Max Heap:", [-x for x in max_heap])
|
||||
|
||||
# Pop the largest element
|
||||
largest = -heapq.heappop(max_heap)
|
||||
print("Largest element:", largest)
|
||||
print("Max Heap after pop:", [-x for x in max_heap])
|
||||
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```
|
||||
Max Heap: [12, 10, 3, 5]
|
||||
Largest element: 12
|
||||
Max Heap after pop: [10, 5, 3]
|
||||
```
|
||||
|
||||
## Heap Operations:
|
||||
1. **Push Operation:** Adds an element to the heap, maintaining the heap property.
|
||||
```python
|
||||
heapq.heappush(heap, item)
|
||||
```
|
||||
2. **Pop Operation:** Removes and returns the smallest element from the heap.
|
||||
```python
|
||||
smallest = heapq.heappop(heap)
|
||||
```
|
||||
3. **Heapify Operation:** Converts a list into a heap in-place.
|
||||
```python
|
||||
heapq.heapify(list)
|
||||
```
|
||||
4. **Peek Operation:** To get the smallest element without popping it (not directly available, but can be done by accessing the first element).
|
||||
```python
|
||||
smallest = heap[0]
|
||||
```
|
||||
|
||||
## Example:
|
||||
```python
|
||||
# importing "heapq" to implement heap queue
|
||||
import heapq
|
||||
|
||||
# initializing list
|
||||
li = [15, 77, 90, 1, 3]
|
||||
|
||||
# using heapify to convert list into heap
|
||||
heapq.heapify(li)
|
||||
|
||||
# printing created heap
|
||||
print("The created heap is : ", end="")
|
||||
print(list(li))
|
||||
|
||||
# using heappush() to push elements into heap
|
||||
# pushes 4
|
||||
heapq.heappush(li, 4)
|
||||
|
||||
# printing modified heap
|
||||
print("The modified heap after push is : ", end="")
|
||||
print(list(li))
|
||||
|
||||
# using heappop() to pop smallest element
|
||||
print("The popped and smallest element is : ", end="")
|
||||
print(heapq.heappop(li))
|
||||
|
||||
```
|
||||
|
||||
Output:
|
||||
```
|
||||
The created heap is : [1, 3, 90, 77, 15]
|
||||
The modified heap after push is : [1, 3, 4, 77, 15, 90]
|
||||
The popped and smallest element is : 1
|
||||
```
|
||||
|
||||
## Advantages and Disadvantages of Heaps:
|
||||
|
||||
### Advantages:
|
||||
|
||||
**Efficient:** Heap queues, implemented in Python's heapq module, offer remarkable efficiency in managing priority queues and heaps. With logarithmic time complexity for key operations, they are widely favored in various applications for their performance.
|
||||
|
||||
**Space-efficient:** Leveraging an array-based representation, heap queues optimize memory usage compared to node-based structures like linked lists. This design minimizes overhead, enhancing efficiency in memory management.
|
||||
|
||||
**Ease of Use:** Python's heap queues boast a user-friendly API, simplifying fundamental operations such as insertion, deletion, and retrieval. This simplicity contributes to rapid development and code maintenance.
|
||||
|
||||
**Flexibility:** Beyond their primary use in priority queues and heaps, Python's heap queues lend themselves to diverse applications. They can be adapted to implement various data structures, including binary trees, showcasing their versatility and broad utility across different domains.
|
||||
|
||||
### Disadvantages:
|
||||
|
||||
**Limited functionality:** Heap queues are primarily designed for managing priority queues and heaps, and may not be suitable for more complex data structures and algorithms.
|
||||
|
||||
**No random access:** Heap queues do not support random access to elements, making it difficult to access elements in the middle of the heap or modify elements that are not at the top of the heap.
|
||||
|
||||
**No direct sorting:** Heap queues do not keep their elements fully sorted; only the smallest element is guaranteed to sit at index 0. If you need a fully sorted order, you must pop elements repeatedly (which is exactly heapsort) or use a different data structure or algorithm.
|
||||
|
||||
**Not thread-safe:** Heap queues are not thread-safe, meaning that they may not be suitable for use in multi-threaded applications where data synchronization is critical.
|
||||
|
||||
## Real-Life Examples of Heaps:
|
||||
|
||||
1. **Priority Queues:**
|
||||
Heaps are commonly used to implement priority queues, which are used in various algorithms like Dijkstra's shortest path algorithm and Prim's minimum spanning tree algorithm.
|
||||
|
||||
2. **Scheduling Algorithms:**
|
||||
Heaps are used in job scheduling algorithms where tasks with the highest priority need to be processed first.
|
||||
|
||||
3. **Merge K Sorted Lists:**
|
||||
Heaps can be used to efficiently merge multiple sorted lists into a single sorted list.
|
||||
|
||||
4. **Real-Time Event Simulation:**
|
||||
Heaps are used in event-driven simulators to manage events scheduled to occur at future times.
|
||||
|
||||
5. **Median Finding Algorithm:**
|
||||
Heaps can be used to maintain a dynamic set of numbers to find the median efficiently. (Items 3 and 5 are sketched right after this list.)
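A minimal sketch of items 3 and 5: the standard library's `heapq.merge` does a lazy k-way merge, and the classic two-heap pattern tracks a running median (both snippets are illustrative, not part of a larger API):

```python
import heapq

# Merge k sorted lists: heapq.merge returns a lazy iterator over all elements
lists = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(list(heapq.merge(*lists)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Running median: a max-heap (negated values) holds the lower half,
# a min-heap holds the upper half; rebalance after every push
def running_medians(stream):
    lo, hi = [], []
    for x in stream:
        heapq.heappush(lo, -x)
        heapq.heappush(hi, -heapq.heappop(lo))  # largest of lo moves to hi
        if len(hi) > len(lo):
            heapq.heappush(lo, -heapq.heappop(hi))
        yield -lo[0] if len(lo) > len(hi) else (-lo[0] + hi[0]) / 2

print(list(running_medians([5, 2, 8, 1])))  # [5, 3.5, 5, 3.5]
```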
|
|
@ -13,3 +13,12 @@
|
|||
- [Stacks in Python](stacks.md)
|
||||
- [Sliding Window Technique](sliding-window.md)
|
||||
- [Trie](trie.md)
|
||||
- [Two Pointer Technique](two-pointer-technique.md)
|
||||
- [Hashing through Linear Probing](hashing-linear-probing.md)
|
||||
- [Hashing through Chaining](hashing-chaining.md)
|
||||
- [Heaps](heaps.md)
|
||||
- [Hash Tables, Sets, Maps](hash-tables.md)
|
||||
- [Binary Tree](binary-tree.md)
|
||||
- [AVL Trees](avl-trees.md)
|
||||
- [Splay Trees](splay-trees.md)
|
||||
- [Dijkstra's Algorithm](dijkstra.md)
|
||||
|
|
|
@ -465,3 +465,99 @@ print("Sorted array:", arr) # Output: [1, 2, 3, 5, 8]
|
|||
- **Worst Case:** `𝑂(𝑛log𝑛)`. Building the heap takes `𝑂(𝑛)` time, and each of the 𝑛 element extractions takes `𝑂(log𝑛)` time.
|
||||
- **Best Case:** `𝑂(𝑛log𝑛)`. Even if the array is already sorted, heap sort will still build the heap and perform the extractions.
|
||||
- **Average Case:** `𝑂(𝑛log𝑛)`. Similar to the worst-case, the overall complexity remains `𝑂(𝑛log𝑛)` because each insertion and deletion in a heap takes `𝑂(log𝑛)` time, and these operations are performed 𝑛 times.
|
||||
|
||||
## 7. Radix Sort
|
||||
Radix Sort is a non-comparative integer sorting algorithm that sorts numbers by processing individual digits. It processes digits from the least significant digit (LSD) to the most significant digit (MSD) or vice versa. This algorithm is efficient for sorting numbers with a fixed number of digits.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- **Digit by Digit sorting:** Radix sort processes the digits of the numbers starting from either the least significant digit (LSD) or the most significant digit (MSD). Typically, LSD is used.
|
||||
- **Stable Sort:** A stable sorting algorithm like Counting Sort or Bucket Sort is used as an intermediate sorting technique. Radix Sort relies on this stability to maintain the relative order of numbers with the same digit value.
|
||||
- **Multiple passes:** The algorithm performs multiple passes over the numbers, one for each digit, from the least significant to the most significant.
|
||||
|
||||
### Radix Sort Code in Python
|
||||
|
||||
```python
|
||||
def counting_sort(arr, exp):
|
||||
n = len(arr)
|
||||
output = [0] * n
|
||||
count = [0] * 10
|
||||
|
||||
for i in range(n):
|
||||
index = arr[i] // exp
|
||||
count[index % 10] += 1
|
||||
|
||||
for i in range(1, 10):
|
||||
count[i] += count[i - 1]
|
||||
|
||||
i = n - 1
|
||||
while i >= 0:
|
||||
index = arr[i] // exp
|
||||
output[count[index % 10] - 1] = arr[i]
|
||||
count[index % 10] -= 1
|
||||
i -= 1
|
||||
|
||||
for i in range(n):
|
||||
arr[i] = output[i]
|
||||
|
||||
def radix_sort(arr):
|
||||
max_num = max(arr)
|
||||
exp = 1
|
||||
while max_num // exp > 0:
|
||||
counting_sort(arr, exp)
|
||||
exp *= 10
|
||||
|
||||
# Example usage
|
||||
arr = [170, 45, 75, 90]
|
||||
print("Original array:", arr)
|
||||
radix_sort(arr)
|
||||
print("Sorted array:", arr)
|
||||
```
|
||||
|
||||
### Complexity Analysis
|
||||
- **Time Complexity:** O(d * (n + k)) for all cases, where d is the number of digits in the largest number, n is the number of elements, and k is the range of digit values (10 for decimal digits). Radix Sort always processes each digit of every number in the array.
|
||||
- **Space Complexity:** O(n + k). This is due to the space required for:
|
||||
- The output array used in Counting Sort, which is of size n.
|
||||
- The count array used in Counting Sort, which is of size k.
|
||||
|
||||
## 8. Counting Sort
|
||||
Counting sort is a sorting technique based on keys within a specific range. It works by counting the number of objects having each distinct key value (similar to hashing), then doing some arithmetic to calculate the position of each object in the output sequence.
|
||||
|
||||
**Algorithm Overview:**
|
||||
- Convert the input string into a list of characters.
|
||||
- Count the occurrence of each character in the list using the collections.Counter() method.
|
||||
- Sort the keys of the resulting Counter object to get the unique characters in the list in sorted order.
|
||||
- For each character in the sorted list of keys, create a list of repeated characters using the corresponding count from the Counter object.
|
||||
- Concatenate the lists of repeated characters to form the sorted output list.
|
||||
|
||||
|
||||
### Counting Sort Code in Python using counter method.
|
||||
|
||||
```python
|
||||
from collections import Counter
|
||||
|
||||
def counting_sort(arr):
|
||||
count = Counter(arr)
|
||||
output = []
|
||||
for c in sorted(count.keys()):
|
||||
        output += [c] * count[c]
|
||||
return output
|
||||
|
||||
arr = "geeksforgeeks"
|
||||
arr = list(arr)
|
||||
arr = counting_sort(arr)
|
||||
output = ''.join(arr)
|
||||
print("Sorted character array is", output)
|
||||
|
||||
```
|
||||
### Counting Sort Code in Python using sorted() and reduce():
|
||||
|
||||
```python
|
||||
from functools import reduce
|
||||
string = "geeksforgeeks"
|
||||
sorted_str = reduce(lambda x, y: x+y, sorted(string))
|
||||
print("Sorted string:", sorted_str)
|
||||
```
|
||||
|
||||
### Complexity Analysis
|
||||
- **Time Complexity:** O(n + k) for all cases. No matter how the elements are arranged, the algorithm runs through n + k steps, where n is the number of elements and k is the range of key values.
|
||||
- **Space Complexity:** O(k), where k is the range of the elements. The larger the range of elements, the larger the space complexity.
|
||||
|
|
|
@ -0,0 +1,162 @@
|
|||
# Splay Tree
|
||||
|
||||
In Data Structures and Algorithms, a **Splay Tree** is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, search, and deletion in O(log n) amortized time. This is achieved by a process called **splaying**, where the accessed node is moved to the root through a series of tree rotations.
|
||||
|
||||
## Points to be Remembered
|
||||
|
||||
- **Splaying**: Moving the accessed node to the root using rotations.
|
||||
- **Rotations**: Tree rotations (left and right) are used to balance the tree during splaying.
|
||||
- **Self-adjusting**: The tree adjusts itself with each access, keeping frequently accessed nodes near the root.
|
||||
|
||||
## Real Life Examples of Splay Trees
|
||||
|
||||
- **Cache Implementation**: Frequently accessed data is kept near the top of the tree, making repeated accesses faster.
|
||||
- **Networking**: Routing tables in network switches can use splay trees to prioritize frequently accessed routes.
|
||||
|
||||
## Applications of Splay Trees
|
||||
|
||||
Splay trees are used in various applications in Computer Science:
|
||||
|
||||
- **Cache Implementations**
|
||||
- **Garbage Collection Algorithms**
|
||||
- **Data Compression Algorithms (e.g., LZ78)**
|
||||
|
||||
Understanding these applications is essential for Software Development.
|
||||
|
||||
## Operations in Splay Tree
|
||||
|
||||
Key operations include:
|
||||
|
||||
- **INSERT**: Insert a new element into the splay tree.
|
||||
- **SEARCH**: Find the position of an element in the splay tree.
|
||||
- **DELETE**: Remove an element from the splay tree.
|
||||
|
||||
## Implementing Splay Tree in Python
|
||||
|
||||
```python
|
||||
class SplayTreeNode:
|
||||
def __init__(self, key):
|
||||
self.key = key
|
||||
self.left = None
|
||||
self.right = None
|
||||
|
||||
class SplayTree:
|
||||
def __init__(self):
|
||||
self.root = None
|
||||
|
||||
def insert(self, key):
|
||||
self.root = self.splay_insert(self.root, key)
|
||||
|
||||
def search(self, key):
|
||||
self.root = self.splay_search(self.root, key)
|
||||
return self.root
|
||||
|
||||
def splay(self, root, key):
|
||||
if not root or root.key == key:
|
||||
return root
|
||||
|
||||
if root.key > key:
|
||||
if not root.left:
|
||||
return root
|
||||
if root.left.key > key:
|
||||
root.left.left = self.splay(root.left.left, key)
|
||||
root = self.rotateRight(root)
|
||||
elif root.left.key < key:
|
||||
root.left.right = self.splay(root.left.right, key)
|
||||
if root.left.right:
|
||||
root.left = self.rotateLeft(root.left)
|
||||
return root if not root.left else self.rotateRight(root)
|
||||
|
||||
else:
|
||||
if not root.right:
|
||||
return root
|
||||
if root.right.key > key:
|
||||
root.right.left = self.splay(root.right.left, key)
|
||||
if root.right.left:
|
||||
root.right = self.rotateRight(root.right)
|
||||
elif root.right.key < key:
|
||||
root.right.right = self.splay(root.right.right, key)
|
||||
root = self.rotateLeft(root)
|
||||
return root if not root.right else self.rotateLeft(root)
|
||||
|
||||
def splay_insert(self, root, key):
|
||||
if not root:
|
||||
return SplayTreeNode(key)
|
||||
|
||||
root = self.splay(root, key)
|
||||
|
||||
if root.key == key:
|
||||
return root
|
||||
|
||||
new_node = SplayTreeNode(key)
|
||||
|
||||
if root.key > key:
|
||||
new_node.right = root
|
||||
new_node.left = root.left
|
||||
root.left = None
|
||||
else:
|
||||
new_node.left = root
|
||||
new_node.right = root.right
|
||||
root.right = None
|
||||
|
||||
return new_node
|
||||
|
||||
def splay_search(self, root, key):
|
||||
return self.splay(root, key)
|
||||
|
||||
def rotateRight(self, node):
|
||||
temp = node.left
|
||||
node.left = temp.right
|
||||
temp.right = node
|
||||
return temp
|
||||
|
||||
def rotateLeft(self, node):
|
||||
temp = node.right
|
||||
node.right = temp.left
|
||||
temp.left = node
|
||||
return temp
|
||||
|
||||
def preOrder(self, root):
|
||||
if root:
|
||||
print(root.key, end=' ')
|
||||
self.preOrder(root.left)
|
||||
self.preOrder(root.right)
|
||||
|
||||
# Example usage:
|
||||
splay_tree = SplayTree()
|
||||
splay_tree.insert(50)
|
||||
splay_tree.insert(30)
|
||||
splay_tree.insert(20)
|
||||
splay_tree.insert(40)
|
||||
splay_tree.insert(70)
|
||||
splay_tree.insert(60)
|
||||
splay_tree.insert(80)
|
||||
|
||||
print("Preorder traversal of the Splay tree is:")
|
||||
splay_tree.preOrder(splay_tree.root)
|
||||
|
||||
splay_tree.search(60)
|
||||
|
||||
print("\nSplay tree after search operation for key 60:")
|
||||
splay_tree.preOrder(splay_tree.root)
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
```markdown
|
||||
Preorder traversal of the Splay tree is:
|
||||
80 70 60 50 40 30 20
|
||||
|
||||
Splay tree after search operation for key 60:
|
||||
60 50 40 30 20 70 80
|
||||
```
|
||||
|
||||
## Complexity Analysis
|
||||
|
||||
The worst-case time complexities of the main operations in a Splay Tree are as follows:
|
||||
|
||||
- **Insertion**: (O(n)). In the worst case, insertion may take linear time if the tree is highly unbalanced.
|
||||
- **Search**: (O(n)). In the worst case, searching for a node may take linear time if the tree is highly unbalanced.
|
||||
- **Deletion**: (O(n)). In the worst case, deleting a node may take linear time if the tree is highly unbalanced.
|
||||
|
||||
While these operations can take linear time in the worst case, the splay operation ensures that the tree remains balanced over a sequence of operations, leading to better average-case performance.
|
|
@ -0,0 +1,132 @@
|
|||
# Two-Pointer Technique
|
||||
|
||||
---
|
||||
|
||||
- The two-pointer technique is a popular algorithmic strategy used to solve various problems efficiently. This technique involves using two pointers (or indices) to traverse through data structures such as arrays or linked lists.
|
||||
- The pointers can move in different directions, allowing for efficient processing of elements to achieve the desired results.
|
||||
|
||||
## Common Use Cases
|
||||
|
||||
1. **Finding pairs in a sorted array that sum to a target**: One pointer starts at the beginning and the other at the end.
|
||||
2. **Reversing a linked list**: One pointer tracks the previous node and the other the current node as the links are redirected one by one.
|
||||
3. **Removing duplicates from a sorted array**: One pointer keeps track of the unique elements, and the other traverses the array.
|
||||
4. **Merging two sorted arrays**: Two pointers are used to iterate through the arrays and merge them (a sketch follows this list).
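As a quick illustration of use case 4, here is a minimal sketch of merging two sorted arrays with one pointer per array:

```python
def merge_sorted(a, b):
    i, j = 0, 0
    merged = []

    # Advance whichever pointer currently points at the smaller element
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1

    # One array is exhausted; the rest of the other is already sorted
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged

# Example usage
print(merge_sorted([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```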
|
||||
|
||||
## Example 1: Finding Pairs with a Given Sum
|
||||
|
||||
### Problem Statement
|
||||
|
||||
Given a sorted array of integers and a target sum, find all pairs in the array that sum up to the target.
|
||||
|
||||
### Approach
|
||||
|
||||
1. Initialize two pointers: one at the beginning (`left`) and one at the end (`right`) of the array.
|
||||
2. Calculate the sum of the elements at the `left` and `right` pointers.
|
||||
3. If the sum is equal to the target, record the pair and move both pointers inward.
|
||||
4. If the sum is less than the target, move the `left` pointer to the right to increase the sum.
|
||||
5. If the sum is greater than the target, move the `right` pointer to the left to decrease the sum.
|
||||
6. Repeat the process until the `left` pointer is not less than the `right` pointer.
|
||||
|
||||
### Example Code
|
||||
|
||||
```python
|
||||
def find_pairs_with_sum(arr, target):
|
||||
left = 0
|
||||
right = len(arr) - 1
|
||||
pairs = []
|
||||
|
||||
while left < right:
|
||||
current_sum = arr[left] + arr[right]
|
||||
|
||||
if current_sum == target:
|
||||
pairs.append((arr[left], arr[right]))
|
||||
left += 1
|
||||
right -= 1
|
||||
elif current_sum < target:
|
||||
left += 1
|
||||
else:
|
||||
right -= 1
|
||||
|
||||
return pairs
|
||||
|
||||
# Example usage
|
||||
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
|
||||
target = 10
|
||||
result = find_pairs_with_sum(arr, target)
|
||||
print("Pairs with sum", target, "are:", result)
|
||||
```
|
||||
|
||||
## Example 2: Removing Duplicates from a Sorted Array
|
||||
|
||||
### Problem Statement
|
||||
Given a sorted array, remove the duplicates in place such that each element appears only once and return the new length of the array.
|
||||
|
||||
### Approach
|
||||
1. If the array is empty, return 0.
|
||||
2. Initialize a slow pointer at the beginning of the array.
|
||||
3. Use a fast pointer to traverse through the array.
|
||||
4. Whenever the element at the fast pointer is different from the element at the slow pointer, increment the slow pointer and update the element at the slow pointer with the element at the fast pointer.
|
||||
5. Continue this process until the fast pointer reaches the end of the array.
|
||||
6. The slow pointer will indicate the position of the last unique element.
|
||||
|
||||
### Example Code
|
||||
|
||||
```python
|
||||
def remove_duplicates(arr):
|
||||
if not arr:
|
||||
return 0
|
||||
|
||||
slow = 0
|
||||
|
||||
for fast in range(1, len(arr)):
|
||||
if arr[fast] != arr[slow]:
|
||||
slow += 1
|
||||
arr[slow] = arr[fast]
|
||||
|
||||
return slow + 1
|
||||
|
||||
# Example usage
|
||||
arr = [1, 1, 2, 2, 3, 4, 4, 5]
|
||||
new_length = remove_duplicates(arr)
|
||||
print("Array after removing duplicates:", arr[:new_length])
|
||||
print("New length of array:", new_length)
|
||||
```
|
||||
# Advantages of the Two-Pointer Technique
|
||||
|
||||
Here are some key benefits of using the two-pointer technique:
|
||||
|
||||
## 1. **Improved Time Complexity**
|
||||
|
||||
It often reduces the time complexity from O(n^2) to O(n), making it significantly faster for many problems.
|
||||
|
||||
### Example
|
||||
- **Finding pairs with a given sum**: Efficiently finds pairs in O(n) time.
|
||||
|
||||
## 2. **Simplicity**
|
||||
|
||||
The implementation is straightforward, using basic operations like incrementing or decrementing pointers.
|
||||
|
||||
### Example
|
||||
- **Removing duplicates from a sorted array**: Easy to implement and understand.
|
||||
|
||||
## 3. **In-Place Solutions**
|
||||
|
||||
Many problems can be solved in place, requiring no extra space beyond the input data.
|
||||
|
||||
### Example
|
||||
- **Reversing a linked list**: Adjusts pointers within the existing nodes (sketched below).
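A minimal sketch of that in-place reversal (the `ListNode` class here is a stand-in for whatever node type the list uses):

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def reverse_list(head):
    prev, current = None, head
    while current:
        # Redirect one link per step; prev and current walk forward together
        current.next, prev, current = prev, current, current.next
    return prev  # prev ends up at the old tail, i.e., the new head

# Example usage: 1 -> 2 -> 3 becomes 3 -> 2 -> 1
head = ListNode(1, ListNode(2, ListNode(3)))
node = reverse_list(head)
while node:
    print(node.val, end=" ")  # 3 2 1
    node = node.next
```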
|
||||
|
||||
## 4. **Versatility**
|
||||
|
||||
Applicable to a wide range of problems, from arrays and strings to linked lists.
|
||||
|
||||
### Example
|
||||
- **Merging two sorted arrays**: Efficiently merges using two pointers.
|
||||
|
||||
## 5. **Efficiency**
|
||||
|
||||
Minimizes redundant operations and enhances performance, especially with large data sets.
|
||||
|
||||
### Example
|
||||
- **Partitioning problems**: Efficiently partitions elements with minimal operations.
|
||||
|
|
@ -0,0 +1,140 @@
|
|||
# Ensemble Learning
|
||||
|
||||
Ensemble Learning is a powerful machine learning paradigm that combines multiple models to achieve better performance than any individual model. The idea is to leverage the strengths of different models to improve overall accuracy, robustness, and generalization.
|
||||
|
||||
|
||||
|
||||
## Introduction
|
||||
|
||||
Ensemble Learning is a technique that combines the predictions from multiple machine learning models to make more accurate and robust predictions than a single model. It leverages the diversity of different models to reduce errors and improve performance.
|
||||
|
||||
## Types of Ensemble Learning
|
||||
|
||||
### Bagging
|
||||
|
||||
Bagging, or Bootstrap Aggregating, involves training multiple versions of the same model on different subsets of the training data and averaging their predictions. The most common example of bagging is the `RandomForest` algorithm.
|
||||
|
||||
### Boosting
|
||||
|
||||
Boosting focuses on training models sequentially, where each new model corrects the errors made by the previous ones. This way, the ensemble learns from its mistakes, leading to improved performance. `AdaBoost` and `Gradient Boosting` are popular examples of boosting algorithms.
|
||||
|
||||
### Stacking
|
||||
|
||||
Stacking involves training multiple models (the base learners) and a meta-model that combines their predictions. The base learners are trained on the original dataset, while the meta-model is trained on the outputs of the base learners. This approach allows leveraging the strengths of different models.
|
||||
|
||||
## Advantages and Disadvantages
|
||||
|
||||
### Advantages
|
||||
|
||||
- **Improved Accuracy**: Combines the strengths of multiple models.
|
||||
- **Robustness**: Reduces the risk of overfitting and model bias.
|
||||
- **Versatility**: Can be applied to various machine learning tasks, including classification and regression.
|
||||
|
||||
### Disadvantages
|
||||
|
||||
- **Complexity**: More complex than individual models, making interpretation harder.
|
||||
- **Computational Cost**: Requires more computational resources and training time.
|
||||
- **Implementation**: Can be challenging to implement and tune effectively.
|
||||
|
||||
## Key Concepts
|
||||
|
||||
- **Diversity**: The models in the ensemble should be diverse to benefit from their different strengths.
|
||||
- **Voting/Averaging**: For classification, majority voting is used to combine predictions; for regression, averaging is used (see the sketch after this list).
|
||||
- **Weighting**: In some ensembles, models are weighted based on their accuracy or other metrics.
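To make the voting and weighting ideas above concrete, here is a minimal sketch of a hard-voting ensemble on the Iris data; the choice of base models and the weights are illustrative, not tuned:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hard voting: each model casts one vote per sample;
# the weights make some votes count more than others
voting = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('nb', GaussianNB())],
    voting='hard',
    weights=[2, 1, 1],  # illustrative weights, not tuned
)
voting.fit(X_train, y_train)
print("Voting accuracy:", voting.score(X_test, y_test))
```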
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Bagging with Random Forest
|
||||
|
||||
Below is an example of using Random Forest for classification on the Iris dataset.
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.datasets import load_iris
|
||||
from sklearn.ensemble import RandomForestClassifier
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.metrics import accuracy_score, classification_report
|
||||
|
||||
# Load dataset
|
||||
iris = load_iris()
|
||||
X, y = iris.data, iris.target
|
||||
|
||||
# Split dataset
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
|
||||
# Initialize Random Forest model
|
||||
clf = RandomForestClassifier(n_estimators=100, random_state=42)
|
||||
|
||||
# Train the model
|
||||
clf.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = clf.predict(X_test)
|
||||
|
||||
# Evaluate the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
print("Classification Report:\n", classification_report(y_test, y_pred))
|
||||
```
|
||||
|
||||
### Boosting with AdaBoost
|
||||
Below is an example of using AdaBoost for classification on the Iris dataset.
|
||||
|
||||
```python
|
||||
from sklearn.ensemble import AdaBoostClassifier
|
||||
from sklearn.tree import DecisionTreeClassifier
|
||||
|
||||
# Initialize base model
|
||||
base_model = DecisionTreeClassifier(max_depth=1)
|
||||
|
||||
# Initialize AdaBoost model
|
||||
ada_clf = AdaBoostClassifier(estimator=base_model, n_estimators=50, random_state=42)  # 'estimator' was named 'base_estimator' before scikit-learn 1.2
|
||||
|
||||
# Train the model
|
||||
ada_clf.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = ada_clf.predict(X_test)
|
||||
|
||||
# Evaluate the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
print("Classification Report:\n", classification_report(y_test, y_pred))
|
||||
```
|
||||
|
||||
### Stacking with Multiple Models
|
||||
Below is an example of using stacking with multiple models for classification on the Iris dataset.
|
||||
|
||||
```python
|
||||
from sklearn.linear_model import LogisticRegression
|
||||
from sklearn.neighbors import KNeighborsClassifier
|
||||
from sklearn.svm import SVC
|
||||
from sklearn.ensemble import StackingClassifier
|
||||
|
||||
# Define base models
|
||||
base_models = [
|
||||
('knn', KNeighborsClassifier(n_neighbors=5)),
|
||||
('svc', SVC(kernel='linear', probability=True))
|
||||
]
|
||||
|
||||
# Define meta-model
|
||||
meta_model = LogisticRegression()
|
||||
|
||||
# Initialize Stacking model
|
||||
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)
|
||||
|
||||
# Train the model
|
||||
stacking_clf.fit(X_train, y_train)
|
||||
|
||||
# Make predictions
|
||||
y_pred = stacking_clf.predict(X_test)
|
||||
|
||||
# Evaluate the model
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy * 100:.2f}%")
|
||||
print("Classification Report:\n", classification_report(y_test, y_pred))
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
Ensemble Learning is a powerful technique that combines multiple models to improve overall performance. By leveraging the strengths of different models, it provides better accuracy, robustness, and generalization. However, it comes with increased complexity and computational cost. Understanding and implementing ensemble methods can significantly enhance machine learning solutions.
|
|
@ -0,0 +1,99 @@
|
|||
# Hierarchical Clustering
|
||||
|
||||
Hierarchical Clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. This README provides an overview of the hierarchical clustering algorithm, including its fundamental concepts, types, steps, and how to implement it using Python.
|
||||
|
||||
## Introduction
|
||||
|
||||
Hierarchical Clustering is an unsupervised learning method used to group similar objects into clusters. Unlike other clustering techniques, hierarchical clustering does not require the number of clusters to be specified beforehand. It produces a tree-like structure called a dendrogram, which displays the arrangement of the clusters and their sub-clusters.
|
||||
|
||||
## Concepts
|
||||
|
||||
### Dendrogram
|
||||
|
||||
A dendrogram is a tree-like diagram that records the sequences of merges or splits. It is a useful tool for visualizing the process of hierarchical clustering.
|
||||
|
||||
### Distance Measure
|
||||
|
||||
Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity.
|
||||
|
||||
### Linkage Criteria
|
||||
|
||||
Linkage criteria determine how the distance between clusters is calculated. Different linkage criteria include single linkage, complete linkage, average linkage, and Ward's linkage.
|
||||
|
||||
## Types of Hierarchical Clustering
|
||||
|
||||
1. **Agglomerative Clustering (Bottom-Up Approach)**:
|
||||
- Starts with each data point as a separate cluster.
|
||||
- Repeatedly merges the closest pairs of clusters until only one cluster remains or a stopping criterion is met.
|
||||
|
||||
2. **Divisive Clustering (Top-Down Approach)**:
|
||||
- Starts with all data points in a single cluster.
|
||||
- Repeatedly splits clusters into smaller clusters until each data point is its own cluster or a stopping criterion is met.
|
||||
|
||||
## Steps in Hierarchical Clustering
|
||||
|
||||
1. **Calculate Distance Matrix**: Compute the distance between each pair of data points.
|
||||
2. **Create Clusters**: Treat each data point as a single cluster.
|
||||
3. **Merge Closest Clusters**: Find the two clusters that are closest to each other and merge them into a single cluster.
|
||||
4. **Update Distance Matrix**: Update the distance matrix to reflect the distance between the new cluster and the remaining clusters.
|
||||
5. **Repeat**: Repeat steps 3 and 4 until all data points are merged into a single cluster or the desired number of clusters is achieved.
|
||||
|
||||
## Linkage Criteria
|
||||
|
||||
1. **Single Linkage (Minimum Linkage)**: The distance between two clusters is defined as the minimum distance between any single data point in the first cluster and any single data point in the second cluster.
|
||||
2. **Complete Linkage (Maximum Linkage)**: The distance between two clusters is defined as the maximum distance between any single data point in the first cluster and any single data point in the second cluster.
|
||||
3. **Average Linkage**: The distance between two clusters is defined as the average distance between all pairs of data points, one from each cluster.
|
||||
4. **Ward's Linkage**: The distance between two clusters is defined as the increase in the sum of squared deviations from the mean when the two clusters are merged.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Using Scikit-learn
|
||||
|
||||
Scikit-learn is a popular machine learning library in Python that provides tools for hierarchical clustering.
|
||||
|
||||
### Code Example
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from scipy.cluster.hierarchy import dendrogram, linkage
|
||||
from sklearn.cluster import AgglomerativeClustering
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
|
||||
# Load dataset
|
||||
data = pd.read_csv('path/to/your/dataset.csv')
|
||||
|
||||
# Preprocess the data
|
||||
scaler = StandardScaler()
|
||||
data_scaled = scaler.fit_transform(data)
|
||||
|
||||
# Perform hierarchical clustering
|
||||
Z = linkage(data_scaled, method='ward')
|
||||
|
||||
# Plot the dendrogram
|
||||
plt.figure(figsize=(10, 7))
|
||||
dendrogram(Z)
|
||||
plt.title('Dendrogram')
|
||||
plt.xlabel('Data Points')
|
||||
plt.ylabel('Distance')
|
||||
plt.show()
|
||||
|
||||
# Perform Agglomerative Clustering
|
||||
agg_clustering = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')  # 'metric' was named 'affinity' before scikit-learn 1.2
|
||||
labels = agg_clustering.fit_predict(data_scaled)
|
||||
|
||||
# Add cluster labels to the original data
|
||||
data['Cluster'] = labels
|
||||
print(data.head())
|
||||
```
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters.
|
||||
- **Cophenetic Correlation Coefficient**: Measures how faithfully a dendrogram preserves the pairwise distances between the original data points.
|
||||
- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance.
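The first two of these can be computed directly with SciPy and scikit-learn; a minimal sketch on synthetic toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))  # toy data for illustration

Z = linkage(X, method='ward')

# Cophenetic correlation: agreement between dendrogram and original distances
c, _ = cophenet(Z, pdist(X))
print("Cophenetic correlation:", c)

# Silhouette score for a flat clustering cut from the dendrogram
labels = fcluster(Z, t=3, criterion='maxclust')
print("Silhouette score:", silhouette_score(X, labels))
```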
|
||||
|
||||
## Conclusion
|
||||
|
||||
Hierarchical clustering is a versatile and intuitive method for clustering data. It is particularly useful when the number of clusters is not known beforehand. By understanding the different linkage criteria and evaluation metrics, one can effectively apply hierarchical clustering to various types of data.
|
|
@ -3,6 +3,7 @@
|
|||
- [Introduction to scikit-learn](sklearn-introduction.md)
|
||||
- [Binomial Distribution](binomial-distribution.md)
|
||||
- [Regression in Machine Learning](regression.md)
|
||||
- [Polynomial Regression](polynomial-regression.md)
|
||||
- [Confusion Matrix](confusion-matrix.md)
|
||||
- [Decision Tree Learning](decision-tree.md)
|
||||
- [Random Forest](random-forest.md)
|
||||
|
@ -11,8 +12,15 @@
|
|||
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
|
||||
- [TensorFlow](tensorflow.md)
|
||||
- [PyTorch](pytorch.md)
|
||||
- [Ensemble Learning](ensemble-learning.md)
|
||||
- [Types of optimizers](types-of-optimizers.md)
|
||||
- [Logistic Regression](logistic-regression.md)
|
||||
- [Types of Cost Functions](cost-functions.md)
|
||||
- [Clustering](clustering.md)
|
||||
- [Hierarchical Clustering](hierarchical-clustering.md)
|
||||
- [Grid Search](grid-search.md)
|
||||
- [Transformers](transformers.md)
|
||||
- [K-Means](kmeans.md)
|
||||
- [K-nearest neighbor (KNN)](knn.md)
|
||||
- [Naive Bayes](naive-bayes.md)
|
||||
- [Neural network regression](neural-network-regression.md)
|
||||
|
|
|
@ -0,0 +1,92 @@
|
|||
# K-Means Clustering
|
||||
Unsupervised Learning Algorithm for Grouping Similar Data.
|
||||
|
||||
## Introduction
|
||||
K-means clustering is a fundamental unsupervised machine learning algorithm that excels at grouping similar data points together. It's a popular choice due to its simplicity and efficiency in uncovering hidden patterns within unlabeled datasets.
|
||||
|
||||
## Unsupervised Learning
|
||||
Unlike supervised learning algorithms that rely on labeled data for training, unsupervised algorithms, like K-means, operate solely on input data (without predefined categories). Their objective is to discover inherent structures or groupings within the data.
|
||||
|
||||
## The K-Means Objective
|
||||
Organize similar data points into clusters to unveil underlying patterns. The main objective is to minimize the total intra-cluster variance, i.e. the sum of squared distances between each point and its cluster centroid.
|
||||
|
||||

|
||||
## Clusters and Centroids
|
||||
A cluster represents a collection of data points that share similar characteristics. K-means identifies a pre-determined number (k) of clusters within the dataset. Each cluster is represented by a centroid, which acts as its central point (imaginary or real).
|
||||
|
||||
## Minimizing In-Cluster Variation
|
||||
The K-means algorithm strategically assigns each data point to a cluster such that the total variation within each cluster (measured by the sum of squared distances between points and their centroid) is minimized. In simpler terms, K-means strives to create clusters where data points are close to their respective centroids.
|
||||
|
||||
## The Meaning Behind "K-Means"
|
||||
The "means" in K-means refers to the averaging process used to compute the centroid, essentially finding the center of each cluster.
|
||||
|
||||
## K-Means Algorithm in Action
|
||||

|
||||
The K-means algorithm follows an iterative approach to optimize cluster formation:
|
||||
|
||||
1. **Initial Centroid Placement:** The process begins with randomly selecting k centroids to serve as initial reference points for each cluster.
|
||||
2. **Data Point Assignment:** Each data point is assigned to the closest centroid, effectively creating a preliminary clustering.
|
||||
3. **Centroid Repositioning:** Once data points are assigned, the centroids are recalculated by averaging the positions of the points within their respective clusters. These new centroids represent the refined centers of the clusters.
|
||||
4. **Iteration Until Convergence:** Steps 2 and 3 are repeated iteratively until a stopping criterion is met. This criterion can be either:
|
||||
- **Centroid Stability:** No significant change occurs in the centroids' positions, indicating successful clustering.
|
||||
- **Reaching Maximum Iterations:** A predefined number of iterations is completed.
|
||||
|
||||
## Code
|
||||
Following is a simple implementation of K-Means.
|
||||
|
||||
```python
|
||||
# Generate and Visualize Sample Data
|
||||
# import the necessary Libraries
|
||||
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Create data points for cluster 1 and cluster 2
|
||||
X = -2 * np.random.rand(100, 2)
|
||||
X1 = 1 + 2 * np.random.rand(50, 2)
|
||||
|
||||
# Combine data points from both clusters
|
||||
X[50:100, :] = X1
|
||||
|
||||
# Plot data points and display the plot
|
||||
plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
|
||||
plt.show()
|
||||
|
||||
# K-Means Model Creation and Training
|
||||
from sklearn.cluster import KMeans
|
||||
|
||||
# Create KMeans object with 2 clusters
|
||||
kmeans = KMeans(n_clusters=2)
|
||||
kmeans.fit(X) # Train the model on the data
|
||||
|
||||
# Visualize Data Points with Centroids
|
||||
centroids = kmeans.cluster_centers_ # Get centroids (cluster centers)
|
||||
|
||||
plt.scatter(X[:, 0], X[:, 1], s=50, c='b') # Plot data points again
|
||||
plt.scatter(centroids[0, 0], centroids[0, 1], s=200, c='g', marker='s') # Plot centroid 1
|
||||
plt.scatter(centroids[1, 0], centroids[1, 1], s=200, c='r', marker='s') # Plot centroid 2
|
||||
plt.show() # Display the plot with centroids
|
||||
|
||||
# Predict Cluster Label for New Data Point
|
||||
new_data = np.array([-3.0, -3.0])
|
||||
new_data_reshaped = new_data.reshape(1, -1)
|
||||
predicted_cluster = kmeans.predict(new_data_reshaped)
|
||||
print("Predicted cluster for new data:", predicted_cluster)
|
||||
```
|
||||
|
||||
### Output:
|
||||
Before Implementing K-Means Clustering
|
||||

|
||||
|
||||
After Implementing K-Means Clustering
|
||||

|
||||
|
||||
Predicted cluster for new data: `[0]`
|
||||
|
||||
## Conclusion
|
||||
**K-Means** works best on data that has a smaller number of dimensions and is numeric and continuous, and it can be used to find groups that have not been explicitly labeled in the data. For example, it can be used for document classification, delivery store optimization, or customer segmentation.
|
||||
|
||||
## References
|
||||
|
||||
- [Survey of Machine Learning and Data Mining Techniques used in Multimedia System](https://www.researchgate.net/publication/333457161_Survey_of_Machine_Learning_and_Data_Mining_Techniques_used_in_Multimedia_System?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)
|
||||
- [A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database](https://www.researchgate.net/publication/339267868_A_Clustering_Approach_for_Outliers_Detection_in_a_Big_Point-of-Sales_Database?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)
|
|
@ -0,0 +1,122 @@
|
|||
# K-Nearest Neighbors (KNN) Machine Learning Algorithm in Python
|
||||
|
||||
## Introduction
|
||||
K-Nearest Neighbors (KNN) is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It assumes that similar things exist in close proximity. In other words, similar data points are near to each other.
|
||||
|
||||
## How KNN Works
|
||||
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in classification) or averaging the labels (in regression).
|
||||
|
||||
### Steps:
|
||||
1. **Choose the number K of neighbors**
|
||||
2. **Calculate the distance** between the query-instance and all the training samples
|
||||
3. **Sort the distances** and determine the nearest neighbors based on the K-th minimum distance
|
||||
4. **Gather the labels** of the nearest neighbors
|
||||
5. **Vote for the most frequent label** (in case of classification) or **average the labels** (in case of regression)
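These steps map almost line-for-line onto NumPy; a minimal from-scratch sketch on made-up toy data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Step 2: Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X_train - query, axis=1)
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: gather their labels and vote for the most frequent one
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5, 6])))  # -> 1
```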
|
||||
|
||||
## When to Use KNN
|
||||
### Advantages:
|
||||
- **Simple and easy to understand:** KNN is intuitive and easy to implement.
|
||||
- **No training phase:** KNN is a lazy learner, meaning there is no explicit training phase.
|
||||
- **Effective with a small dataset:** KNN performs well with a small number of input variables.
|
||||
|
||||
### Disadvantages:
|
||||
- **Computationally expensive:** The algorithm becomes significantly slower as the number of examples and/or predictors/independent variables increases.
|
||||
- **Sensitive to irrelevant features:** All features contribute to the distance equally.
|
||||
- **Memory-intensive:** Storing all the training data can be costly.
|
||||
|
||||
### Use Cases:
|
||||
- **Recommender Systems:** Suggest items based on similarity to user preferences.
|
||||
- **Image Recognition:** Classify images by comparing new images to the training set.
|
||||
- **Finance:** Predict credit risk or fraud detection based on historical data.
|
||||
|
||||
## KNN in Python
|
||||
|
||||
### Required Libraries
|
||||
To implement KNN, we need the following Python libraries:
|
||||
- `numpy`
|
||||
- `pandas`
|
||||
- `scikit-learn`
|
||||
- `matplotlib` (for visualization)
|
||||
|
||||
### Installation
|
||||
```bash
|
||||
pip install numpy pandas scikit-learn matplotlib
|
||||
```
|
||||
|
||||
### Example Code
|
||||
Let's implement a simple KNN classifier using the Iris dataset.
|
||||
|
||||
#### Step 1: Import Libraries
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.neighbors import KNeighborsClassifier
|
||||
from sklearn.metrics import accuracy_score
|
||||
import matplotlib.pyplot as plt
|
||||
```
|
||||
|
||||
#### Step 2: Load Dataset
|
||||
```python
|
||||
from sklearn.datasets import load_iris
|
||||
iris = load_iris()
|
||||
X = iris.data
|
||||
y = iris.target
|
||||
```
|
||||
|
||||
#### Step 3: Split Dataset
|
||||
```python
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
|
||||
```
|
||||
|
||||
#### Step 4: Train KNN Model
|
||||
```python
|
||||
knn = KNeighborsClassifier(n_neighbors=3)
|
||||
knn.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
#### Step 5: Make Predictions
|
||||
```python
|
||||
y_pred = knn.predict(X_test)
|
||||
```
|
||||
|
||||
#### Step 6: Evaluate the Model
|
||||
```python
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f'Accuracy: {accuracy}')
|
||||
```
|
||||
|
||||
### Visualization (Optional)
|
||||
```python
|
||||
# Plotting the decision boundary for visualization (for 2D data)
|
||||
h = .02 # step size in the mesh
|
||||
# Create color maps
|
||||
cmap_light = plt.cm.RdYlBu
|
||||
cmap_bold = plt.cm.RdYlBu
|
||||
|
||||
# For simplicity, take only the first two features of the dataset
# and fit a second classifier on them so the 2-D mesh can be scored
X_plot = X[:, :2]
knn_2d = KNeighborsClassifier(n_neighbors=3).fit(X_plot, y)

x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
|
||||
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
|
||||
np.arange(y_min, y_max, h))
|
||||
|
||||
Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
|
||||
Z = Z.reshape(xx.shape)
|
||||
plt.figure()
|
||||
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
|
||||
|
||||
# Plot also the training points
|
||||
plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, edgecolor='k', cmap=cmap_bold)
|
||||
plt.xlim(xx.min(), xx.max())
|
||||
plt.ylim(yy.min(), yy.max())
|
||||
plt.title("3-Class classification (k = 3)")
|
||||
plt.show()
|
||||
```
|
||||
|
||||
## Generalization and Considerations
|
||||
- **Choosing K:** The choice of K is critical. Smaller values of K can lead to noisy models, while larger values make the algorithm computationally expensive and might oversimplify the model (see the sketch after this list).
|
||||
- **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) to ensure that all features contribute equally to the distance computation.
|
||||
- **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
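In practice, K is often chosen by cross-validation; a minimal sketch, reusing `X_train`, `y_train`, and `KNeighborsClassifier` from the example above:

```python
from sklearn.model_selection import GridSearchCV

# Search over K with 5-fold cross-validation
param_grid = {'n_neighbors': list(range(1, 16))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)
print("Best K:", search.best_params_['n_neighbors'])
print("Cross-validated accuracy:", search.best_score_)
```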
|
||||
|
||||
In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.
|
|
@ -0,0 +1,369 @@
|
|||
# Naive Bayes
|
||||
|
||||
## Introduction
|
||||
|
||||
The Naive Bayes model uses probabilities to predict an outcome. It is a supervised machine learning technique, i.e. it requires labelled data for training. It is used for classification and is based on Bayes' Theorem. The basic assumption of this model is independence among the features, i.e. each feature is unaffected by any other feature.
|
||||
|
||||
## Bayes' Theorem
|
||||
|
||||
Bayes' theorem is given by:
|
||||
|
||||
$$
|
||||
P(a|b) = \frac{P(b|a)*P(a)}{P(b)}
|
||||
$$
|
||||
|
||||
where:
|
||||
- $P(a|b)$ is the posterior probability, i.e. probability of 'a' given that 'b' is true,
|
||||
- $P(b|a)$ is the likelihood probability i.e. probability of 'b' given that 'a' is true,
|
||||
- $P(a)$ and $P(b)$ are the probabilities of 'a' and 'b' respectively, independent of each other.
|
||||
|
||||
|
||||
## Applications
|
||||
|
||||
The Naive Bayes classifier has numerous applications, including:
|
||||
1. Text classification.
|
||||
2. Sentiment analysis.
|
||||
3. Spam filtering.
|
||||
4. Multiclass classification (eg. Weather prediction).
|
||||
5. Recommendation Systems.
|
||||
6. Healthcare sector.
|
||||
7. Document categorization.
|
||||
|
||||
|
||||
## Advantages
|
||||
|
||||
1. Easy to implement.
|
||||
2. Useful even when the training dataset is limited (where a decision tree would not be recommended).
|
||||
3. Supports multiclass classification which is not supported by some machine learning algorithms like SVM and logistic regression.
|
||||
4. Scalable, fast and efficient.
|
||||
|
||||
## Disadvantages
|
||||
|
||||
1. Assumes features to be independent, which may not be true in certain scenarios.
|
||||
2. Zero probability error.
|
||||
3. Sensitive to noise.
|
||||
|
||||
## Zero Probability Error
|
||||
|
||||
A zero probability error occurs when the count of an event given another event is zero, which makes the whole product of likelihoods zero.
To handle the zero probability error, Laplace's correction is used: a small constant is added to every count.
|
||||
|
||||
**Example:**
|
||||
|
||||
|
||||
Given the data below, find whether tennis can be played if (outlook = overcast, wind = weak).
|
||||
|
||||
**Data**
|
||||
|
||||
---
|
||||
| SNo | Outlook (A) | Wind (B) | PlayTennis (R) |
|
||||
|-----|--------------|------------|-------------------|
|
||||
| 1 | Rain | Weak | No |
|
||||
| 2 | Rain | Strong | No |
|
||||
| 3 | Overcast | Weak | Yes |
|
||||
| 4 | Rain | Weak | Yes |
|
||||
| 5 | Overcast | Weak | Yes |
|
||||
| 6 | Rain | Strong | No |
|
||||
| 7 | Overcast | Strong | Yes |
|
||||
| 8 | Rain | Weak | No |
|
||||
| 9 | Overcast | Weak | Yes |
|
||||
| 10 | Rain | Weak | Yes |
|
||||
---
|
||||
|
||||
- **Calculate prior probabilities**
|
||||
|
||||
$$
|
||||
P(Yes) = \frac{6}{10} = 0.6
|
||||
$$
|
||||
$$
|
||||
P(No) = \frac{4}{10} = 0.4
|
||||
$$
|
||||
|
||||
- **Calculate likelihoods**
|
||||
|
||||
1. **Outlook (A):**
|
||||
|
||||
---
|
||||
| A\R | Yes | No |
|
||||
|-----------|-------|-----|
|
||||
| Rain | 2 | 4 |
|
||||
| Overcast | 4 | 0 |
|
||||
| Total | 6 | 4 |
|
||||
---
|
||||
|
||||
- Rain:
|
||||
|
||||
$$
|
||||
P(Rain|Yes) = \frac{2}{6}
|
||||
$$
|
||||
|
||||
$$
|
||||
P(Rain|No) = \frac{4}{4}
|
||||
$$
|
||||
|
||||
- Overcast:
|
||||
|
||||
$$
|
||||
P(Overcast|Yes) = \frac{4}{6}
|
||||
$$
|
||||
$$
|
||||
P(Overcast|No) = \frac{0}{4}
|
||||
$$
|
||||
|
||||
|
||||
Here, we can see that
|
||||
$$
|
||||
P(Overcast|No) = 0
|
||||
$$
|
||||
This is a zero probability error!
|
||||
|
||||
Since this probability is 0, the whole product of likelihoods becomes 0 and the Naive Bayes model fails to predict.
|
||||
|
||||
**Applying Laplace's correction:**
|
||||
|
||||
In Laplace's correction, we first scale the counts up to 1000 instances, so that adding 1 to each count changes the probabilities only slightly.
|
||||
- **Calculate prior probabilities**
|
||||
|
||||
$$
|
||||
P(Yes) = \frac{600}{1002}
|
||||
$$
|
||||
|
||||
$$
|
||||
P(No) = \frac{402}{1002}
|
||||
$$
|
||||
|
||||
- **Calculate likelihoods**
|
||||
|
||||
1. **Outlook (A):**
|
||||
|
||||
|
||||
(Converted to 1000 instances)
|
||||
|
||||
We add 1 instance to each cell of the (PlayTennis = No) column (Laplace's correction).
|
||||
|
||||
---
|
||||
| A\R | Yes | No |
|
||||
|-----------|-------|---------------|
|
||||
| Rain | 200 | (400+1)=401 |
|
||||
| Overcast | 400 | (0+1)=1 |
|
||||
| Total | 600 | 402 |
|
||||
---
|
||||
|
||||
- **Rain:**
|
||||
|
||||
$$
|
||||
P(Rain|Yes) = \frac{200}{600}
|
||||
$$
|
||||
$$
|
||||
P(Rain|No) = \frac{401}{402}
|
||||
$$
|
||||
|
||||
- **Overcast:**
|
||||
|
||||
$$
|
||||
P(Overcast|Yes) = \frac{400}{600}
|
||||
$$
|
||||
$$
|
||||
P(Overcast|No) = \frac{1}{402}
|
||||
$$
|
||||
|
||||
|
||||
2. **Wind (B):**
|
||||
|
||||
|
||||
---
|
||||
| B\R | Yes | No |
|
||||
|-----------|---------|-------|
|
||||
| Weak | 500 | 200 |
|
||||
| Strong | 100 | 200 |
|
||||
| Total | 600 | 400 |
|
||||
---
|
||||
|
||||
- **Weak:**
|
||||
|
||||
$$
|
||||
P(Weak|Yes) = \frac{500}{600}
|
||||
$$
|
||||
$$
|
||||
P(Weak|No) = \frac{200}{400}
|
||||
$$
|
||||
|
||||
- **Strong:**
|
||||
|
||||
$$
|
||||
P(Strong|Yes) = \frac{100}{600}
|
||||
$$
|
||||
$$
|
||||
P(Strong|No) = \frac{200}{400}
|
||||
$$
|
||||
|
||||
- **Calculating the posterior scores:**
|
||||
|
||||
$$
|
||||
P(Yes|Overcast, Weak) \propto P(Yes) * P(Overcast|Yes) * P(Weak|Yes)
|
||||
$$
|
||||
$$
|
||||
= \frac{600}{1002} * \frac{400}{600} * \frac{500}{600}
|
||||
$$
|
||||
$$
|
||||
= 0.3326
|
||||
$$
|
||||
|
||||
$$
|
||||
P(No|Overcast, Weak) \propto P(No) * P(Overcast|No) * P(Weak|No)
|
||||
$$
|
||||
$$
|
||||
= \frac{402}{1002} * \frac{1}{402} * \frac{200}{400}
|
||||
$$
|
||||
$$
|
||||
= 0.000499 = 0.0005
|
||||
$$
|
||||
|
||||
|
||||
Since
|
||||
$$
|
||||
P(Yes|Overcast, Weak) > P(No|Overcast, Weak)
|
||||
$$
|
||||
we can conclude that tennis can be played if outlook is overcast and wind is weak.
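For reference, the arithmetic above can be verified with a few lines of Python, using the Laplace-corrected counts from the tables:

```python
# Unnormalized posterior scores for (outlook = overcast, wind = weak)
score_yes = (600 / 1002) * (400 / 600) * (500 / 600)  # ~0.33
score_no = (402 / 1002) * (1 / 402) * (200 / 400)     # ~0.0005

print(f"score(Yes) = {score_yes:.4f}")
print(f"score(No)  = {score_no:.4f}")
print("Play tennis?", "Yes" if score_yes > score_no else "No")
```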
|
||||
|
||||
|
||||
# Types of Naive Bayes classifier
|
||||
|
||||
|
||||
## Gaussian Naive Bayes
|
||||
|
||||
It is used when the dataset has **continuous data**. It assumes that the data is normally distributed (also known as a Gaussian distribution).
A Gaussian distribution can be characterized by a bell-shaped curve.
|
||||
|
||||
**Continuous data features:** Features which can take any real value within a certain range. These features have an infinite number of possible values; they are generally measured, not counted.
|
||||
eg. weight, height, temperature, etc.
|
||||
|
||||
**Code**
|
||||
|
||||
```python
|
||||
|
||||
#import libraries
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.naive_bayes import GaussianNB
|
||||
from sklearn import metrics
|
||||
from sklearn.metrics import confusion_matrix
|
||||
|
||||
#read data
|
||||
d=pd.read_csv("data.csv")
|
||||
df=pd.DataFrame(d)
|
||||
|
||||
X = df.iloc[:, 1:7]  # feature columns
y = df.iloc[:, 7]    # target column as a 1-D Series
|
||||
|
||||
# splitting X and y into training and testing sets
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
|
||||
|
||||
|
||||
# training the model on training set
|
||||
obj = GaussianNB()
|
||||
obj.fit(X_train, y_train)
|
||||
|
||||
#making predictions on the testing set
|
||||
y_pred = obj.predict(X_test)
|
||||
|
||||
#comparing y_test and y_pred
|
||||
print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred))
|
||||
print("Confusion matrix: \n",confusion_matrix(y_train,y_pred))
|
||||
|
||||
```
|
||||
|
||||
|
||||
## Multinomial Naive Bayes
|
||||
|
||||
Appropriate when the features are categorical or countable. It models the likelihood of each feature as a multinomial distribution.
|
||||
The multinomial distribution gives the probability of each category when there are multiple categories (e.g. text classification).
|
||||
|
||||
**Code**
|
||||
|
||||
```python
|
||||
|
||||
#import libraries
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.naive_bayes import MultinomialNB
|
||||
from sklearn import metrics
|
||||
from sklearn.metrics import confusion_matrix
|
||||
|
||||
#read data
|
||||
d=pd.read_csv("data.csv")
|
||||
df=pd.DataFrame(d)
|
||||
|
||||
X = df.iloc[:, 1:7]  # feature columns
y = df.iloc[:, 7]    # target column as a 1-D Series
|
||||
|
||||
# splitting X and y into training and testing sets
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
|
||||
|
||||
|
||||
# training the model on training set
|
||||
obj = MultinomialNB()
|
||||
obj.fit(X_train, y_train)
|
||||
|
||||
#making predictions on the testing set
|
||||
y_pred = obj.predict(X_test)
|
||||
|
||||
#comparing y_test and y_pred
|
||||
print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred))
|
||||
print("Confusion matrix: \n",confusion_matrix(y_train,y_pred))
|
||||
|
||||
|
||||
```
|
||||
|
||||
## Bernoulli Naive Bayes
|
||||
|
||||
It is specifically designed for binary features (eg. Yes or No). It models the likelihood of each feature as a Bernoulli distribution.
|
||||
The Bernoulli distribution applies when there are only two possible outcomes (e.g. success or failure of an event).
|
||||
|
||||
**Code**
|
||||
|
||||
```python
|
||||
|
||||
#import libraries
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.naive_bayes import BernoulliNB
|
||||
from sklearn import metrics
|
||||
from sklearn.metrics import confusion_matrix
|
||||
|
||||
#read data
|
||||
d=pd.read_csv("data.csv")
|
||||
df=pd.DataFrame(d)
|
||||
|
||||
X = df.iloc[:, 1:7]  # feature columns
y = df.iloc[:, 7]    # target column as a 1-D Series
|
||||
|
||||
# splitting X and y into training and testing sets
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
|
||||
|
||||
|
||||
# training the model on training set
|
||||
obj = BernoulliNB()
|
||||
obj.fit(X_train, y_train)
|
||||
|
||||
#making predictions on the testing set
|
||||
y_pred = obj.predict(X_test)
|
||||
|
||||
#comparing y_test and y_pred
|
||||
print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred))
|
||||
print("Confusion matrix: \n",confusion_matrix(y_train,y_pred))
|
||||
|
||||
```
|
||||
|
||||
|
||||
## Evaluation
|
||||
|
||||
1. Confusion matrix.
|
||||
2. Accuracy.
|
||||
3. ROC curve.
|
||||
|
||||
|
||||
## Conclusion
|
||||
|
||||
We can conclude that Naive Bayes may be limited in some cases by the assumption that the features are independent of each other, but it is still reliable in many cases. Naive Bayes is an efficient classifier and works even on small datasets.
|
||||
|
|
@ -0,0 +1,84 @@
|
|||
# Neural Network Regression in Python using Scikit-learn
|
||||
|
||||
## Overview
|
||||
|
||||
Neural Network Regression is used to predict continuous values based on input features. Scikit-learn provides an easy-to-use interface for implementing neural network models, specifically through the `MLPRegressor` class, which stands for Multi-Layer Perceptron Regressor.
|
||||
|
||||
## When to Use Neural Network Regression
|
||||
|
||||
### Suitable Scenarios
|
||||
|
||||
1. **Complex Relationships**: Ideal when the relationship between features and the target variable is complex and non-linear.
|
||||
2. **Sufficient Data**: Works well with large datasets that can support training deep learning models.
|
||||
3. **Feature Extraction**: Useful in cases where the neural network's feature extraction capabilities can be leveraged, such as with image or text data.
|
||||
|
||||
### Unsuitable Scenarios
|
||||
|
||||
1. **Small Datasets**: Less effective with small datasets due to overfitting and inability to learn complex patterns.
|
||||
2. **Low-latency Predictions**: Might not be suitable for real-time applications with strict latency requirements.
|
||||
3. **Interpretability**: Not ideal when model interpretability is crucial, as neural networks are often seen as "black-box" models.
|
||||
|
||||
## Implementing Neural Network Regression in Python with Scikit-learn
|
||||
|
||||
### Step-by-Step Implementation
|
||||
|
||||
1. **Import Libraries**
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.model_selection import train_test_split
|
||||
from sklearn.preprocessing import StandardScaler
|
||||
from sklearn.neural_network import MLPRegressor
|
||||
from sklearn.metrics import mean_absolute_error
|
||||
```
|
||||
|
||||
2. **Load and Prepare Data**
|
||||
|
||||
For illustration, let's use a synthetic dataset.
|
||||
|
||||
```python
|
||||
# Generate synthetic data
|
||||
np.random.seed(42)
|
||||
X = np.random.rand(1000, 3)
|
||||
y = X[:, 0] * 3 + X[:, 1] * -2 + X[:, 2] * 0.5 + np.random.randn(1000) * 0.1
|
||||
|
||||
# Split the data
|
||||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
|
||||
|
||||
# Standardize the data
|
||||
scaler = StandardScaler()
|
||||
X_train = scaler.fit_transform(X_train)
|
||||
X_test = scaler.transform(X_test)
|
||||
```
|
||||
|
||||
3. **Build and Train the Neural Network Model**
|
||||
|
||||
```python
|
||||
# Create the MLPRegressor model
|
||||
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), activation='relu', solver='adam', max_iter=500, random_state=42)
|
||||
|
||||
# Train the model
|
||||
mlp.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
4. **Evaluate the Model**
|
||||
|
||||
```python
|
||||
# Make predictions
|
||||
y_pred = mlp.predict(X_test)
|
||||
|
||||
# Calculate the Mean Absolute Error
|
||||
mae = mean_absolute_error(y_test, y_pred)
|
||||
print(f"Test Mean Absolute Error: {mae}")
|
||||
```
|
||||
|
||||
### Explanation
|
||||
|
||||
- **Data Generation and Preparation**: Synthetic data is created, split into training and test sets, and standardized to improve the efficiency of the neural network training process.
|
||||
- **Model Construction and Training**: An `MLPRegressor` is created with two hidden layers, each containing 64 neurons and ReLU activation functions. The model is trained using the Adam optimizer for a maximum of 500 iterations.
|
||||
- **Evaluation**: The model's performance is evaluated on the test set using Mean Absolute Error (MAE) as the performance metric.
|
||||
|
||||
## Conclusion
|
||||
|
||||
Neural Network Regression with Scikit-learn's `MLPRegressor` is a powerful method for predicting continuous values in complex, non-linear scenarios. However, it's essential to ensure that you have enough data to train the model effectively and consider the computational resources required. Simpler models may be more appropriate for small datasets or when model interpretability is necessary. By following the steps outlined, you can build, train, and evaluate a neural network for regression tasks in Python using Scikit-learn.
|
|
@ -0,0 +1,102 @@
|
|||
# Polynomial Regression
|
||||
|
||||
Polynomial Regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is modeled as an $nth$ degree polynomial. This guide provides an overview of polynomial regression, including its fundamental concepts, assumptions, and how to implement it using Python.
|
||||
|
||||
## Introduction
|
||||
|
||||
Polynomial Regression is used when the data shows a non-linear relationship between the independent variable $x$ and the dependent variable $y$. It extends the simple linear regression model by considering polynomial terms of the independent variable, allowing for a more flexible fit to the data.
|
||||
|
||||
## Concepts
|
||||
|
||||
### Polynomial Equation
|
||||
|
||||
The polynomial regression model is based on the following polynomial equation:
|
||||
|
||||
$$
|
||||
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \epsilon
|
||||
$$
|
||||
|
||||
Where:
|
||||
- $y$ is the dependent variable.
|
||||
- $x$ is the independent variable.
|
||||
- $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients of the polynomial.
|
||||
- $\epsilon$ is the error term.
|
||||
|
||||
### Degree of Polynomial
|
||||
|
||||
The degree of the polynomial (n) determines the flexibility of the model. A higher degree allows the model to fit more complex, non-linear relationships, but it also increases the risk of overfitting.
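This trade-off can be checked empirically by comparing held-out error across degrees; a minimal sketch on synthetic data (the generating function is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=2.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 2, 3, 8):
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    print(f"degree={degree}: test MSE = {mse:.2f}")
```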
|
||||
|
||||
### Overfitting and Underfitting
|
||||
|
||||
- **Overfitting**: When the model fits the noise in the training data too closely, resulting in poor generalization to new data.
|
||||
- **Underfitting**: When the model is too simple to capture the underlying pattern in the data.
|
||||
|
||||
## Assumptions
|
||||
|
||||
1. **Independence**: Observations are independent of each other.
|
||||
2. **Homoscedasticity**: The variance of the residuals (errors) is constant across all levels of the independent variable.
|
||||
3. **Normality**: The residuals of the model are normally distributed.
|
||||
4. **No Multicollinearity**: The predictor variables are not highly correlated with each other.
|
||||
|
||||
## Implementation
|
||||
|
||||
### Using Scikit-learn
|
||||
|
||||
Scikit-learn is a popular machine learning library in Python that provides tools for polynomial regression.
|
||||
|
||||
### Code Example
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import matplotlib.pyplot as plt
|
||||
from sklearn.preprocessing import PolynomialFeatures
|
||||
from sklearn.linear_model import LinearRegression
|
||||
from sklearn.metrics import mean_squared_error, r2_score
|
||||
|
||||
# Load dataset
|
||||
data = pd.read_csv('path/to/your/dataset.csv')
|
||||
|
||||
# Define features and target variable
|
||||
X = data[['feature']]
|
||||
y = data['target']
|
||||
|
||||
# Transform features to polynomial features
|
||||
poly = PolynomialFeatures(degree=3)
|
||||
X_poly = poly.fit_transform(X)
|
||||
|
||||
# Initialize and train polynomial regression model
|
||||
model = LinearRegression()
|
||||
model.fit(X_poly, y)
|
||||
|
||||
# Make predictions
|
||||
y_pred = model.predict(X_poly)
|
||||
|
||||
# Evaluate the model
|
||||
mse = mean_squared_error(y, y_pred)
|
||||
r2 = r2_score(y, y_pred)
|
||||
print("Mean Squared Error:", mse)
|
||||
print("R^2 Score:", r2)
|
||||
|
||||
# Visualize the results
|
||||
plt.scatter(X, y, color='blue')
|
||||
plt.plot(X, y_pred, color='red')
|
||||
plt.xlabel('Feature')
|
||||
plt.ylabel('Target')
|
||||
plt.title('Polynomial Regression')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
- **Mean Squared Error (MSE)**: The average of the squared differences between actual and predicted values.
|
||||
- **R-squared (R²) Score**: A statistical measure that represents the proportion of the variance for the dependent variable that is explained by the independent variables in the model.
|
||||
|
||||
## Conclusion
|
||||
|
||||
Polynomial Regression is a powerful tool for modeling non-linear relationships between variables. It is important to choose the degree of the polynomial carefully to balance between underfitting and overfitting. Understanding and properly evaluating the model using appropriate metrics ensures its effectiveness.
|
||||
|
||||
## References
|
||||
|
||||
- [Scikit-learn Documentation](https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression)
|
||||
- [Wikipedia: Polynomial Regression](https://en.wikipedia.org/wiki/Polynomial_regression)
|
|
@ -0,0 +1,443 @@
|
|||
# Transformers
|
||||
## Introduction
|
||||
A transformer is a deep learning architecture, developed by Google, built around a softmax-based multi-head attention mechanism. Before transformers, attention mechanisms were added to gated recurrent neural networks such as LSTMs and gated recurrent units (GRUs), which processed datasets sequentially; the dependency on previous token computations prevented them from parallelizing the attention mechanism.
|
||||
|
||||
Transformers are a revolutionary approach to natural language processing (NLP). Unlike older models, they excel at understanding long-range connections between words. This "attention" mechanism lets them grasp the context of a sentence, making them powerful for tasks like machine translation, text summarization, and question answering. Introduced in 2017, transformers are now the backbone of many large language models, including tools you might use every day. Their ability to handle complex relationships in language is fueling advancements in AI across various fields.
|
||||
|
||||
## Model Architecture
|
||||
|
||||

|
||||
|
||||
Source: [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
|
||||
|
||||
|
||||
### Encoder
|
||||
The encoder is composed of a stack of identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, positionwise fully connected feed-forward network. Each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weights their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually. These output encodings are then passed to the next encoder as its input, as well as to the decoders.
|
||||
|
||||
### Decoder
|
||||
The decoder is also composed of a stack of identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. This mechanism can also be called the encoder-decoder attention.
|
||||
|
||||
### Attention
|
||||
#### Scaled Dot-Product Attention
|
||||
The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$. We compute the dot products of the query with all keys, divide each by $\sqrt{d_k}$, and apply a softmax function to obtain the weights on the values.
|
||||
|
||||
$$Attention(Q, K, V) = softmax(\dfrac{QK^T}{\sqrt{d_k}}) \times V$$
|
||||
|
||||
#### Multi-Head Attention
|
||||
Instead of performing a single attention function with $d_{model}$-dimensional keys, values and queries, it is beneficial to linearly project the queries, keys and values h times with different, learned linear projections to $d_k$ , $d_k$ and $d_v$ dimensions, respectively.
|
||||
|
||||
Multi-head attention allows the model to jointly attend to information from different representation
|
||||
subspaces at different positions. With a single attention head, averaging inhibits this.
|
||||
|
||||
$$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h) \times W^O$$
|
||||
|
||||
where,
|
||||
|
||||
$$head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$$
|
||||
|
||||
where the projections are parameter matrices.
|
||||
|
||||
#### Masked Attention
|
||||
It may be necessary to cut out attention links between some word-pairs. For example, the decoder for token position
|
||||
$t$ should not have access to token position $t+1$.
|
||||
|
||||
$$MaskedAttention(Q, K, V) = softmax(M + \dfrac{QK^T}{\sqrt{d_k}}) \times V$$
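Both the plain and masked forms above reduce to a few lines of NumPy; a minimal sketch with an illustrative causal mask:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask  # mask holds -inf where attention is forbidden
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

seq_len, d_k = 5, 8
Q = K = V = np.random.rand(seq_len, d_k)

# Causal mask: position t may not attend to positions > t
mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
print(scaled_dot_product_attention(Q, K, V, mask).shape)  # (5, 8)
```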
|
||||
|
||||
### Feed-Forward Network
|
||||
Each of the layers in the encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This
|
||||
consists of two linear transformations with a ReLU activation in between.
|
||||
|
||||
$$FFN(x) = max(0, xW_1 + b_1)W_2 + b_2$$
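A minimal NumPy sketch of this position-wise network, with weight shapes made up for illustration:

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # Two linear transformations with a ReLU in between,
    # applied identically at every position
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

x = np.random.rand(5, 8)                       # (positions, d_model)
W1, b1 = np.random.rand(8, 32), np.zeros(32)   # expand to the inner dimension
W2, b2 = np.random.rand(32, 8), np.zeros(8)    # project back to d_model
print(ffn(x, W1, b1, W2, b2).shape)            # (5, 8)
```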
|
||||
|
||||
### Positional Encoding
|
||||
A positional encoding is a fixed-size vector representation that encapsulates the relative positions of tokens within a target sequence: it provides the transformer model with information about where the words are in the input sequence.
|
||||
|
||||
The sine and cosine functions of different frequencies:
|
||||
|
||||
$$PE_{(pos,2i)} = \sin\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE_{(pos,2i+1)} = \cos\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$
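These sinusoids can be generated directly; a minimal NumPy sketch (the length and model dimension are arbitrary):

```python
import numpy as np

def positional_encoding(length, d_model):
    pos = np.arange(length)[:, None]        # (length, 1)
    i = np.arange(d_model // 2)[None, :]    # (1, d_model / 2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angles)  # even indices: sine
    pe[:, 1::2] = np.cos(angles)  # odd indices: cosine
    return pe

print(positional_encoding(50, 128).shape)  # (50, 128)
```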
|
||||
|
||||
## Implementation
|
||||
### Theory
|
||||
Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table. At each layer, every token is contextualized, within the scope of the context window, against the other tokens via a parallel multi-head attention mechanism, amplifying the signal from key tokens and diminishing that from less important ones.
|
||||
|
||||
The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce an output sentence. Some architectures use full encoders and decoders, autoregressive encoders and decoders, or a combination of both, depending on the usage and context of the input.
|
||||
|
||||
### TensorFlow
|
||||
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. It was developed by the Google Brain team for Google's internal use in research and production.
|
||||
|
||||
TensorFlow provides transformer encoder and decoder blocks that can be configured to the user's specification. The transformer is not provided as a standalone model to be imported and executed; the user has to assemble the model first. TensorFlow also has a tutorial on implementing the transformer from scratch for machine translation, which can be found [here](https://www.tensorflow.org/text/tutorials/transformer).
|
||||
|
||||
More information on [encoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock) and [decoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerDecoderBlock) block mentioned in the code.
|
||||
|
||||
Imports:
|
||||
```python
|
||||
import tensorflow as tf
|
||||
import tensorflow_models as tfm
|
||||
```
|
||||
|
||||
Adding word embeddings and positional encoding:
|
||||
```python
|
||||
class PositionalEmbedding(tf.keras.layers.Layer):
|
||||
def __init__(self, vocab_size, d_model):
|
||||
super().__init__()
|
||||
self.d_model = d_model
|
||||
self.embedding = tf.keras.layers.Embedding(vocab_size, d_model, mask_zero=True)
|
||||
self.pos_encoding = tfm.nlp.layers.RelativePositionEmbedding(hidden_size=d_model)
|
||||
|
||||
def compute_mask(self, *args, **kwargs):
|
||||
return self.embedding.compute_mask(*args, **kwargs)
|
||||
|
||||
    def call(self, x):
        x = self.embedding(x)
        # RelativePositionEmbedding is a Keras layer: calling it returns a
        # (length, d_model) tensor of encodings, which is broadcast-added
        x = x + self.pos_encoding(x)[tf.newaxis, :, :]
        return x
|
||||
```
|
||||
|
||||
Creating the encoder for the transformer:
|
||||
```python
|
||||
class Encoder(tf.keras.layers.Layer):
|
||||
def __init__(self, num_layers, d_model, num_heads,
|
||||
dff, vocab_size, dropout_rate=0.1):
|
||||
super().__init__()
|
||||
|
||||
self.d_model = d_model
|
||||
self.num_layers = num_layers
|
||||
|
||||
self.pos_embedding = PositionalEmbedding(
|
||||
vocab_size=vocab_size, d_model=d_model)
|
||||
|
||||
self.enc_layers = [
|
||||
tfm.nlp.layers.TransformerEncoderBlock(output_last_dim=d_model,
|
||||
num_attention_heads=num_heads,
|
||||
inner_dim=dff,
|
||||
inner_activation="relu",
|
||||
inner_dropout=dropout_rate)
|
||||
for _ in range(num_layers)]
|
||||
self.dropout = tf.keras.layers.Dropout(dropout_rate)
|
||||
|
||||
def call(self, x):
|
||||
        x = self.pos_embedding(x)  # PositionalEmbedding's call takes only the token ids
|
||||
x = self.dropout(x)
|
||||
|
||||
for i in range(self.num_layers):
|
||||
x = self.enc_layers[i](x)
|
||||
|
||||
return x
|
||||
```
|
||||
|
||||
Creating the decoder for the transformer:
|
||||
```python
|
||||
class Decoder(tf.keras.layers.Layer):
|
||||
def __init__(self, num_layers, d_model, num_heads, dff, vocab_size,
|
||||
dropout_rate=0.1):
|
||||
super(Decoder, self).__init__()
|
||||
|
||||
self.d_model = d_model
|
||||
self.num_layers = num_layers
|
||||
|
||||
self.pos_embedding = PositionalEmbedding(vocab_size=vocab_size,
|
||||
d_model=d_model)
|
||||
self.dropout = tf.keras.layers.Dropout(dropout_rate)
|
||||
self.dec_layers = [
|
||||
tfm.nlp.layers.TransformerDecoderBlock(num_attention_heads=num_heads,
|
||||
intermediate_size=dff,
|
||||
intermediate_activation="relu",
|
||||
dropout_rate=dropout_rate)
|
||||
for _ in range(num_layers)]
|
||||
|
||||
def call(self, x, context):
|
||||
x = self.pos_embedding(x)
|
||||
x = self.dropout(x)
|
||||
|
||||
for i in range(self.num_layers):
|
||||
x = self.dec_layers[i](x, context)
|
||||
|
||||
return x
|
||||
```
|
||||
|
||||
Combining the encoder and decoder to create the transformer:
|
||||
```python
|
||||
class Transformer(tf.keras.Model):
|
||||
def __init__(self, num_layers, d_model, num_heads, dff,
|
||||
input_vocab_size, target_vocab_size, dropout_rate=0.1):
|
||||
super().__init__()
|
||||
self.encoder = Encoder(num_layers=num_layers, d_model=d_model,
|
||||
num_heads=num_heads, dff=dff,
|
||||
vocab_size=input_vocab_size,
|
||||
dropout_rate=dropout_rate)
|
||||
|
||||
self.decoder = Decoder(num_layers=num_layers, d_model=d_model,
|
||||
num_heads=num_heads, dff=dff,
|
||||
vocab_size=target_vocab_size,
|
||||
dropout_rate=dropout_rate)
|
||||
|
||||
self.final_layer = tf.keras.layers.Dense(target_vocab_size)
|
||||
|
||||
def call(self, inputs):
|
||||
context, x = inputs
|
||||
|
||||
context = self.encoder(context)
|
||||
x = self.decoder(x, context)
|
||||
logits = self.final_layer(x)
|
||||
|
||||
return logits
|
||||
```
|
||||
|
||||
Model initialization that can be used for training and inference (the hyperparameters `num_layers`, `d_model`, `num_heads`, `dff`, and `dropout_rate` are assumed to be defined beforehand):
|
||||
```python
|
||||
transformer = Transformer(
|
||||
num_layers=num_layers,
|
||||
d_model=d_model,
|
||||
num_heads=num_heads,
|
||||
dff=dff,
|
||||
input_vocab_size=64,
|
||||
target_vocab_size=64,
|
||||
dropout_rate=dropout_rate
|
||||
)
|
||||
```
|
||||
|
||||
Sample:
|
||||
```python
|
||||
src = tf.random.uniform((64, 40))
|
||||
tgt = tf.random.uniform((64, 50))
|
||||
|
||||
output = transformer((src, tgt))
|
||||
```
|
||||
|
||||
O/P:
|
||||
```
|
||||
<tf.Tensor: shape=(64, 50, 64), dtype=float32, numpy=
|
||||
array([[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]],
|
||||
|
||||
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]],
|
||||
|
||||
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]],
|
||||
|
||||
...,
|
||||
|
||||
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]],
|
||||
|
||||
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]],
|
||||
|
||||
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
|
||||
1.3511044 , -1.275499 ],
|
||||
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
|
||||
1.0973787 , -1.3414565 ],
|
||||
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
|
||||
0.58439416, -0.93464035],
|
||||
...,
|
||||
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
|
||||
1.4126635 , -0.5771998 ],
|
||||
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
|
||||
1.6503414 , -0.74359566],
|
||||
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
|
||||
1.7505672 , -1.0763712 ]]], dtype=float32)>
|
||||
```
|
||||
```
|
||||
>>> output.shape
|
||||
TensorShape([64, 50, 64])
|
||||
```

### PyTorch

PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It was originally developed by Meta AI and is now part of the Linux Foundation umbrella.

Unlike Tensorflow, PyTorch provides a full, ready-to-use implementation of the transformer model. More information can be found [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/transformer.html#Transformer). A full implementation of the model can be found [here](https://github.com/pytorch/examples/tree/master/word_language_model).

Imports:

```python
import torch
import torch.nn as nn
```

Initializing the model:

```python
transformer = nn.Transformer(nhead=16, num_encoder_layers=8)
```

Sample:

```python
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))

output = transformer(src, tgt)
```

O/P:

```
tensor([[[ 0.2938, -0.4824, -0.7816,  ...,  0.0742,  0.5162,  0.3632],
         [-0.0786, -0.5241,  0.6384,  ...,  0.3462, -0.0618,  0.9943],
         [ 0.7827,  0.1067, -0.1637,  ..., -1.7730, -0.3322, -0.0029],
         ...,
         [-0.3202,  0.2341, -0.0896,  ..., -0.9714, -0.1251, -0.0711],
         [-0.1663, -0.5047, -0.0404,  ..., -0.9339,  0.3963,  0.1018],
         [ 1.2834, -0.4400,  0.0486,  ..., -0.6876, -0.4752,  0.0180]],

        [[ 0.9869, -0.7384, -1.0704,  ..., -0.9417,  1.3279, -0.1665],
         [ 0.3445, -0.2454, -0.3644,  ..., -0.4856, -1.1004, -0.6819],
         [ 0.7568, -0.3151, -0.5034,  ..., -1.2081, -0.7119,  0.3775],
         ...,
         [-0.0451, -0.7596,  0.0168,  ..., -0.8267, -0.3272,  1.0457],
         [ 0.3150, -0.6588, -0.1840,  ...,  0.1822, -0.0653,  0.9053],
         [ 0.8692, -0.3519,  0.3128,  ..., -1.8446, -0.2325, -0.8662]],

        [[ 0.9719, -0.3113,  0.4637,  ..., -0.4422,  1.2348,  0.8274],
         [ 0.3876, -0.9529, -0.7810,  ..., -0.5843, -1.1439, -0.3366],
         [-0.5774,  0.3789, -0.2819,  ..., -1.4057,  0.4352,  0.1474],
         ...,
         [ 0.6899, -0.1146, -0.3297,  ..., -1.7059, -0.1750,  0.4203],
         [ 0.3689, -0.5174, -0.1253,  ...,  0.1417,  0.4159,  0.7560],
         [ 0.5024, -0.7996,  0.1592,  ..., -0.8344, -1.1125,  0.4736]],

        ...,

        [[ 0.0704, -0.3971, -0.2768,  ..., -1.9929,  0.8608,  1.2264],
         [ 0.4013, -0.0962, -0.0965,  ..., -0.4452, -0.8682, -0.4593],
         [ 0.1656,  0.5224, -0.1723,  ..., -1.5785,  0.3219,  1.1507],
         ...,
         [-0.9443,  0.4653,  0.2936,  ..., -0.9840, -0.0142, -0.1595],
         [-0.6544, -0.3294, -0.0803,  ...,  0.1623, -0.5061,  0.9824],
         [-0.0978, -1.0023, -0.6915,  ..., -0.2296, -0.0594, -0.4715]],

        [[ 0.6531, -0.9285, -0.0331,  ..., -1.1481,  0.7768, -0.7321],
         [ 0.3325, -0.6683, -0.6083,  ..., -0.4501,  0.2289,  0.3573],
         [-0.6750,  0.4600, -0.8512,  ..., -2.0097, -0.5159,  0.2773],
         ...,
         [-1.4356, -1.0135,  0.0081,  ..., -1.2985, -0.3715, -0.2678],
         [ 0.0546, -0.2111, -0.0965,  ..., -0.3822, -0.4612,  1.6217],
         [ 0.7700, -0.5309, -0.1754,  ..., -2.2807, -0.0320, -1.5551]],

        [[ 0.2399, -0.9659,  0.1086,  ..., -1.1756,  0.4063,  0.0615],
         [-0.2202, -0.7972, -0.5024,  ..., -0.9126, -1.5248,  0.2418],
         [ 0.5215,  0.4540,  0.0036,  ..., -0.2135,  0.2145,  0.6638],
         ...,
         [-0.2190, -0.4967,  0.7149,  ..., -0.3324,  0.3502,  1.0624],
         [-0.0108, -0.9205, -0.1315,  ..., -1.0153,  0.2989,  1.1415],
         [ 1.1284, -0.6560,  0.6755,  ..., -1.2157,  0.8580, -0.5022]]],
       grad_fn=<NativeLayerNormBackward0>)
```

```
>>> output.shape
torch.Size([20, 32, 512])
```
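
When the transformer is used as a real sequence-to-sequence model rather than on random tensors, the decoder input is normally paired with a causal mask so that each target position can only attend to earlier positions. Below is a minimal sketch using PyTorch's built-in mask helper; the random tensors are placeholders, as above:

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(nhead=16, num_encoder_layers=8)

src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))

# Square causal mask: position i may only attend to positions <= i
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(0))

output = transformer(src, tgt, tgt_mask=tgt_mask)
print(output.shape)  # torch.Size([20, 32, 512])
```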

### HuggingFace

Hugging Face, Inc. is a French-American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning.

It has a wide range of models that can be implemented in Tensorflow, PyTorch and other development backends as well. The models come already trained on a dataset and can be fine-tuned on a custom dataset for customized use. The information for training the model and loading the pretrained model can be found [here](https://huggingface.co/docs/transformers/en/training).

In HuggingFace, `pipeline` is used to run inference with a trained model available in the Hub, which is very beginner friendly. The model is downloaded to the local system the first time the script runs, before inference; make sure the downloaded model does not exceed your available data plan.

Imports:

```python
from transformers import pipeline
```

Initialization:

The model used here is BART (large), which was trained on the MultiNLI dataset, consisting of sentence pairs annotated with textual entailment labels.

```python
classifier = pipeline(model="facebook/bart-large-mnli")
```

Sample:

The first argument is the sentence to be analyzed. The second argument, `candidate_labels`, is the list of labels the sentence most likely belongs to. The output dictionary contains a `scores` key; the label with the highest score is the best textual entailment of the sentence.

```python
output = classifier(
    "I need to leave but later",
    candidate_labels=["urgent", "not urgent", "sleep"],
)
```

O/P:

```
{'sequence': 'I need to leave but later',
 'labels': ['not urgent', 'urgent', 'sleep'],
 'scores': [0.8889380097389221, 0.10631518065929413, 0.00474683940410614]}
```
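
The `labels` and `scores` lists come back as parallel lists sorted by descending score, so the most likely label can be read off directly. A small sketch, assuming `output` from the call above:

```python
# labels and scores are parallel lists sorted by descending score
top_label = output["labels"][0]
top_score = output["scores"][0]
print(f"{top_label} ({top_score:.2%})")  # not urgent (88.89%)
```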

## Application

The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, GPT-4, Claude, BERT, XLNet, RoBERTa and ChatGPT demonstrate the ability of transformers to perform a wide variety of NLP-related tasks, and have the potential to find real-world applications.

These may include (several are directly available through the `pipeline` API shown earlier; see the sketch after this list):

- Machine translation
- Document summarization
- Text generation
- Biological sequence analysis
- Computer code generation

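As an illustration, here is a hedged sketch running summarization and text generation through `pipeline`. The checkpoints named here (`facebook/bart-large-cnn`, `gpt2`) are common public models chosen for illustration, not ones prescribed by this guide:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
generator = pipeline("text-generation", model="gpt2")

text = ("Transformers process entire sequences in parallel using self-attention, "
        "which makes them well suited to translation, summarization and generation.")

print(summarizer(text, max_length=25)[0]["summary_text"])
print(generator("The transformer architecture", max_length=25)[0]["generated_text"])
```
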
## Bibliography

- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
- [Tensorflow Tutorial](https://www.tensorflow.org/text/tutorials/transformer)
- [Tensorflow Models Docs](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers)
- [Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
- [HuggingFace](https://huggingface.co/docs/transformers/en/index)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html)

@ -11,3 +11,6 @@
|
|||
- [Sorting NumPy Arrays](sorting-array.md)
- [NumPy Array Iteration](array-iteration.md)
- [Concatenation of Arrays](concatenation-of-arrays.md)
- [Splitting of Arrays](splitting-arrays.md)
- [Universal Functions (Ufunc)](universal-functions.md)
- [Statistical Functions on Arrays](statistical-functions.md)

@ -0,0 +1,135 @@
|
|||
# Splitting Arrays

Splitting a NumPy array refers to dividing the array into smaller sub-arrays. This can be done in various ways: along specific rows, columns, or even based on conditions applied to the elements.

There are several ways to split a NumPy array in Python using different functions. Some of these methods include:

- Splitting a NumPy array using `numpy.split()`
- Splitting a NumPy array using `numpy.array_split()`
- Splitting a NumPy array using `numpy.vsplit()`
- Splitting a NumPy array using `numpy.hsplit()`
- Splitting a NumPy array using `numpy.dsplit()`

## NumPy split()

The `numpy.split()` function divides an array into equal parts along a specified axis. It raises an error if the array cannot be divided evenly.

**Code**
```python
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6])
# Splitting the array into 3 equal parts along axis=0
result = np.split(array, 3)
print(result)
```

**Output**
```
[array([1, 2]), array([3, 4]), array([5, 6])]
```
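
The `axis` argument selects the dimension to cut along; for example, the same function can split a 2-D array column-wise. A small sketch (the 3x4 matrix here is made up for illustration):

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)
# Splitting into 2 equal parts along axis=1 (columns)
print(np.split(matrix, 2, axis=1))
```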

## NumPy array_split()

The `numpy.array_split()` function divides an array into equal or nearly equal sub-arrays. Unlike `numpy.split()`, it allows for uneven splitting, making it useful when the array cannot be evenly divided by the specified number of splits.

**Code**
```python
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8])
# Splitting the array into 3 unequal parts along axis=0
result = np.array_split(array, 3)
print(result)
```

**Output**
```
[array([1, 2, 3]), array([4, 5, 6]), array([7, 8])]
```

## NumPy vsplit()

The `numpy.vsplit()` function performs vertical splitting (row-wise), dividing an array along the vertical axis (axis=0).

**Code**
```python
import numpy as np

array = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])
# Vertically splitting the array into 2 sub-arrays along axis=0
result = np.vsplit(array, 2)
print(result)
```

**Output**
```
[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]])]
```

## NumPy hsplit()

The `numpy.hsplit()` function performs horizontal splitting (column-wise), dividing an array along the horizontal axis (axis=1).

**Code**
```python
import numpy as np

array = np.array([[1, 2, 3, 4],
                  [5, 7, 8, 9],
                  [11, 12, 13, 14]])
# Horizontally splitting the array into 4 sub-arrays along axis=1
result = np.hsplit(array, 4)
print(result)
```

**Output**
```
[array([[ 1],
       [ 5],
       [11]]), array([[ 2],
       [ 7],
       [12]]), array([[ 3],
       [ 8],
       [13]]), array([[ 4],
       [ 9],
       [14]])]
```
## NumPy dsplit()
|
||||
|
||||
The`numpy.dsplit()` is employed for splitting arrays along the third axis (axis=2), which is applicable for 3D arrays and beyond.
|
||||
|
||||
**Code**
|
||||
```python
|
||||
import numpy as np
|
||||
#3D array
|
||||
array = np.array([[[ 1, 2, 3, 4,],
|
||||
[ 5, 6, 7, 8,],
|
||||
[ 9, 10, 11, 12]],
|
||||
[[13, 14, 15, 16,],
|
||||
[17, 18, 19, 20,],
|
||||
[21, 22, 23, 24]]])
|
||||
#Splitting the array along axis=2
|
||||
result = np.dsplit(array,2)
|
||||
print(result)
|
||||
```
|
||||
|
||||
**Output**
|
||||
```
|
||||
[array([[[ 1, 2],
|
||||
[ 5, 6],
|
||||
[ 9, 10]],
|
||||
|
||||
[[13, 14],
|
||||
[17, 18],
|
||||
[21, 22]]]), array([[[ 3, 4],
|
||||
[ 7, 8],
|
||||
[11, 12]],
|
||||
|
||||
[[15, 16],
|
||||
[19, 20],
|
||||
[23, 24]]])]
|
||||
```
|
|
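
A useful way to remember these helpers: each one is `np.split` with the axis fixed (0 for `vsplit`, 1 for `hsplit`, 2 for `dsplit`). A quick sketch verifying the equivalence on a made-up 3-D array:

```python
import numpy as np

array = np.arange(24).reshape(2, 3, 4)

# vsplit and dsplit are np.split with axis fixed to 0 and 2, respectively
assert all(np.array_equal(a, b)
           for a, b in zip(np.vsplit(array, 2), np.split(array, 2, axis=0)))
assert all(np.array_equal(a, b)
           for a, b in zip(np.dsplit(array, 2), np.split(array, 2, axis=2)))
print("equivalent")
```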
@ -0,0 +1,154 @@
|
|||
# Statistical Operations on Arrays

Statistics involves collecting data, analyzing it, and drawing conclusions from the gathered information.

NumPy provides powerful statistical functions to perform efficient data analysis on arrays, including `minimum`, `maximum`, `mean`, `median`, `variance`, `standard deviation`, and more.

## Minimum

In NumPy, the minimum value of an array is the smallest element present.

The smallest element of an array is calculated using the `np.min()` function.

**Code**
```python
import numpy as np

array = np.array([100, 20, 300, 400])
# Calculating the minimum
result = np.min(array)
print("Minimum :", result)
```

**Output**
```
Minimum : 20
```

## Maximum

In NumPy, the maximum value of an array is the largest element present.

The largest element of an array is calculated using the `np.max()` function.

**Code**
```python
import numpy as np

array = np.array([100, 20, 300, 400])
# Calculating the maximum
result = np.max(array)
print("Maximum :", result)
```

**Output**
```
Maximum : 400
```

## Mean

The mean value of a NumPy array is the average of all its elements.

It is calculated by summing all the elements and then dividing by the total number of elements.

The mean of an array is calculated using the `np.mean()` function.

**Code**
```python
import numpy as np

array = np.array([10, 20, 30, 40])
# Calculating the mean
result = np.mean(array)
print("Mean :", result)
```

**Output**
```
Mean : 25.0
```

## Median

The median value of a NumPy array is the middle value in a sorted array.

It separates the higher half of the data from the lower half.

The median of an array is calculated using the `np.median()` function.

It is important to note that:

- If the number of elements is `odd`, the median is the middle element.
- If the number of elements is `even`, the median is the average of the two middle elements.

**Code**
```python
import numpy as np

# The number of elements is odd
array = np.array([5, 6, 7, 8, 9])
# Calculating the median
result = np.median(array)
print("Median :", result)
```

**Output**
```
Median : 7.0
```

**Code**
```python
import numpy as np

# The number of elements is even
array = np.array([1, 2, 3, 4, 5, 6])
# Calculating the median
result = np.median(array)
print("Median :", result)
```

**Output**
```
Median : 3.5
```

## Variance

Variance in a NumPy array measures the spread or dispersion of data points.

It is calculated as the average of the squared differences from the mean.

The variance of an array is calculated using the `np.var()` function.

**Code**
```python
import numpy as np

array = np.array([10, 70, 80, 50, 30])
# Calculating the variance
result = np.var(array)
print("Variance :", result)
```

**Output**
```
Variance : 656.0
```
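
As a quick sanity check, the same number falls out of the definition directly; a short sketch recomputing the variance by hand:

```python
import numpy as np

array = np.array([10, 70, 80, 50, 30])
mean = array.sum() / array.size          # 48.0
variance = ((array - mean) ** 2).mean()  # average squared deviation from the mean
print(variance)  # 656.0
```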

## Standard Deviation

The standard deviation of a NumPy array measures the amount of variation or dispersion of the elements in the array.

It is calculated as the square root of the average of the squared differences from the mean, providing insight into how spread out the values are around the mean.

The standard deviation of an array is calculated using the `np.std()` function.

**Code**
```python
import numpy as np

array = np.array([25, 30, 40, 55, 75, 100])
# Calculating the standard deviation
result = np.std(array)
print("Standard Deviation :", result)
```

**Output**
```
Standard Deviation : 26.365486699260625
```
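
Standard deviation is simply the square root of the variance, and `np.std` accepts a `ddof` argument when the sample (rather than population) statistic is wanted; a short sketch:

```python
import numpy as np

array = np.array([25, 30, 40, 55, 75, 100])
print(np.sqrt(np.var(array)))  # same value as np.std(array)
print(np.std(array, ddof=1))   # sample standard deviation (divides by n-1)
```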
@ -0,0 +1,130 @@
|
|||
# Universal functions (ufunc)

---

A `ufunc`, short for "`universal function`," is a fundamental concept in NumPy, a powerful library for numerical computing in Python. Universal functions are highly optimized, element-wise functions designed to perform operations on data stored in NumPy arrays.

## Uses of Ufuncs in NumPy

Universal functions (ufuncs) in NumPy provide a wide range of functionalities for efficient and powerful numerical computations. Below is a detailed explanation of their uses:

### 1. **Element-wise Operations**
Ufuncs perform operations on each element of the arrays independently.

```python
import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([5, 6, 7, 8])

# Element-wise addition
np.add(A, B)  # Output: array([ 6,  8, 10, 12])
```

### 2. **Broadcasting**
Ufuncs support broadcasting, allowing operations on arrays with different shapes, making it possible to perform operations without explicitly reshaping arrays.

```python
C = np.array([1, 2, 3])
D = np.array([[1], [2], [3]])

# Broadcasting addition
np.add(C, D)  # Output: array([[2, 3, 4], [3, 4, 5], [4, 5, 6]])
```

### 3. **Vectorization**
Ufuncs are vectorized, meaning they are implemented in low-level C code, allowing for fast execution and avoiding the overhead of Python loops.

```python
# Vectorized square root
np.sqrt(A)  # Output: array([1., 1.41421356, 1.73205081, 2.])
```

### 4. **Type Flexibility**
Ufuncs handle various data types and perform automatic type casting as needed.

```python
E = np.array([1.0, 2.0, 3.0])
F = np.array([4, 5, 6])

# Addition with type casting
np.add(E, F)  # Output: array([5., 7., 9.])
```

### 5. **Reduction Operations**
Ufuncs support reduction operations, such as summing all elements of an array or finding the product of all elements.

```python
# Summing all elements
np.add.reduce(A)  # Output: 10

# Product of all elements
np.multiply.reduce(A)  # Output: 24
```

### 6. **Accumulation Operations**
Ufuncs can perform accumulation operations, which keep a running tally of the computation.

```python
# Cumulative sum
np.add.accumulate(A)  # Output: array([ 1,  3,  6, 10])
```

### 7. **Reduceat Operations**
Ufuncs can perform segmented reductions using the `reduceat` method, which applies the ufunc at specified intervals.

```python
G = np.array([0, 1, 2, 3, 4, 5, 6, 7])
indices = [0, 2, 5]
np.add.reduceat(G, indices)  # Output: array([ 1,  9, 18])
```

### 8. **Outer Product**
Ufuncs can compute the outer product of two arrays, producing a matrix where each element is the result of applying the ufunc to each pair of elements from the input arrays.

```python
# Outer product
np.multiply.outer([1, 2, 3], [4, 5, 6])
# Output: array([[ 4,  5,  6],
#                [ 8, 10, 12],
#                [12, 15, 18]])
```

### 9. **Out Parameter**
Ufuncs can use the `out` parameter to store results in a pre-allocated array, saving memory and improving performance.

```python
# The out array must already have a compatible shape and dtype
result = np.empty_like(A)
np.multiply(A, B, out=result)  # Output: array([ 5, 12, 21, 32])
```

## Create Your Own Ufunc

You can create custom ufuncs for specific needs using `np.frompyfunc` or `np.vectorize`, allowing Python functions to behave like ufuncs.

Here, we are using `frompyfunc()`, which takes three arguments:

1. function - the name of the function.
2. inputs - the number of input arrays.
3. outputs - the number of output arrays.

```python
import numpy as np

def my_add(x, y):
    return x + y

A = np.array([1, 2, 3, 4])
B = np.array([5, 6, 7, 8])

my_add_ufunc = np.frompyfunc(my_add, 2, 1)
my_add_ufunc(A, B)  # Output: array([6, 8, 10, 12], dtype=object)
```

## Some Common Ufuncs

Here are some commonly used ufuncs in NumPy:

- **Arithmetic**: `np.add`, `np.subtract`, `np.multiply`, `np.divide`
- **Trigonometric**: `np.sin`, `np.cos`, `np.tan`
- **Exponential and Logarithmic**: `np.exp`, `np.log`, `np.log10`
- **Comparison**: `np.maximum`, `np.minimum`, `np.greater`, `np.less`
- **Logical**: `np.logical_and`, `np.logical_or`, `np.logical_not`

For more ufuncs, refer to [Universal functions (ufunc) — NumPy](https://numpy.org/doc/stable/reference/ufuncs.html)
|
@ -5,5 +5,6 @@
|
|||
- [Bar Plots in Matplotlib](matplotlib-bar-plots.md)
- [Pie Charts in Matplotlib](matplotlib-pie-charts.md)
- [Line Charts in Matplotlib](matplotlib-line-plots.md)
- [Scatter Plots in Matplotlib](matplotlib-scatter-plot.md)
- [Introduction to Seaborn and Installation](seaborn-intro.md)
- [Getting started with Seaborn](seaborn-basics.md)

@ -0,0 +1,160 @@
|
|||
# Scatter() plot in matplotlib

* A scatter plot is a type of data visualization that uses dots to show values for two variables, with one variable on the x-axis and the other on the y-axis. It's useful for identifying relationships, trends, and correlations, as well as spotting clusters and outliers.
* The dots on the plot show how the variables are related. A scatter plot is made with the matplotlib library's `scatter()` method.

## Syntax

**Here's how to write code for the `scatter()` method:**

```python
matplotlib.pyplot.scatter(x_axis_value, y_axis_value, s=None, c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None)
```

## Prerequisites

Scatter plots can be created in Python with Matplotlib's pyplot library. To build a scatter plot, first import matplotlib. It is a standard convention to import Matplotlib's pyplot library as plt.

```python
import matplotlib.pyplot as plt
```

## Creating a simple Scatter Plot

With Pyplot, you can use the `scatter()` function to draw a scatter plot.

The `scatter()` function plots one dot for each observation. It needs two arrays of the same length, one for the values of the x-axis, and one for values on the y-axis:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.scatter(x, y)
plt.show()
```

When executed, this will show the following Scatter plot:

![Basic Scatter Plot](images/scatter1.png)

## Compare Plots

In a scatter plot, comparing plots involves examining multiple sets of points to identify differences or similarities in patterns, trends, or correlations between the data sets.

```python
import matplotlib.pyplot as plt
import numpy as np

# day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

# day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)

plt.show()
```

When executed, this will show the following Compare Scatter plot:

![Compare Scatter Plot](images/scatter2.png)

## Colors in Scatter plot

You can set your own color for each scatter plot with the `color` or the `c` argument:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color='hotpink')

x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y, color='#88c999')

plt.show()
```

When executed, this will show the following Colors Scatter plot:

![Colors Scatter Plot](images/scatter3.png)

## Color Each Dot

You can even set a specific color for each dot by using an array of colors as the value for the `c` argument:

**Note:** You cannot use the `color` argument for this, only the `c` argument.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"])

plt.scatter(x, y, c=colors)

plt.show()
```

When executed, this will show the following Color Each Dot plot:

![Color Each Dot](images/scatter4.png)

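The `s` and `alpha` arguments from the syntax above can likewise take per-dot values to size each dot individually; a minimal sketch (the sizes array is made up for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# One size per dot; alpha makes overlapping dots easier to see
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes, alpha=0.5)
plt.show()
```
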
## ColorMap

The Matplotlib module has a number of available colormaps.

A colormap is like a list of colors, where each color has a position value; in the example below the values range from 0 to 100.

Here is an example of a colormap:

![Colormap](images/colorbar.png)

This colormap is called 'viridis', and as you can see, it ranges from 0, which is a purple color, up to 100, which is a yellow color.

## How to Use the ColorMap

You can specify the colormap with the keyword argument `cmap`, using the value of the colormap, in this case `'viridis'`, which is one of the built-in colormaps available in Matplotlib.

In addition, you have to create an array with values (from 0 to 100), one value for each point in the scatter plot:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])

plt.scatter(x, y, c=colors, cmap='viridis')

plt.show()
```

When executed, this will show the following Scatter ColorMap:

![Scatter ColorMap](images/scatter_colormap1.png)

You can include the colormap in the drawing by including the `plt.colorbar()` statement:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])

plt.scatter(x, y, c=colors, cmap='viridis')

plt.colorbar()

plt.show()
```

When executed, this will show the following Scatter ColorMap using `plt.colorbar()`:

![Scatter ColorMap with colorbar](images/scatter_colormap2.png)

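Colormaps combine naturally with the `s` and `alpha` arguments shown earlier; a sketch drawing 100 random dots where position, color and size all vary (the random data is made up for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np

# Random positions, color values and sizes for 100 dots
rng = np.random.default_rng(0)
x = rng.integers(100, size=100)
y = rng.integers(100, size=100)
colors = rng.integers(100, size=100)
sizes = 10 * rng.integers(100, size=100)

plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='nipy_spectral')
plt.colorbar()
plt.show()
```
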
|
@ -1,4 +1,5 @@
|
|||
# List of sections

- [Installation of Scipy and its key uses](installation_features.md)
- [SciPy Graphs](scipy-graphs.md)

@ -0,0 +1,165 @@
|
|||
# SciPy Graphs

Graphs are also a type of data structure, and SciPy provides a module called `scipy.sparse.csgraph` for working with graphs.

## Adjacency Matrix

An adjacency matrix is a way of representing a graph using a square matrix. In the matrix, the element at the i-th row and j-th column indicates whether there is an edge from vertex i to vertex j.

```python
import numpy as np
from scipy.sparse import csr_matrix

adj_matrix = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0]
])

sparse_matrix = csr_matrix(adj_matrix)

print(sparse_matrix)
```

In this example:

1. The graph has 4 nodes.
2. There is an edge between node 0 and node 1, node 1 and node 2, and node 2 and node 3.
3. The `csr_matrix` function converts the dense adjacency matrix into a compressed sparse row (CSR) format, which is efficient for storing large, sparse matrices.

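The same module offers other utilities on this CSR representation; for example, `connected_components` reports how many connected pieces the graph has. A short sketch, reusing `sparse_matrix` from above:

```python
from scipy.sparse.csgraph import connected_components

# Number of connected components and the component label of each node
n_components, labels = connected_components(sparse_matrix)
print(n_components, labels)  # 1 [0 0 0 0]
```
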
## Floyd Warshall

The Floyd-Warshall algorithm is a classic algorithm used to find the shortest paths between all pairs of nodes in a weighted graph.

```python
import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

print(floyd_warshall(newarr, return_predecessors=True))
```

#### Output

```
(array([[0., 1., 2.],
        [1., 0., 3.],
        [2., 3., 0.]]), array([[-9999,     0,     0],
       [    1, -9999,     0],
       [    2,     0, -9999]], dtype=int32))
```

## Dijkstra

Dijkstra's algorithm is used to find the shortest path from a source node to all other nodes in a graph with non-negative edge weights.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

print(dijkstra(newarr, return_predecessors=True, indices=0))
```

#### Output

```
(array([ 0.,  1.,  2.]), array([-9999,     0,     0], dtype=int32))
```
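
The `-9999` entries in the predecessor array mark the source node (or unreachable nodes), so a shortest path can be reconstructed by walking that array backwards; a sketch, assuming the graph above:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

graph = csr_matrix(np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
]))

dist, pred = dijkstra(graph, return_predecessors=True, indices=0)

# Walk the predecessor array backwards to recover the path 0 -> 2
node, path = 2, []
while node != -9999:
    path.append(node)
    node = pred[node]
print(path[::-1])  # [0, 2]
```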

## Bellman Ford

The Bellman-Ford algorithm is used to find the shortest path from a single source vertex to all other vertices in a weighted graph. It can handle graphs with negative weights, and it also detects negative weight cycles.

```python
import numpy as np
from scipy.sparse.csgraph import bellman_ford
from scipy.sparse import csr_matrix

arr = np.array([
    [0, -1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

print(bellman_ford(newarr, return_predecessors=True, indices=0))
```

#### Output

```
(array([ 0., -1.,  2.]), array([-9999,     0,     0], dtype=int32))
```

## Depth First Order

Depth-First Search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root and explores as far as possible along each branch before backtracking.

```python
import numpy as np
from scipy.sparse.csgraph import depth_first_order
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1]
])

newarr = csr_matrix(arr)

print(depth_first_order(newarr, 1))
```

#### Output

```
(array([1, 0, 3, 2], dtype=int32), array([    1, -9999,     1,     0], dtype=int32))
```

## Breadth First Order

Breadth-First Search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the root and explores all nodes at the present depth level before moving on to nodes at the next depth level.

```python
import numpy as np
from scipy.sparse.csgraph import breadth_first_order
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1]
])

newarr = csr_matrix(arr)

print(breadth_first_order(newarr, 1))
```

#### Output

```
(array([1, 0, 2, 3], dtype=int32), array([    1, -9999,     1,     1], dtype=int32))
```