Merge branch 'animator:main' into deque

pull/1282/head
Mankala Vaishnavi 2024-06-24 13:27:48 +05:30 committed by GitHub
commit aa0eb52c68
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 612 additions and 2 deletions

@ -20,3 +20,4 @@
- [Eval Function](eval_function.md)
- [Magic Methods](magic-methods.md)
- [Asynchronous Context Managers & Generators](asynchronous-context-managers-generators.md)
- [Threading](threading.md)

@ -0,0 +1,198 @@
# Threading in Python
A thread is a sequence of instructions in a program that can be executed independently of the rest of the process.
Threads are like lightweight processes that share the same memory space but can execute independently.
A process is a running instance of a computer program.
This guide provides an overview of the threading module and its key functionalities.
## Key Characteristics of Threads:
* Shared Memory: All threads within a process share the same memory space, which allows for efficient communication between threads.
* Independent Execution: Each thread can run independently and concurrently.
* Context Switching: The operating system can switch between threads, enabling concurrent execution.
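As a minimal sketch of the shared-memory point, the example below uses the `threading` module (introduced in the next section) to let several threads write into one dictionary. The names `results` and `record_square` are made up for this illustration.

```python
import threading

results = {}  # one dictionary shared by every thread in this process

def record_square(n):
    # Each thread writes to its own key, so no lock is needed in this sketch
    results[n] = n * n

threads = [threading.Thread(target=record_square, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

All four threads see the same `results` object because they live in the same process.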
## Threading Module
The `threading` module allows you to create and manage threads easily. It includes several functions and classes for working with threads.
**1. Creating a Thread:**
To create a thread in Python, you can use the Thread class from the threading module.
Example:
```python
import threading
# Create a thread
thread = threading.Thread()
# Start the thread
thread.start()
# Wait for the thread to complete
thread.join()
print("Thread has finished execution.")
```
Output :
```
Thread has finished execution.
```
**2. Performing a Task with a Thread:**
A thread can run a specific task: pass the function as `target` and its arguments as `args` (a tuple) when creating the `Thread` object.
Example:
```python
import threading
# Define a function that will be executed by the thread
def print_numbers(arg):
    for i in range(arg):
        print(f"Thread: {i}")
# Create a thread
thread = threading.Thread(target=print_numbers,args=(5,))
# Start the thread
thread.start()
# Wait for the thread to complete
thread.join()
print("Thread has finished execution.")
```
Output :
```
Thread: 0
Thread: 1
Thread: 2
Thread: 3
Thread: 4
Thread has finished execution.
```
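**3. Delaying a Task with the Timer Class:**
We can delay the start of a task with `threading.Timer`, a subclass of `Thread` that waits for a given number of seconds before calling a function. Its constructor takes four arguments: `interval`, `function`, `args`, and `kwargs`.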
Example:
```python
import threading
# Define a function that will be executed by the thread
def print_numbers(arg):
    for i in range(arg):
        print(f"Thread: {i}")
# Create a timer that calls print_numbers 3 seconds after it is started
thread = threading.Timer(3, print_numbers, args=(5,))
# Start the thread
thread.start()
# Wait for the thread to complete
thread.join()
print("Thread has finished execution.")
```
Output :
```
# The output appears after three seconds
Thread: 0
Thread: 1
Thread: 2
Thread: 3
Thread: 4
Thread has finished execution.
```
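A timer that has not fired yet can be stopped with `Timer.cancel()`. The sketch below illustrates this; the `remind` function and the `fired` list are hypothetical names chosen for the example.

```python
import threading

fired = []  # records whether the delayed function ever ran

def remind():
    fired.append("reminder")

# Schedule remind() to run 5 seconds after start()
timer = threading.Timer(5, remind)
timer.start()

# Cancel while the timer is still waiting; remind() is never called
timer.cancel()
timer.join()  # returns promptly because the wait was cancelled

print(fired)  # []
```

`cancel()` only has an effect while the timer is still in its waiting stage.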
**4. Creating Multiple Threads**
We can create and manage multiple threads to achieve concurrent execution.
Example:
```python
import threading
def print_numbers(thread_name):
    for i in range(5):
        print(f"{thread_name}: {i}")
# Create multiple threads
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
# Start the threads
thread1.start()
thread2.start()
# Wait for both threads to complete
thread1.join()
thread2.join()
print("Both threads have finished execution.")
```
Output (the order in which the two threads' lines interleave can differ between runs):
```
Thread 1: 0
Thread 1: 1
Thread 2: 0
Thread 1: 2
Thread 1: 3
Thread 2: 1
Thread 2: 2
Thread 2: 3
Thread 2: 4
Thread 1: 4
Both threads have finished execution.
```
**5. Thread Synchronization**
When we create multiple threads and they access shared resources, there is a risk of race conditions and data corruption. To prevent this, you can use synchronization primitives such as locks.
A lock is a synchronization primitive that ensures that only one thread can access a shared resource at a time.
Example:
```python
import threading
lock = threading.Lock()
def print_numbers(thread_name):
    for i in range(10):
        with lock:
            print(f"{thread_name}: {i}")
# Create multiple threads
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
# Start the threads
thread1.start()
thread2.start()
# Wait for both threads to complete
thread1.join()
thread2.join()
print("Both threads have finished execution.")
```
Output (one possible ordering):
```
Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 1: 4
Thread 1: 5
Thread 1: 6
Thread 1: 7
Thread 1: 8
Thread 1: 9
Thread 2: 0
Thread 2: 1
Thread 2: 2
Thread 2: 3
Thread 2: 4
Thread 2: 5
Thread 2: 6
Thread 2: 7
Thread 2: 8
Thread 2: 9
Both threads have finished execution.
```
A `lock` object is created using `threading.Lock()`. The `with lock` statement ensures that the lock is acquired before printing and released after printing, so only one thread at a time can execute the print statement. The lock does not decide which thread runs first, so lines from the two threads may still alternate.
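The race-condition risk described above is easier to see with shared data than with printing. The following sketch, with a hypothetical `counter` variable and `add_many` function, uses a lock so that every increment of the shared counter is applied.

```python
import threading

counter = 0                # shared state modified by every thread
lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # counter += 1 is a read-modify-write sequence; the lock keeps
        # two threads from interleaving inside it and losing an update
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, the final value could be lower than 40000 on some runs, depending on how the threads are scheduled.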
## Conclusion
Threading in Python is a powerful tool for achieving concurrency and improving the performance of I/O-bound tasks. By understanding and implementing threads using the `threading` module, you can enhance the efficiency of your programs. To prevent race conditions and maintain data integrity, keep in mind that thread synchronization must be properly managed.

Binary file not shown (size: 16 KiB).

@ -10,8 +10,8 @@
- [Support Vector Machine Algorithm](support-vector-machine.md)
- [Artificial Neural Network from the Ground Up](ann.md)
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [TensorFlow.md](tensorflow.md)
- [PyTorch.md](pytorch.md)
- [TensorFlow](tensorflow.md)
- [PyTorch](pytorch.md)
- [Ensemble Learning](ensemble-learning.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)
@ -25,3 +25,4 @@
- [Naive Bayes](naive-bayes.md)
- [Neural network regression](neural-network-regression.md)
- [PyTorch Fundamentals](pytorch-fundamentals.md)
- [Xgboost](xgboost.md)

@ -0,0 +1,92 @@
# XGBoost
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
## Introduction to Gradient Boosting
Gradient boosting is a powerful technique for building predictive models that has seen widespread success in various applications.
- **Boosting Concept**: Boosting originated from the idea of modifying weak learners to improve their predictive capability.
- **AdaBoost**: The first successful boosting algorithm was Adaptive Boosting (AdaBoost), which utilizes decision stumps as weak learners.
- **Gradient Boosting Machines (GBM)**: AdaBoost and related algorithms were later reformulated as Gradient Boosting Machines, casting boosting as a numerical optimization problem.
- **Algorithm Elements**:
  - _Loss function_: Determines the objective to minimize (e.g., cross-entropy for classification, mean squared error for regression).
  - _Weak learner_: Typically, decision trees are used as weak learners.
  - _Additive model_: New weak learners are added iteratively to minimize the loss function, correcting the errors of previous models.
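The three elements above can be sketched in plain Python. This is only an illustration under simplifying assumptions (squared-error loss, a single feature, one-split trees); `fit_stump`, `predict`, `boost`, and `mse` are hypothetical helpers, not part of any library.

```python
RATE = 0.1  # learning rate (shrinkage) applied to every weak learner

def fit_stump(xs, residuals):
    """Weak learner: a one-split tree fitted to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict(base, stumps, x):
    """Additive model: initial guess plus the scaled output of each stump."""
    return base + RATE * sum(lv if x <= t else rv for t, lv, rv in stumps)

def boost(xs, ys, n_rounds=50):
    base = sum(ys) / len(ys)  # start from the mean target
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual
        residuals = [y - predict(base, stumps, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

def mse(base, stumps, xs, ys):
    """Loss function: mean squared error on the given data."""
    return sum((y - predict(base, stumps, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
base, stumps = boost(xs, ys)
mse_before = mse(base, [], xs, ys)
mse_after = mse(base, stumps, xs, ys)
print(mse_after < mse_before)  # True
```

Each round fits a new stump to what the current model still gets wrong, which is the "correcting the errors of previous models" step.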
## Introduction to XGBoost
- eXtreme Gradient Boosting (XGBoost): a more **regularized form** of gradient boosting. It uses **advanced regularization (L1 and L2)**, which improves the model's **generalization capabilities**.
- It is suitable when there is **a large number of training samples and a small number of features**, or when there is **a mixture of categorical and numerical features**.
- **Development**: Created by Tianqi Chen, XGBoost is designed for computational speed and model performance.
- **Key Features**:
  - _Speed_: Achieved through careful engineering, including parallelization of tree construction, distributed computing, and cache optimization.
  - _Support for Variations_: Supports several variants of gradient boosting, including stochastic gradient boosting (row and column subsampling) and regularized gradient boosting.
  - _Out-of-Core Computing_: Can handle very large datasets that don't fit into memory.
- **Advantages**:
  - _Sparse Optimization_: Suitable for datasets with many zero values.
  - _Regularization_: Implements advanced regularization techniques (L1 and L2), enhancing generalization capabilities.
  - _Parallel Training_: Utilizes all CPU cores during training for faster processing.
  - _Multiple Loss Functions_: Supports different loss functions based on the problem type.
  - _Bagging and Early Stopping_: Additional techniques for improving performance and efficiency.
- **Pre-Sorted Decision Tree Algorithm**:
  1. Features are pre-sorted by their values.
  2. Traversing segmentation points involves finding the best split point on a feature with a cost of O(#data).
  3. Data is split into left and right child nodes after finding the split point.
  4. Pre-sorting allows for accurate split point determination.
- **Limitations**:
  1. Iterative Traversal: Each iteration requires traversing the entire training data multiple times.
  2. Memory Consumption: Loading the entire training data into memory limits size, while not loading it leads to time-consuming read/write operations.
  3. Space Consumption: Pre-sorting consumes space, storing feature sorting results and split gain calculations.
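Steps 1 to 3 of the pre-sorted algorithm can be sketched as follows. This is a simplified illustration: `best_split` is a hypothetical helper, and it scores splits with a plain squared-error gain, not the regularized gain that XGBoost actually uses.

```python
def best_split(values, targets):
    """Sort one feature, then scan it once to find the best split point."""
    n = len(values)
    # Step 1: pre-sort the row indices by feature value
    order = sorted(range(n), key=lambda i: values[i])
    total = sum(targets)
    left_sum = 0.0
    best_gain, best_threshold = 0.0, None
    # Step 2: one pass over the sorted rows, O(#data) per feature
    for rank in range(1, n):
        idx, nxt = order[rank - 1], order[rank]
        left_sum += targets[idx]
        if values[idx] == values[nxt]:
            continue  # cannot split between two equal feature values
        right_sum = total - left_sum
        # Reduction in squared error from splitting here (simplified gain)
        gain = left_sum ** 2 / rank + right_sum ** 2 / (n - rank) - total ** 2 / n
        if gain > best_gain:
            best_gain = gain
            best_threshold = (values[idx] + values[nxt]) / 2
    return best_threshold, best_gain

values = [3.0, 1.0, 4.0, 2.0]
targets = [10.0, 0.0, 10.0, 0.0]
threshold, gain = best_split(values, targets)

# Step 3: rows go to the left or right child depending on the threshold
left_rows = [i for i, v in enumerate(values) if v <= threshold]
right_rows = [i for i, v in enumerate(values) if v > threshold]
print(threshold, gain)  # 2.5 100.0
```

Running sums make each candidate split cheap to evaluate, which is why the scan costs O(#data) once the feature is sorted.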
XGBoosting:
![image](assets/XG_1.webp)
## Develop Your First XGBoost Model
This code uses the XGBoost library to train a model on the Iris dataset. It splits the data, sets the hyperparameters, trains the model, makes predictions, and reports the accuracy on the testing set.
```python
# XGBoost with Iris Dataset
# Importing necessary libraries
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Loading a sample dataset (Iris dataset)
data = load_iris()
X = data.data
y = data.target
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Converting the dataset into DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Setting hyperparameters for XGBoost
params = {
    'max_depth': 3,
    'eta': 0.1,
    'objective': 'multi:softmax',
    'num_class': 3
}
# Training the XGBoost model
num_round = 50
model = xgb.train(params, dtrain, num_round)
# Making predictions on the testing set
y_pred = model.predict(dtest)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
### Output
Accuracy: 1.0
## **Conclusion**
XGBoost's focus on speed, performance, and scalability has made it one of the most widely used and powerful predictive modeling algorithms available. Its ability to handle large datasets efficiently, along with its advanced features and optimizations, makes it a valuable tool in machine learning and data science.
## Reference
- [Machine Learning Prediction of Turning Precision Using Optimized XGBoost Model](https://www.mdpi.com/2076-3417/12/15/7739)

@ -9,3 +9,4 @@
- [Working with Date & Time in Pandas](datetime.md)
- [Importing and Exporting Data in Pandas](import-export.md)
- [Handling Missing Values in Pandas](handling-missing-values.md)
- [Pandas Series](pandas-series.md)

@ -0,0 +1,317 @@
# Pandas Series
A Series is a pandas data structure that represents a one-dimensional array-like object containing an array of data and an associated array of data labels, called the index.
## Creating a Series object:
### Basic Series
To create a basic Series, you can pass a list or array of data to the `pd.Series()` function.
```python
import pandas as pd
s1 = pd.Series([4, 5, 2, 3])
print(s1)
```
#### Output
```
0 4
1 5
2 2
3 3
dtype: int64
```
### Series from a Dictionary
If you pass a dictionary to `pd.Series()`, the keys become the index and the values become the data of the Series.
```python
import pandas as pd
s2 = pd.Series({'A': 1, 'B': 2, 'C': 3})
print(s2)
```
#### Output
```
A 1
B 2
C 3
dtype: int64
```
## Additional Functionality
### Specifying Data Type and Index
You can specify the data type and index while creating a Series.
```python
import pandas as pd
s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64')
print(s4)
```
#### Output
```
a 1.0
b 2.0
c 3.0
dtype: float64
```
### Specifying NaN Values:
* Sometimes you need to create a Series of a certain size without having complete data. In such cases, you can fill the missing entries with NaN (Not a Number).
* A Series that stores NaN has a floating-point data type. Even if all the other values are integers, pandas promotes them to floating point automatically, because NaN cannot be represented by NumPy integer types.
```python
import numpy as np
import pandas as pd
s3 = pd.Series([1, np.nan, 2])
print(s3)
```
#### Output
```
0 1.0
1 NaN
2 2.0
dtype: float64
```
### Creating Data from Expressions
You can create a Series using an expression or function.
`<series_object> = pd.Series(data=<function|expression>, index=None)`
```python
import numpy as np
import pandas as pd
a = np.arange(1, 5)  # [1, 2, 3, 4]
s5 = pd.Series(data=a**2, index=a)
print(s5)
```
#### Output
```
1 1
2 4
3 9
4 16
dtype: int64
```
## Series Object Attributes
| **Attribute** | **Description** |
|--------------------------|---------------------------------------------------|
| `<series>.index` | Array of index of the Series |
| `<series>.values` | Array of values of the Series |
| `<series>.dtype` | Return the dtype of the data |
| `<series>.shape` | Return a tuple representing the shape of the data |
| `<series>.ndim` | Return the number of dimensions of the data |
| `<series>.size` | Return the number of elements in the data |
| `<series>.hasnans` | Return True if there is any NaN in the data |
| `<series>.empty` | Return True if the Series object is empty |
- `len(<series_object>)` returns the total number of elements in the Series, including NaN values, whereas `<series_object>.count()` returns only the number of non-NaN elements.
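The difference between `len()` and `count()` only shows up when the Series contains NaN. A small sketch, with variable names chosen for this example:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3])

n_total = len(s)       # counts every element, NaN included
n_valid = s.count()    # counts only the non-NaN elements

print(n_total, n_valid, s.hasnans)  # 3 2 True
```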
## Accessing a Series object and its elements
### Accessing Individual Elements
You can access individual elements using their index.
Only valid ('legal') index labels can be used to access an element; a label that is not in the index raises a `KeyError`.
```python
import pandas as pd
s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
print(s7['A'])
```
#### Output
```
13
```
### Slicing a Series
- Integer slices are extracted by position, regardless of the custom index labels.
- Each element in the Series has a positional index starting from 0 (i.e., 0 for the first element, 1 for the second element, and so on).
- `<series>[<start>:<end>]` will return the values of the elements between the start and end positions (excluding the end position).
#### Example
```python
import pandas as pd
s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
print(s[:2])
```
#### Output
```
A 13
B 45
dtype: int64
```
This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels.
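To make the choice between position and label explicit, pandas also provides the `.iloc` and `.loc` accessors. The sketch below reuses the Series from the example above; note that a label slice with `.loc` includes its end label.

```python
import pandas as pd

s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])

by_position = s.iloc[0:2]    # positions 0 and 1; the end position is excluded
by_label = s.loc['A':'C']    # labels A to C; the end label is included

print(list(by_position.index))  # ['A', 'B']
print(list(by_label.index))     # ['A', 'B', 'C']
```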
## Operations on Series objects
### Modifying elements and indexes
* `<series_object>[<index>] = <new value>`
* `<series_object>[<start>:<end>] = <new value>`
* `<series_object>.index = [<new indexes>]`
```python
import pandas as pd
s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s8['a'] = 100
s8.index = ['x', 'y', 'z']
print(s8)
```
#### Output
```
x 100
y 20
z 30
dtype: int64
```
**Note: Series objects are value-mutable but size-immutable.**
### Vector operations
We can perform vector operations such as `+`, `-`, `/`, `%`, etc. The operation is applied to every element of the Series.
#### Addition
```python
import pandas as pd
s9 = pd.Series([1, 2, 3])
print(s9 + 5)
```
#### Output
```
0 6
1 7
2 8
dtype: int64
```
#### Subtraction
```python
print(s9 - 2)
```
#### Output
```
0 -1
1 0
2 1
dtype: int64
```
### Arithmetic on Series objects
#### Addition
```python
import pandas as pd
s10 = pd.Series([1, 2, 3])
s11 = pd.Series([4, 5, 6])
print(s10 + s11)
```
#### Output
```
0 5
1 7
2 9
dtype: int64
```
#### Multiplication
```python
print(s10 * s11)
```
#### Output
```
0 4
1 10
2 18
dtype: int64
```
Keep in mind that arithmetic between two Series aligns the values by index label. For any label that appears in only one of the two Series, the result is NaN.
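The following sketch shows what happens when the indexes only partly match. The Series `a` and `b` are made up for this example; `Series.add()` with `fill_value` is one way to avoid the NaN results.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels 'a' and 'd' exist in only one Series, so their result is NaN
total = a + b
print(total.isna().tolist())  # [True, False, False, True]

# fill_value=0 treats a missing label as 0 before adding
filled = a.add(b, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```

The result is indexed by the union of both indexes, and its dtype is float because NaN had to be representable.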
### Head and Tail Functions
| **Functions** | **Description** |
|--------------------------|---------------------------------------------------|
| `<series>.head(n)` | return the first n elements of the series |
| `<series>.tail(n)` | return the last n elements of the series |
```python
import pandas as pd
s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(s12.head(3))
print(s12.tail(3))
```
#### Output
```
0 10
1 20
2 30
dtype: int64
7 80
8 90
9 100
dtype: int64
```
If you don't provide a value for `n`, both functions default to `n=5`.
### A few extra functions
| **Function** | **Description** |
|----------------------------------------|------------------------------------------------------------------------|
| `<series_object>.sort_values()` | Return the Series object in ascending order based on its values. |
| `<series_object>.sort_index()` | Return the Series object in ascending order based on its index. |
| `<series_object>.drop(<index>)`        | Return a new Series without the given index label and its value.      |
```python
import pandas as pd
s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b'])
print(s13.sort_values())
print(s13.sort_index())
print(s13.drop('a'))
```
#### Output
```
a 1
b 2
c 3
dtype: int64
a 1
b 2
c 3
dtype: int64
c 3
b 2
dtype: int64
```
## Conclusion
In short, the pandas Series is a fundamental data structure for handling one-dimensional data in Python. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, the Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization.