Mirror of https://github.com/animator/learn-python
Merge branch 'animator:main' into deque
commit aa0eb52c68

@ -20,3 +20,4 @@
- [Eval Function](eval_function.md)
- [Magic Methods](magic-methods.md)
- [Asynchronous Context Managers & Generators](asynchronous-context-managers-generators.md)
- [Threading](threading.md)

@ -0,0 +1,198 @@
# Threading in Python

A thread is a sequence of instructions in a program that can be executed independently of the rest of the process. Threads are like lightweight processes: they share the same memory space but can execute independently. A process, in turn, is an executing instance of a computer program.

This guide provides an overview of the `threading` module and its key functionality.
## Key Characteristics of Threads

* Shared Memory: All threads within a process share the same memory space, which allows for efficient communication between threads (see the sketch after this list).
* Independent Execution: Each thread can run independently and concurrently.
* Context Switching: The operating system can switch between threads, enabling concurrent execution.
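To make the shared-memory point concrete, here is a minimal sketch (the `shared_data` list and `worker` function are made up for illustration) in which two threads append to the same list object:

```python
import threading

shared_data = []  # a single list object, visible to every thread in the process

def worker(label):
    # Both threads append to the same list because threads share memory
    for i in range(3):
        shared_data.append(f"{label}-{i}")

t1 = threading.Thread(target=worker, args=("A",))
t2 = threading.Thread(target=worker, args=("B",))
t1.start()
t2.start()
t1.join()
t2.join()

print(shared_data)  # entries from both threads end up in the one list
```

The exact interleaving of the entries varies from run to run, which is the kind of situation the synchronization section below addresses.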
## Threading Module

The `threading` module allows you to create and manage threads easily. It includes several functions and classes to work with threads.

**1. Creating a Thread:**

To create a thread in Python, you can use the `Thread` class from the `threading` module.

Example:

```python
import threading

# Create a thread
thread = threading.Thread()

# Start the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished execution.")
```

Output:

```
Thread has finished execution.
```
**2. Performing a Task with a Thread:**

We can have a thread perform a specific task by passing a function as the `target` parameter and its arguments as the `args` tuple when constructing the `Thread` object.

Example:

```python
import threading

# Define a function that will be executed by the thread
def print_numbers(arg):
    for i in range(arg):
        print(f"Thread: {i}")

# Create a thread
thread = threading.Thread(target=print_numbers, args=(5,))

# Start the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished execution.")
```

Output:

```
Thread: 0
Thread: 1
Thread: 2
Thread: 3
Thread: 4
Thread has finished execution.
```
**3. Delaying a Task with Thread's Timer Function:**

We can delay the start of a thread using `threading.Timer`, which takes four arguments: `(interval, function, args, kwargs)`. The function is run in a new thread after `interval` seconds.

Example:

```python
import threading

# Define a function that will be executed by the thread
def print_numbers(arg):
    for i in range(arg):
        print(f"Thread: {i}")

# Create a thread that starts after 3 seconds
thread = threading.Timer(3, print_numbers, args=(5,))

# Start the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished execution.")
```

Output:

```
# after three seconds the output will be generated
Thread: 0
Thread: 1
Thread: 2
Thread: 3
Thread: 4
Thread has finished execution.
```
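A related detail worth knowing: a `Timer` that has not fired yet can be stopped with its `cancel()` method. A minimal sketch (the 3-second delay and message are arbitrary):

```python
import threading

def say_hello():
    print("Hello from the timer!")

timer = threading.Timer(3, say_hello)
timer.start()

# Cancel before the 3 seconds elapse, so say_hello() never runs
timer.cancel()

print("Timer cancelled before it could fire.")
```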
**4. Creating Multiple Threads**

We can create and manage multiple threads to achieve concurrent execution.

Example:

```python
import threading

def print_numbers(thread_name):
    for i in range(5):
        print(f"{thread_name}: {i}")

# Create multiple threads
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print("Both threads have finished execution.")
```

Output:

```
Thread 1: 0
Thread 1: 1
Thread 2: 0
Thread 1: 2
Thread 1: 3
Thread 2: 1
Thread 2: 2
Thread 2: 3
Thread 2: 4
Thread 1: 4
Both threads have finished execution.
```

(The exact ordering of the two threads' output varies from run to run.)
**5. Thread Synchronization**

When we create multiple threads and they access shared resources, there is a risk of race conditions and data corruption. To prevent this, you can use synchronization primitives such as locks. A lock is a synchronization primitive that ensures that only one thread can access a shared resource at a time.

Example:

```python
import threading

lock = threading.Lock()

def print_numbers(thread_name):
    for i in range(10):
        with lock:
            print(f"{thread_name}: {i}")

# Create multiple threads
thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print("Both threads have finished execution.")
```

Output:

```
Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 1: 4
Thread 1: 5
Thread 1: 6
Thread 1: 7
Thread 1: 8
Thread 1: 9
Thread 2: 0
Thread 2: 1
Thread 2: 2
Thread 2: 3
Thread 2: 4
Thread 2: 5
Thread 2: 6
Thread 2: 7
Thread 2: 8
Thread 2: 9
Both threads have finished execution.
```

A `lock` object is created using `threading.Lock()`, and the `with lock` statement ensures that the lock is acquired before printing and released afterwards. This prevents other threads from reaching the print statement at the same time.
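The same idea applies to protecting shared data, not just `print`. Below is a hedged sketch (the `counter` variable and the iteration counts are made up for illustration) in which a lock keeps increments of a shared counter from racing with each other:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, the read-modify-write below could interleave
        # between threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 with the lock; possibly less without it
```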
## Conclusion

Threading in Python is a powerful tool for achieving concurrency and improving the performance of I/O-bound tasks. By understanding and implementing threads using the `threading` module, you can enhance the efficiency of your programs. Keep in mind that thread synchronization must be managed properly to prevent race conditions and maintain data integrity.

Binary file not shown.
After Width: | Height: | Size: 16 KiB
@ -10,8 +10,8 @@
- [Support Vector Machine Algorithm](support-vector-machine.md)
- [Artificial Neural Network from the Ground Up](ann.md)
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [TensorFlow.md](tensorflow.md)
- [PyTorch.md](pytorch.md)
- [TensorFlow](tensorflow.md)
- [PyTorch](pytorch.md)
- [Ensemble Learning](ensemble-learning.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)

@ -25,3 +25,4 @@
- [Naive Bayes](naive-bayes.md)
- [Neural network regression](neural-network-regression.md)
- [PyTorch Fundamentals](pytorch-fundamentals.md)
- [Xgboost](xgboost.md)

@ -0,0 +1,92 @@
# XGBoost

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.

## Introduction to Gradient Boosting

Gradient boosting is a powerful technique for building predictive models that has seen widespread success in various applications.

- **Boosting Concept**: Boosting originated from the idea of modifying weak learners to improve their predictive capability.
- **AdaBoost**: The first successful boosting algorithm was Adaptive Boosting (AdaBoost), which utilizes decision stumps as weak learners.
- **Gradient Boosting Machines (GBM)**: AdaBoost and related algorithms were later reformulated as Gradient Boosting Machines, casting boosting as a numerical optimization problem.
- **Algorithm Elements** (a minimal illustration follows this list):
  - _Loss function_: Determines the objective to minimize (e.g., cross-entropy for classification, mean squared error for regression).
  - _Weak learner_: Typically, decision trees are used as weak learners.
  - _Additive model_: New weak learners are added iteratively to minimize the loss function, correcting the errors of previous models.
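As a minimal illustration of these elements (not XGBoost itself), the sketch below uses scikit-learn's `GradientBoostingClassifier`: shallow decision trees as the weak learners, a differentiable classification loss as the objective, and an additive ensemble built one tree at a time. The dataset and parameter values are arbitrary choices for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Arbitrary example dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Additive model: 100 shallow trees, each one fitted to correct the errors made so far
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
```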
## Introduction to XGBoost

- eXtreme Gradient Boosting (XGBoost) is a more **regularized form** of gradient boosting: it uses **advanced regularization (L1 & L2)**, improving the model's **generalization capabilities** (see the sketch after this list).
- It is suitable when there is **a large number of training samples and a small number of features**, or when there is **a mixture of categorical and numerical features**.
- **Development**: Created by Tianqi Chen, XGBoost is designed for computational speed and model performance.
- **Key Features**:
  - _Speed_: Achieved through careful engineering, including parallelization of tree construction, distributed computing, and cache optimization.
  - _Support for Variations_: XGBoost supports various techniques and optimizations.
  - _Out-of-Core Computing_: Can handle very large datasets that don't fit into memory.
- **Advantages**:
  - _Sparse Optimization_: Suitable for datasets with many zero values.
  - _Regularization_: Implements advanced regularization techniques (L1 and L2), enhancing generalization capabilities.
  - _Parallel Training_: Utilizes all CPU cores during training for faster processing.
  - _Multiple Loss Functions_: Supports different loss functions based on the problem type.
  - _Bagging and Early Stopping_: Additional techniques for improving performance and efficiency.
- **Pre-Sorted Decision Tree Algorithm**:
  1. Features are pre-sorted by their values.
  2. Traversing split points involves finding the best split point on a feature, with a cost of O(#data).
  3. Data is split into left and right child nodes after finding the split point.
  4. Pre-sorting allows for accurate split point determination.
- **Limitations**:
  1. Iterative Traversal: Each iteration requires traversing the entire training data multiple times.
  2. Memory Consumption: Loading the entire training data into memory limits its size, while not loading it leads to time-consuming read/write operations.
  3. Space Consumption: Pre-sorting consumes space, storing feature sorting results and split gain calculations.
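To make the L1/L2 regularization point concrete, here is a hedged sketch using the native `xgboost` API, where L1 and L2 penalties on the leaf weights are exposed as the `alpha` and `lambda` booster parameters. The values shown are arbitrary examples, not tuned recommendations.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)

params = {
    "max_depth": 3,
    "eta": 0.1,
    "objective": "multi:softmax",
    "num_class": 3,
    "alpha": 0.5,    # L1 regularization term on leaf weights (example value)
    "lambda": 1.0,   # L2 regularization term on leaf weights (example value)
}

model = xgb.train(params, dtrain, num_boost_round=50)
```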
XGBoosting:
![XGBOOST](assets/XGBoost.png)

## Develop Your First XGBoost Model

This code uses the XGBoost library to train a model on the Iris dataset: it splits the data, sets the hyperparameters, trains the model, makes predictions, and evaluates accuracy, reaching an accuracy of 1.0 on the testing set in the run shown below.

```python
# XGBoost with Iris Dataset

# Importing necessary libraries
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Loading a sample dataset (Iris dataset)
data = load_iris()
X = data.data
y = data.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Converting the dataset into DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Setting hyperparameters for XGBoost
params = {
    'max_depth': 3,
    'eta': 0.1,
    'objective': 'multi:softmax',
    'num_class': 3
}

# Training the XGBoost model
num_round = 50
model = xgb.train(params, dtrain, num_round)

# Making predictions on the testing set
y_pred = model.predict(dtest)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

### Output

```
Accuracy: 1.0
```
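Since early stopping was listed among XGBoost's advantages, here is a hedged sketch of how it can be added to the same native API: `xgb.train` accepts an `evals` watchlist and an `early_stopping_rounds` argument, and stops boosting when the validation metric has not improved for that many rounds. The split and the values below are illustrative, not tuned.

```python
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"max_depth": 3, "eta": 0.1, "objective": "multi:softmax", "num_class": 3}

# Stop if the validation error has not improved for 10 consecutive rounds
model = xgb.train(
    params,
    dtrain,
    num_boost_round=200,
    evals=[(dvalid, "validation")],
    early_stopping_rounds=10,
)

print("Best iteration:", model.best_iteration)
```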
## Conclusion

XGBoost's focus on speed, performance, and scalability has made it one of the most widely used and powerful predictive modeling algorithms available. Its ability to handle large datasets efficiently, along with its advanced features and optimizations, makes it a valuable tool in machine learning and data science.

## Reference

- [Machine Learning Prediction of Turning Precision Using Optimized XGBoost Model](https://www.mdpi.com/2076-3417/12/15/7739)

@ -9,3 +9,4 @@
- [Working with Date & Time in Pandas](datetime.md)
- [Importing and Exporting Data in Pandas](import-export.md)
- [Handling Missing Values in Pandas](handling-missing-values.md)
- [Pandas Series](pandas-series.md)

@ -0,0 +1,317 @@
# Pandas Series

A Series is a pandas data structure that represents a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.

## Creating a Series Object

### Basic Series
To create a basic Series, you can pass a list or array of data to the `pd.Series()` function.

```python
import pandas as pd

s1 = pd.Series([4, 5, 2, 3])
print(s1)
```

#### Output
```
0    4
1    5
2    2
3    3
dtype: int64
```

### Series from a Dictionary

If you pass a dictionary to `pd.Series()`, the keys become the index and the values become the data of the Series.

```python
import pandas as pd

s2 = pd.Series({'A': 1, 'B': 2, 'C': 3})
print(s2)
```

#### Output
```
A    1
B    2
C    3
dtype: int64
```

## Additional Functionality

### Specifying Data Type and Index
You can specify the data type and index while creating a Series.

```python
import pandas as pd

s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64')
print(s4)
```

#### Output
```
a    1.0
b    2.0
c    3.0
dtype: float64
```

### Specifying NaN Values
* Sometimes you need to create a Series object of a certain size but do not have complete data available. In such cases you can fill the missing data with a NaN (Not a Number) value.
* When you store a NaN value in a Series object, the data type must be a floating-point type. Even if you specify an integer type, pandas will promote it to a floating-point type automatically, because NaN is not supported by integer types.

```python
import numpy as np
import pandas as pd

s3 = pd.Series([1, np.nan, 2])
print(s3)
```

#### Output
```
0    1.0
1    NaN
2    2.0
dtype: float64
```

### Creating Data from Expressions
You can create a Series using an expression or function:

`<series_object> = pd.Series(data=<function|expression>, index=None)`

```python
import numpy as np
import pandas as pd

a = np.arange(1, 5)  # [1, 2, 3, 4]
s5 = pd.Series(data=a**2, index=a)
print(s5)
```

#### Output
```
1     1
2     4
3     9
4    16
dtype: int64
```
## Series Object Attributes

| **Attribute**             | **Description**                                   |
|---------------------------|---------------------------------------------------|
| `<series>.index`          | Array of index labels of the Series               |
| `<series>.values`         | Array of values of the Series                     |
| `<series>.dtype`          | Return the dtype of the data                      |
| `<series>.shape`          | Return a tuple representing the shape of the data |
| `<series>.ndim`           | Return the number of dimensions of the data       |
| `<series>.size`           | Return the number of elements in the data         |
| `<series>.hasnans`        | Return True if there is any NaN in the data       |
| `<series>.empty`          | Return True if the Series object is empty         |

- If you use `len()` on a Series object, it returns the total number of elements in the Series, whereas `<series_object>.count()` returns only the number of non-NaN elements, as the example below shows.
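A quick illustration of a few of these attributes and of the `len()` vs `count()` distinction (the values in the Series are arbitrary):

```python
import numpy as np
import pandas as pd

s6 = pd.Series([10, np.nan, 30])

print(s6.dtype)    # float64 (promoted because of the NaN)
print(s6.shape)    # (3,)
print(s6.hasnans)  # True
print(len(s6))     # 3 -> counts every element, including NaN
print(s6.count())  # 2 -> counts only the non-NaN elements
```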
## Accessing a Series object and its elements

### Accessing Individual Elements
You can access individual elements using their index. 'Legal' indexes (valid index labels) are used to access an individual element.

```python
import pandas as pd

s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
print(s7['A'])
```

#### Output
```
13
```

### Slicing a Series

- Slices are extracted based on their positional index, regardless of the custom index labels.
- Each element in the Series has a positional index starting from 0 (i.e., 0 for the first element, 1 for the second element, and so on).
- `<series>[<start>:<end>]` will return the values of the elements between the start and end positions (excluding the end position).

#### Example

```python
import pandas as pd

s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
print(s[:2])
```

#### Output
```
A    13
B    45
dtype: int64
```

This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels.
## Operations on a Series object

### Modifying elements and indexes
* `<series_object>[<index>] = <new data value>`
* `<series_object>[<start> : <end>] = <new data value>`
* `<series_object>.index = [<new indexes>]`

```python
import pandas as pd

s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s8['a'] = 100
s8.index = ['x', 'y', 'z']
print(s8)
```

#### Output
```
x    100
y     20
z     30
dtype: int64
```

**Note: Series objects are value-mutable but size-immutable.**
### Vector operations
We can perform vector operations such as `+`, `-`, `/`, `%`, etc.; the operation is applied to every element of the Series.

#### Addition
```python
import pandas as pd

s9 = pd.Series([1, 2, 3])
print(s9 + 5)
```

#### Output
```
0    6
1    7
2    8
dtype: int64
```

#### Subtraction
```python
print(s9 - 2)
```

#### Output
```
0   -1
1    0
2    1
dtype: int64
```
### Arithmetic on Series objects

#### Addition
```python
import pandas as pd

s10 = pd.Series([1, 2, 3])
s11 = pd.Series([4, 5, 6])
print(s10 + s11)
```

#### Output
```
0    5
1    7
2    9
dtype: int64
```

#### Multiplication

```python
print(s10 * s11)
```

#### Output
```
0     4
1    10
2    18
dtype: int64
```
One thing to keep in mind here is that both Series objects should have the same indexes: the result is aligned on the union of the two indexes, and any index label that is not present in both Series gets a NaN value, as the example below shows.
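For example (a small sketch with made-up values and partially overlapping indexes):

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Only labels present in both Series get a computed value;
# labels missing from either side become NaN.
print(a + b)
```

#### Output
```
w     NaN
x     NaN
y    12.0
z    23.0
dtype: float64
```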
### Head and Tail Functions

| **Function**              | **Description**                                   |
|---------------------------|---------------------------------------------------|
| `<series>.head(n)`        | Return the first n elements of the Series         |
| `<series>.tail(n)`        | Return the last n elements of the Series          |

```python
import pandas as pd

s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(s12.head(3))
print(s12.tail(3))
```

#### Output
```
0    10
1    20
2    30
dtype: int64
7     80
8     90
9    100
dtype: int64
```

If you don't provide a value for n, both functions default to `n=5`.
### A Few Extra Functions

| **Function**                          | **Description**                                                      |
|---------------------------------------|----------------------------------------------------------------------|
| `<series_object>.sort_values()`       | Return the Series object sorted in ascending order of its values.    |
| `<series_object>.sort_index()`        | Return the Series object sorted in ascending order of its index.     |
| `<series_object>.drop(<index>)`       | Return the Series with the given index label and its value removed.  |

```python
import pandas as pd

s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b'])
print(s13.sort_values())
print(s13.sort_index())
print(s13.drop('a'))
```

#### Output
```
a    1
b    2
c    3
dtype: int64
a    1
b    2
c    3
dtype: int64
c    3
b    2
dtype: int64
```

## Conclusion
In short, the Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, the Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization.