Merge branch 'main' into main

pull/891/head
Ashita Prasad 2024-06-08 11:04:22 +05:30 verified by GitHub
commit 2477c6ea79
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 784 additions and 0 deletions


@@ -0,0 +1,75 @@
# Understanding the `eval` Function in Python
## Introduction
The `eval` function in Python allows you to execute a string-based Python expression dynamically. This can be useful in various scenarios where you need to evaluate expressions that are not known until runtime.
## Syntax
```python
eval(expression, globals=None, locals=None)
```
### Parameters:
* `expression`: the string that is parsed and evaluated as a Python expression.
* `globals` (optional): a dictionary specifying the global names available during evaluation.
* `locals` (optional): another dictionary specifying the local names available during evaluation (a short example combining the two follows).
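As a quick illustration of the optional mappings, `globals` and `locals` can be supplied together (the variable names below are arbitrary):
```python
x = 5
# 'x' is resolved from the globals mapping, 'y' from the locals mapping
result = eval('x + y', {'x': x}, {'y': 7})
print(result)  # Output: 12
```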
## Examples
Example 1:
```python
result = eval('2 + 3 * 4')
print(result) # Output: 14
```
Example 2:
```python
x = 10
expression = 'x * 2'
result = eval(expression, {'x': x})
print(result) # Output: 20
```
Example 3:
```python
x = 10
def multiply(a, b):
    return a * b
expression = 'multiply(x, 5) + 2'
result = eval(expression)
print("Result:", result)  # Output: Result: 52
```
Example 4:
```python
expression = input("Enter a Python expression: ")
result = eval(expression)
print("Result:", result)
# Input: 3+2
# Output: Result: 5
```
Example 5:
```python
import numpy as np
a = np.random.randint(1, 9)
b = np.random.randint(1, 9)
operations = ["*", "-", "+"]
op = np.random.choice(operations)
expression = str(a) + op + str(b)
correct_answer = eval(expression)
given_answer = int(input(str(a) + " " + op + " " + str(b) + " = "))
if given_answer == correct_answer:
    print("Correct")
else:
    print("Incorrect")
    print("correct answer is:", correct_answer)
# 2 * 1 = 8
# Incorrect
# correct answer is: 2
# or
# 3 * 2 = 6
# Correct
```
## Conclusion
The `eval` function is a powerful tool in Python that allows for dynamic evaluation of expressions. Because the evaluated string can execute arbitrary code, it should never be called on untrusted input.


@@ -11,3 +11,4 @@
- [Exception Handling in Python](exception-handling.md)
- [Generators](generators.md)
- [List Comprehension](list-comprehension.md)
- [Eval Function](eval_function.md)

Binary file not shown.

Size: 56 KiB


@@ -0,0 +1,140 @@
# Ensemble Learning
Ensemble Learning is a powerful machine learning paradigm that combines multiple models to achieve better performance than any individual model. The idea is to leverage the strengths of different models to improve overall accuracy, robustness, and generalization.
## Introduction
Ensemble Learning is a technique that combines the predictions from multiple machine learning models to make more accurate and robust predictions than a single model. It leverages the diversity of different models to reduce errors and improve performance.
## Types of Ensemble Learning
### Bagging
Bagging, or Bootstrap Aggregating, involves training multiple versions of the same model on different subsets of the training data and averaging their predictions. The most common example of bagging is the `RandomForest` algorithm.
### Boosting
Boosting focuses on training models sequentially, where each new model corrects the errors made by the previous ones. This way, the ensemble learns from its mistakes, leading to improved performance. `AdaBoost` and `Gradient Boosting` are popular examples of boosting algorithms.
### Stacking
Stacking involves training multiple models (the base learners) and a meta-model that combines their predictions. The base learners are trained on the original dataset, while the meta-model is trained on the outputs of the base learners. This approach allows leveraging the strengths of different models.
## Advantages and Disadvantages
### Advantages
- **Improved Accuracy**: Combines the strengths of multiple models.
- **Robustness**: Reduces the risk of overfitting and model bias.
- **Versatility**: Can be applied to various machine learning tasks, including classification and regression.
### Disadvantages
- **Complexity**: More complex than individual models, making interpretation harder.
- **Computational Cost**: Requires more computational resources and training time.
- **Implementation**: Can be challenging to implement and tune effectively.
## Key Concepts
- **Diversity**: The models in the ensemble should be diverse to benefit from their different strengths.
- **Voting/Averaging**: For classification, majority voting is used to combine predictions. For regression, averaging is used.
- **Weighting**: In some ensembles, models are weighted based on their accuracy or other metrics (a short voting-and-weighting sketch is included with the code examples below).
## Code Examples
### Bagging with Random Forest
Below is an example of using Random Forest for classification on the Iris dataset.
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize Random Forest model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
### Boosting with AdaBoost
Below is an example of using AdaBoost for classification on the Iris dataset.
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
# Initialize base model
base_model = DecisionTreeClassifier(max_depth=1)
# Initialize AdaBoost model
ada_clf = AdaBoostClassifier(estimator=base_model, n_estimators=50, random_state=42)  # on scikit-learn < 1.2, use base_estimator=base_model
# Train the model
ada_clf.fit(X_train, y_train)
# Make predictions
y_pred = ada_clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
### Stacking with Multiple Models
Below is an example of using stacking with multiple models for classification on the Iris dataset.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier
# Define base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svc', SVC(kernel='linear', probability=True))
]
# Define meta-model
meta_model = LogisticRegression()
# Initialize Stacking model
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)
# Train the model
stacking_clf.fit(X_train, y_train)
# Make predictions
y_pred = stacking_clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
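### Voting and Weighting
Below is a short sketch of soft voting with per-model weights, illustrating the voting and weighting concepts described earlier. It reuses the Iris split and the metric imports from the bagging example above; the weights chosen here are purely illustrative.
```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
# Define the individual models to be combined
voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('dt', DecisionTreeClassifier(random_state=42)),
    ],
    voting='soft',   # average the predicted class probabilities
    weights=[2, 1]   # illustrative weights; tune them for your data
)
# Train the ensemble
voting_clf.fit(X_train, y_train)
# Make predictions and evaluate
y_pred = voting_clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
```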
## Conclusion
Ensemble Learning is a powerful technique that combines multiple models to improve overall performance. By leveraging the strengths of different models, it provides better accuracy, robustness, and generalization. However, it comes with increased complexity and computational cost. Understanding and implementing ensemble methods can significantly enhance machine learning solutions.


@@ -11,9 +11,12 @@
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [TensorFlow.md](tensorflow.md)
- [PyTorch.md](pytorch.md)
- [Ensemble Learning](ensemble-learning.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)
- [Types_of_Cost_Functions](cost-functions.md)
- [Clustering](clustering.md)
- [Hierarchical Clustering](hierarchical-clustering.md)
- [Grid Search](grid-search.md)
- [Transformers](transformers.md)
- [K-nearest neighbor (KNN)](knn.md)


@@ -0,0 +1,122 @@
# K-Nearest Neighbors (KNN) Machine Learning Algorithm in Python
## Introduction
K-Nearest Neighbors (KNN) is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It assumes that similar things exist in close proximity. In other words, similar data points are near to each other.
## How KNN Works
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in classification) or averaging the labels (in regression).
### Steps:
1. **Choose the number K of neighbors**
2. **Calculate the distance** between the query-instance and all the training samples
3. **Sort the distances** and determine the nearest neighbors based on the K-th minimum distance
4. **Gather the labels** of the nearest neighbors
5. **Vote for the most frequent label** (in case of classification) or **average the labels** (in case of regression); a minimal NumPy sketch of these steps follows below.
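These steps can be written out directly in a few lines of NumPy. The sketch below (the helper name `knn_predict` and the toy data are made up for illustration) is not an optimized implementation, but it follows the procedure above exactly.
```python
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, query, k=3):
    # Step 2: Euclidean distance from the query to every training sample
    distances = np.linalg.norm(X_train - query, axis=1)
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: gather their labels and vote for the most frequent one
    labels = y_train[nearest]
    return Counter(labels).most_common(1)[0][0]
# Toy data: two well-separated classes in 2D
X_toy = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_toy, y_toy, np.array([2, 2])))  # 0
print(knn_predict(X_toy, y_toy, np.array([8, 7])))  # 1
```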
## When to Use KNN
### Advantages:
- **Simple and easy to understand:** KNN is intuitive and easy to implement.
- **No training phase:** KNN is a lazy learner, meaning there is no explicit training phase.
- **Effective with a small dataset:** KNN performs well with a small number of input variables.
### Disadvantages:
- **Computationally expensive:** The algorithm becomes significantly slower as the number of examples and/or predictors/independent variables increase.
- **Sensitive to irrelevant features:** All features contribute to the distance equally.
- **Memory-intensive:** Storing all the training data can be costly.
### Use Cases:
- **Recommender Systems:** Suggest items based on similarity to user preferences.
- **Image Recognition:** Classify images by comparing new images to the training set.
- **Finance:** Predict credit risk or fraud detection based on historical data.
## KNN in Python
### Required Libraries
To implement KNN, we need the following Python libraries:
- `numpy`
- `pandas`
- `scikit-learn`
- `matplotlib` (for visualization)
### Installation
```bash
pip install numpy pandas scikit-learn matplotlib
```
### Example Code
Let's implement a simple KNN classifier using the Iris dataset.
#### Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
```
#### Step 2: Load Dataset
```python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
```
#### Step 3: Split Dataset
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
#### Step 4: Train KNN Model
```python
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
```
#### Step 5: Make Predictions
```python
y_pred = knn.predict(X_test)
```
#### Step 6: Evaluate the Model
```python
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
### Visualization (Optional)
```python
# Plotting the decision boundary for visualization (using only the first two features)
h = .02  # step size in the mesh
# Create color maps
cmap_light = plt.cm.RdYlBu
cmap_bold = plt.cm.RdYlBu
# For simplicity, train a separate model on only the first two features of the dataset
X_plot = X[:, :2]
knn_2d = KNeighborsClassifier(n_neighbors=3)
knn_2d.fit(X_plot, y)
x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, edgecolor='k', cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = 3)")
plt.show()
```
## Generalization and Considerations
- **Choosing K:** The choice of K is critical. Smaller values of K can lead to noisy models, while larger values make the algorithm computationally expensive and might oversimplify the model.
- **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) so that all features contribute equally to the distance computation; see the sketch after this list.
- **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
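As a brief sketch of how feature scaling and the choice of K can be handled together in scikit-learn, the snippet below chains a `StandardScaler` and a `KNeighborsClassifier` in a `Pipeline` and searches over a few values of K with `GridSearchCV`. It reuses the Iris split from the example above; the parameter grid is illustrative.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
# Scale the features, then fit KNN; search over a few values of K
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])
param_grid = {'knn__n_neighbors': [1, 3, 5, 7, 9, 11]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best K:", grid.best_params_['knn__n_neighbors'])
print("Test accuracy:", grid.score(X_test, y_test))
```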
In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.


@@ -0,0 +1,443 @@
# Transformers
## Introduction
A transformer is a deep learning architecture developed by Google, built around the multi-head, softmax-based attention mechanism. Before transformers, attention mechanisms were added to gated recurrent neural networks such as LSTMs and gated recurrent units (GRUs), which processed datasets sequentially. Their dependency on previous token computations prevented the attention mechanism from being parallelized.
Transformers are a revolutionary approach to natural language processing (NLP). Unlike older models, they excel at understanding long-range connections between words. This "attention" mechanism lets them grasp the context of a sentence, making them powerful for tasks like machine translation, text summarization, and question answering. Introduced in 2017, transformers are now the backbone of many large language models, including tools you might use every day. Their ability to handle complex relationships in language is fueling advancements in AI across various fields.
## Model Architecture
![Model Architecture](assets/transformer-architecture.png)
Source: [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
### Encoder
The encoder is composed of a stack of identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, positionwise fully connected feed-forward network. Each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weights their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually. These output encodings are then passed to the next encoder as its input, as well as to the decoders.
### Decoder
The decoder is also composed of a stack of identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. This mechanism can also be called the encoder-decoder attention.
### Attention
#### Scaled Dot-Product Attention
The input consists of queries and keys of dimension $d_k$ , and values of dimension $d_v$. We compute the dot products of the query with all keys, divide each by $\sqrt {d_k}$ , and apply a softmax function to obtain the weights on the values.
$$Attention(Q, K, V) = softmax(\dfrac{QK^T}{\sqrt{d_k}}) \times V$$
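This formula can be written out directly in NumPy. The sketch below uses random matrices purely to make the shapes and the softmax step concrete:
```python
import numpy as np
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (num_queries, d_v)
Q = np.random.randn(4, 8)    # 4 query positions,  d_k = 8
K = np.random.randn(6, 8)    # 6 key positions,    d_k = 8
V = np.random.randn(6, 16)   # 6 value positions,  d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```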
#### Multi-Head Attention
Instead of performing a single attention function with $d_{model}$-dimensional keys, values and queries, it is beneficial to linearly project the queries, keys and values h times with different, learned linear projections to $d_k$ , $d_k$ and $d_v$ dimensions, respectively.
Multi-head attention allows the model to jointly attend to information from different representation
subspaces at different positions. With a single attention head, averaging inhibits this.
$$MultiHead(Q, K, V) = Concat(head_1, \dots, head_h) \times W^O$$
where,
$$head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$$
where the projections are parameter matrices.
#### Masked Attention
It may be necessary to cut out attention links between some word-pairs. For example, the decoder for token position
$t$ should not have access to token position $t+1$.
$$MaskedAttention(Q, K, V) = softmax(M + \dfrac{QK^T}{\sqrt{d_k}}) \times V$$
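In practice $M$ is a matrix containing zeros at allowed positions and $-\infty$ (or a very large negative number) at the disallowed ones, so the corresponding softmax weights become zero. A small sketch of a causal (look-ahead) mask:
```python
import numpy as np
seq_len = 5
# 0 where attention is allowed, -inf above the diagonal (future positions)
mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
# The mask is added to QK^T / sqrt(d_k) before the softmax, so each
# position can only attend to itself and to earlier positions.
print(mask)
```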
### Feed-Forward Network
Each of the layers in the encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This
consists of two linear transformations with a ReLU activation in between.
$$FFN(x) = max(0, xW_1 + b_1)W_2 + b_2$$
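Written out in NumPy, this is just two matrix multiplications with a ReLU in between, applied independently at every position (a minimal sketch with arbitrary dimensions):
```python
import numpy as np
d_model, d_ff = 8, 32
x = np.random.randn(10, d_model)                  # 10 positions
W1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
W2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)
ffn_out = np.maximum(0, x @ W1 + b1) @ W2 + b2    # ReLU between the two linear maps
print(ffn_out.shape)  # (10, 8)
```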
### Positional Encoding
A positional encoding is a fixed-size vector representation that encapsulates the relative positions of tokens within a target sequence: it provides the transformer model with information about where the words are in the input sequence.
The encodings use sine and cosine functions of different frequencies:
$$PE_{(pos,2i)} = \sin\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$
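A short NumPy sketch of these sinusoidal encodings (even dimensions use the sine, odd dimensions the cosine; the sizes are chosen only for illustration):
```python
import numpy as np
def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, np.newaxis]         # (max_len, 1)
    i = np.arange(d_model // 2)[np.newaxis, :]      # (1, d_model // 2)
    angles = pos / np.power(10000, (2 * i) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions
    return pe
print(positional_encoding(max_len=50, d_model=128).shape)  # (50, 128)
```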
## Implementation
### Theory
Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table.
At each layer, each token is then contextualized within the scope of the context window with other tokens via a parallel multi-head attention mechanism,
allowing the signal for key tokens to be amplified and less important tokens to be diminished.
The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce an output sentence. Some architectures use full encoders and decoders, autoregressive encoders and decoders, or a combination of both, depending on the usage and the context of the input.
### Tensorflow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. It was developed by the Google Brain team for Google's internal use in research and production.
TensorFlow provides transformer encoder and decoder blocks that can be configured to the user's specification. However, the transformer is not available as a standalone model that can simply be imported and executed; the user has to assemble the model first. TensorFlow also provides a tutorial on implementing the transformer from scratch for machine translation, which can be found [here](https://www.tensorflow.org/text/tutorials/transformer).
More information on the [encoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock) and [decoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerDecoderBlock) blocks used in the code below is available in the linked documentation.
Imports:
```python
import tensorflow as tf
import tensorflow_models as tfm
```
Adding word embeddings and positional encoding:
```python
class PositionalEmbedding(tf.keras.layers.Layer):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.d_model = d_model
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model, mask_zero=True)
        self.pos_encoding = tfm.nlp.layers.RelativePositionEmbedding(hidden_size=d_model)

    def compute_mask(self, *args, **kwargs):
        return self.embedding.compute_mask(*args, **kwargs)

    def call(self, x):
        x = self.embedding(x)
        # The position embedding layer returns a [length, d_model] tensor of
        # sinusoidal encodings; add a batch axis so it broadcasts over the batch.
        pos = self.pos_encoding(x)
        x = x + pos[tf.newaxis, :, :]
        return x
```
Creating the encoder for the transformer:
```python
class Encoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads,
                 dff, vocab_size, dropout_rate=0.1):
        super().__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        self.pos_embedding = PositionalEmbedding(
            vocab_size=vocab_size, d_model=d_model)
        self.enc_layers = [
            tfm.nlp.layers.TransformerEncoderBlock(output_last_dim=d_model,
                                                   num_attention_heads=num_heads,
                                                   inner_dim=dff,
                                                   inner_activation="relu",
                                                   inner_dropout=dropout_rate)
            for _ in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x):
        x = self.pos_embedding(x)
        x = self.dropout(x)
        for i in range(self.num_layers):
            x = self.enc_layers[i](x)
        return x
```
Creating the decoder for the transformer:
```python
class Decoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, vocab_size,
                 dropout_rate=0.1):
        super(Decoder, self).__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        self.pos_embedding = PositionalEmbedding(vocab_size=vocab_size,
                                                 d_model=d_model)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.dec_layers = [
            tfm.nlp.layers.TransformerDecoderBlock(num_attention_heads=num_heads,
                                                   intermediate_size=dff,
                                                   intermediate_activation="relu",
                                                   dropout_rate=dropout_rate)
            for _ in range(num_layers)]

    def call(self, x, context):
        x = self.pos_embedding(x)
        x = self.dropout(x)
        for i in range(self.num_layers):
            x = self.dec_layers[i](x, context)
        return x
```
Combining the encoder and decoder to create the transformer:
```python
class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff,
                 input_vocab_size, target_vocab_size, dropout_rate=0.1):
        super().__init__()
        self.encoder = Encoder(num_layers=num_layers, d_model=d_model,
                               num_heads=num_heads, dff=dff,
                               vocab_size=input_vocab_size,
                               dropout_rate=dropout_rate)
        self.decoder = Decoder(num_layers=num_layers, d_model=d_model,
                               num_heads=num_heads, dff=dff,
                               vocab_size=target_vocab_size,
                               dropout_rate=dropout_rate)
        self.final_layer = tf.keras.layers.Dense(target_vocab_size)

    def call(self, inputs):
        context, x = inputs
        context = self.encoder(context)
        x = self.decoder(x, context)
        logits = self.final_layer(x)
        return logits
```
Model initialization that can be used for training and inference (the hyperparameter values below are illustrative):
```python
# Illustrative hyperparameters
num_layers = 4
d_model = 128
num_heads = 8
dff = 512
dropout_rate = 0.1

transformer = Transformer(
    num_layers=num_layers,
    d_model=d_model,
    num_heads=num_heads,
    dff=dff,
    input_vocab_size=64,
    target_vocab_size=64,
    dropout_rate=dropout_rate
)
```
Sample:
```python
# Random token IDs in the vocabulary range (batch of 64; source length 40, target length 50)
src = tf.random.uniform((64, 40), minval=1, maxval=64, dtype=tf.int64)
tgt = tf.random.uniform((64, 50), minval=1, maxval=64, dtype=tf.int64)
output = transformer((src, tgt))
```
O/P:
```
<tf.Tensor: shape=(64, 50, 64), dtype=float32, numpy=
array([[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
...,
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]]], dtype=float32)>
```
```
>>> output.shape
TensorShape([64, 50, 64])
```
### PyTorch
PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella.
Unlike TensorFlow, PyTorch provides a full implementation of the transformer model that can be used out of the box. More information can be found [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/transformer.html#Transformer). A full implementation of the model can be found [here](https://github.com/pytorch/examples/tree/master/word_language_model).
Imports:
```python
import torch
import torch.nn as nn
```
Initializing the model:
```python
transformer = nn.Transformer(nhead=16, num_encoder_layers=8)
```
Sample:
```python
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))
output = transformer(src, tgt)
```
O/P:
```
tensor([[[ 0.2938, -0.4824, -0.7816, ..., 0.0742, 0.5162, 0.3632],
[-0.0786, -0.5241, 0.6384, ..., 0.3462, -0.0618, 0.9943],
[ 0.7827, 0.1067, -0.1637, ..., -1.7730, -0.3322, -0.0029],
...,
[-0.3202, 0.2341, -0.0896, ..., -0.9714, -0.1251, -0.0711],
[-0.1663, -0.5047, -0.0404, ..., -0.9339, 0.3963, 0.1018],
[ 1.2834, -0.4400, 0.0486, ..., -0.6876, -0.4752, 0.0180]],
[[ 0.9869, -0.7384, -1.0704, ..., -0.9417, 1.3279, -0.1665],
[ 0.3445, -0.2454, -0.3644, ..., -0.4856, -1.1004, -0.6819],
[ 0.7568, -0.3151, -0.5034, ..., -1.2081, -0.7119, 0.3775],
...,
[-0.0451, -0.7596, 0.0168, ..., -0.8267, -0.3272, 1.0457],
[ 0.3150, -0.6588, -0.1840, ..., 0.1822, -0.0653, 0.9053],
[ 0.8692, -0.3519, 0.3128, ..., -1.8446, -0.2325, -0.8662]],
[[ 0.9719, -0.3113, 0.4637, ..., -0.4422, 1.2348, 0.8274],
[ 0.3876, -0.9529, -0.7810, ..., -0.5843, -1.1439, -0.3366],
[-0.5774, 0.3789, -0.2819, ..., -1.4057, 0.4352, 0.1474],
...,
[ 0.6899, -0.1146, -0.3297, ..., -1.7059, -0.1750, 0.4203],
[ 0.3689, -0.5174, -0.1253, ..., 0.1417, 0.4159, 0.7560],
[ 0.5024, -0.7996, 0.1592, ..., -0.8344, -1.1125, 0.4736]],
...,
[[ 0.0704, -0.3971, -0.2768, ..., -1.9929, 0.8608, 1.2264],
[ 0.4013, -0.0962, -0.0965, ..., -0.4452, -0.8682, -0.4593],
[ 0.1656, 0.5224, -0.1723, ..., -1.5785, 0.3219, 1.1507],
...,
[-0.9443, 0.4653, 0.2936, ..., -0.9840, -0.0142, -0.1595],
[-0.6544, -0.3294, -0.0803, ..., 0.1623, -0.5061, 0.9824],
[-0.0978, -1.0023, -0.6915, ..., -0.2296, -0.0594, -0.4715]],
[[ 0.6531, -0.9285, -0.0331, ..., -1.1481, 0.7768, -0.7321],
[ 0.3325, -0.6683, -0.6083, ..., -0.4501, 0.2289, 0.3573],
[-0.6750, 0.4600, -0.8512, ..., -2.0097, -0.5159, 0.2773],
...,
[-1.4356, -1.0135, 0.0081, ..., -1.2985, -0.3715, -0.2678],
[ 0.0546, -0.2111, -0.0965, ..., -0.3822, -0.4612, 1.6217],
[ 0.7700, -0.5309, -0.1754, ..., -2.2807, -0.0320, -1.5551]],
[[ 0.2399, -0.9659, 0.1086, ..., -1.1756, 0.4063, 0.0615],
[-0.2202, -0.7972, -0.5024, ..., -0.9126, -1.5248, 0.2418],
[ 0.5215, 0.4540, 0.0036, ..., -0.2135, 0.2145, 0.6638],
...,
[-0.2190, -0.4967, 0.7149, ..., -0.3324, 0.3502, 1.0624],
[-0.0108, -0.9205, -0.1315, ..., -1.0153, 0.2989, 1.1415],
[ 1.1284, -0.6560, 0.6755, ..., -1.2157, 0.8580, -0.5022]]],
grad_fn=<NativeLayerNormBackward0>)
```
```
>>> output.shape
torch.Size([20, 32, 512])
```
### HuggingFace
Hugging Face, Inc. is a French-American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning.
It offers a wide range of models that can be used with TensorFlow, PyTorch, and other development backends as well. The models come already trained on a dataset and can be fine-tuned on a custom dataset for customized use. The information for training the model and loading the pretrained model can be found [here](https://huggingface.co/docs/transformers/en/training).
In HuggingFace, `pipeline` is used to run inference from a trained model available in the Hub. This is very beginner friendly. The model is downloaded to the local system when the script first runs, before the inference itself; make sure that the download does not exceed your available data plan.
Imports:
```python
from transformers import pipeline
```
Initialization:
The model used here is BART (large), fine-tuned on the MultiNLI dataset, which consists of sentence pairs annotated with textual entailment labels.
```python
classifier = pipeline(model="facebook/bart-large-mnli")
```
Sample:
The first argument is the sentence to be analyzed. The second argument, `candidate_labels`, is the list of labels the sentence most likely belongs to. The output dictionary contains a `scores` key; the label with the highest score is the one the model considers the best textual entailment for the sentence.
```python
output = classifier(
"I need to leave but later",
candidate_labels=["urgent", "not urgent", "sleep"],
)
```
O/P:
```
{'sequence': 'I need to leave but later',
'labels': ['not urgent', 'urgent', 'sleep'],
'scores': [0.8889380097389221, 0.10631518065929413, 0.00474683940410614]}
```
## Application
The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, GPT-4, Claude, BERT, XLNet, RoBERTa and ChatGPT demonstrate the ability of transformers to perform a wide variety of NLP-related tasks, and they have the potential to find real-world applications.
These may include:
- Machine translation
- Document summarization
- Text generation
- Biological sequence analysis
- Computer code generation
## Bibliography
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
- [Tensorflow Tutorial](https://www.tensorflow.org/text/tutorials/transformer)
- [Tensorflow Models Docs](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers)
- [Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
- [HuggingFace](https://huggingface.co/docs/transformers/en/index)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html)