Merge branch 'main' into main

pull/891/head
Ashita Prasad 2024-06-08 11:04:22 +05:30 verified by GitHub
commit 2477c6ea79
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 784 additions and 0 deletions


@@ -0,0 +1,75 @@
# Understanding the `eval` Function in Python
## Introduction
The `eval` function in Python allows you to execute a string-based Python expression dynamically. This can be useful in various scenarios where you need to evaluate expressions that are not known until runtime.
## Syntax
```python
eval(expression, globals=None, locals=None)
```
### Parameters:
* `expression`: the string that is parsed and evaluated as a Python expression.
* `globals` (optional): a dictionary specifying the global names available during evaluation.
* `locals` (optional): another dictionary specifying the local names available during evaluation (a short example combining the two follows).
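As a quick illustration of the optional mappings, `globals` and `locals` can be supplied together (the variable names below are arbitrary):
```python
x = 5
# 'x' is resolved from the globals mapping, 'y' from the locals mapping
result = eval('x + y', {'x': x}, {'y': 7})
print(result)  # Output: 12
```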
## Examples
Example 1:
```python
result = eval('2 + 3 * 4')
print(result) # Output: 14
```
Example 2:
```python
x = 10
expression = 'x * 2'
result = eval(expression, {'x': x})
print(result) # Output: 20
```
Example 3:
```python
x = 10
def multiply(a, b):
    return a * b
expression = 'multiply(x, 5) + 2'
result = eval(expression)
print("Result:", result)  # Output: Result: 52
```
Example 4:
```python
expression = input("Enter a Python expression: ")
result = eval(expression)
print("Result:", result)
# Input: 3+2
# Output: Result: 5
```
Example 5:
```python
import numpy as np
a = np.random.randint(1, 9)
b = np.random.randint(1, 9)
operations = ["*", "-", "+"]
op = np.random.choice(operations)
expression = str(a) + op + str(b)
correct_answer = eval(expression)
given_answer = int(input(str(a) + " " + op + " " + str(b) + " = "))
if given_answer == correct_answer:
    print("Correct")
else:
    print("Incorrect")
    print("correct answer is:", correct_answer)
# 2 * 1 = 8
# Incorrect
# correct answer is: 2
# or
# 3 * 2 = 6
# Correct
```
## Conclusion
The `eval` function is a powerful tool in Python that allows for dynamic evaluation of expressions. Because the evaluated string can execute arbitrary code, it should never be called on untrusted input.


@@ -11,3 +11,4 @@
- [Exception Handling in Python](exception-handling.md)
- [Generators](generators.md)
- [List Comprehension](list-comprehension.md)
- [Eval Function](eval_function.md)

Binary file not shown.

Size: 56 KiB


@@ -0,0 +1,140 @@
# Ensemble Learning
Ensemble Learning is a powerful machine learning paradigm that combines multiple models to achieve better performance than any individual model. The idea is to leverage the strengths of different models to improve overall accuracy, robustness, and generalization.
## Introduction
Ensemble Learning is a technique that combines the predictions from multiple machine learning models to make more accurate and robust predictions than a single model. It leverages the diversity of different models to reduce errors and improve performance.
## Types of Ensemble Learning
### Bagging
Bagging, or Bootstrap Aggregating, involves training multiple versions of the same model on different subsets of the training data and averaging their predictions. The most common example of bagging is the `RandomForest` algorithm.
### Boosting
Boosting focuses on training models sequentially, where each new model corrects the errors made by the previous ones. This way, the ensemble learns from its mistakes, leading to improved performance. `AdaBoost` and `Gradient Boosting` are popular examples of boosting algorithms.
### Stacking
Stacking involves training multiple models (the base learners) and a meta-model that combines their predictions. The base learners are trained on the original dataset, while the meta-model is trained on the outputs of the base learners. This approach allows leveraging the strengths of different models.
## Advantages and Disadvantages
### Advantages
- **Improved Accuracy**: Combines the strengths of multiple models.
- **Robustness**: Reduces the risk of overfitting and model bias.
- **Versatility**: Can be applied to various machine learning tasks, including classification and regression.
### Disadvantages
- **Complexity**: More complex than individual models, making interpretation harder.
- **Computational Cost**: Requires more computational resources and training time.
- **Implementation**: Can be challenging to implement and tune effectively.
## Key Concepts
- **Diversity**: The models in the ensemble should be diverse to benefit from their different strengths.
- **Voting/Averaging**: For classification, majority voting is used to combine predictions. For regression, averaging is used.
- **Weighting**: In some ensembles, models are weighted based on their accuracy or other metrics (a short voting-and-weighting sketch is included with the code examples below).
## Code Examples
### Bagging with Random Forest
Below is an example of using Random Forest for classification on the Iris dataset.
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize Random Forest model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
### Boosting with AdaBoost
Below is an example of using AdaBoost for classification on the Iris dataset.
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
# Initialize base model
base_model = DecisionTreeClassifier(max_depth=1)
# Initialize AdaBoost model
ada_clf = AdaBoostClassifier(estimator=base_model, n_estimators=50, random_state=42)  # on scikit-learn < 1.2, use base_estimator=base_model
# Train the model
ada_clf.fit(X_train, y_train)
# Make predictions
y_pred = ada_clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
### Stacking with Multiple Models
Below is an example of using stacking with multiple models for classification on the Iris dataset.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import StackingClassifier
# Define base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svc', SVC(kernel='linear', probability=True))
]
# Define meta-model
meta_model = LogisticRegression()
# Initialize Stacking model
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)
# Train the model
stacking_clf.fit(X_train, y_train)
# Make predictions
y_pred = stacking_clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", classification_report(y_test, y_pred))
```
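### Voting and Weighting
Below is a short sketch of soft voting with per-model weights, illustrating the voting and weighting concepts described earlier. It reuses the Iris split and the metric imports from the bagging example above; the weights chosen here are purely illustrative.
```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
# Define the individual models to be combined
voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('dt', DecisionTreeClassifier(random_state=42)),
    ],
    voting='soft',   # average the predicted class probabilities
    weights=[2, 1]   # illustrative weights; tune them for your data
)
# Train the ensemble
voting_clf.fit(X_train, y_train)
# Make predictions and evaluate
y_pred = voting_clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
```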
## Conclusion
Ensemble Learning is a powerful technique that combines multiple models to improve overall performance. By leveraging the strengths of different models, it provides better accuracy, robustness, and generalization. However, it comes with increased complexity and computational cost. Understanding and implementing ensemble methods can significantly enhance machine learning solutions.


@@ -11,9 +11,12 @@
- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
- [TensorFlow.md](tensorflow.md)
- [PyTorch.md](pytorch.md)
- [Ensemble Learning](ensemble-learning.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)
- [Types_of_Cost_Functions](cost-functions.md)
- [Clustering](clustering.md)
- [Hierarchical Clustering](hierarchical-clustering.md)
- [Grid Search](grid-search.md)
- [Transformers](transformers.md)
- [K-nearest neighbor (KNN)](knn.md)


@@ -0,0 +1,122 @@
# K-Nearest Neighbors (KNN) Machine Learning Algorithm in Python
## Introduction
K-Nearest Neighbors (KNN) is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It assumes that similar things exist in close proximity. In other words, similar data points are near to each other.
## How KNN Works
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in classification) or averaging the labels (in regression).
### Steps:
1. **Choose the number K of neighbors**
2. **Calculate the distance** between the query-instance and all the training samples
3. **Sort the distances** and determine the nearest neighbors based on the K-th minimum distance
4. **Gather the labels** of the nearest neighbors
5. **Vote for the most frequent label** (in case of classification) or **average the labels** (in case of regression); a minimal NumPy sketch of these steps follows below.
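These steps can be written out directly in a few lines of NumPy. The sketch below (the helper name `knn_predict` and the toy data are made up for illustration) is not an optimized implementation, but it follows the procedure above exactly.
```python
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, query, k=3):
    # Step 2: Euclidean distance from the query to every training sample
    distances = np.linalg.norm(X_train - query, axis=1)
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: gather their labels and vote for the most frequent one
    labels = y_train[nearest]
    return Counter(labels).most_common(1)[0][0]
# Toy data: two well-separated classes in 2D
X_toy = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_toy, y_toy, np.array([2, 2])))  # 0
print(knn_predict(X_toy, y_toy, np.array([8, 7])))  # 1
```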
## When to Use KNN
### Advantages:
- **Simple and easy to understand:** KNN is intuitive and easy to implement.
- **No training phase:** KNN is a lazy learner, meaning there is no explicit training phase.
- **Effective with a small dataset:** KNN performs well with a small number of input variables.
### Disadvantages:
- **Computationally expensive:** The algorithm becomes significantly slower as the number of examples and/or predictors/independent variables increase.
- **Sensitive to irrelevant features:** All features contribute to the distance equally.
- **Memory-intensive:** Storing all the training data can be costly.
### Use Cases:
- **Recommender Systems:** Suggest items based on similarity to user preferences.
- **Image Recognition:** Classify images by comparing new images to the training set.
- **Finance:** Predict credit risk or fraud detection based on historical data.
## KNN in Python
### Required Libraries
To implement KNN, we need the following Python libraries:
- `numpy`
- `pandas`
- `scikit-learn`
- `matplotlib` (for visualization)
### Installation
```bash
pip install numpy pandas scikit-learn matplotlib
```
### Example Code
Let's implement a simple KNN classifier using the Iris dataset.
#### Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
```
#### Step 2: Load Dataset
```python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
```
#### Step 3: Split Dataset
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
#### Step 4: Train KNN Model
```python
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
```
#### Step 5: Make Predictions
```python
y_pred = knn.predict(X_test)
```
#### Step 6: Evaluate the Model
```python
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
### Visualization (Optional)
```python
# Plotting the decision boundary for visualization (using only the first two features)
h = .02  # step size in the mesh
# Create color maps
cmap_light = plt.cm.RdYlBu
cmap_bold = plt.cm.RdYlBu
# For simplicity, train a separate model on only the first two features of the dataset
X_plot = X[:, :2]
knn_2d = KNeighborsClassifier(n_neighbors=3)
knn_2d.fit(X_plot, y)
x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, edgecolor='k', cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = 3)")
plt.show()
```
## Generalization and Considerations
- **Choosing K:** The choice of K is critical. Smaller values of K can lead to noisy models, while larger values make the algorithm computationally expensive and might oversimplify the model.
- **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) so that all features contribute equally to the distance computation; see the sketch after this list.
- **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
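As a brief sketch of how feature scaling and the choice of K can be handled together in scikit-learn, the snippet below chains a `StandardScaler` and a `KNeighborsClassifier` in a `Pipeline` and searches over a few values of K with `GridSearchCV`. It reuses the Iris split from the example above; the parameter grid is illustrative.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
# Scale the features, then fit KNN; search over a few values of K
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])
param_grid = {'knn__n_neighbors': [1, 3, 5, 7, 9, 11]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best K:", grid.best_params_['knn__n_neighbors'])
print("Test accuracy:", grid.score(X_test, y_test))
```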
In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.


@@ -0,0 +1,443 @@
# Transformers
## Introduction
A transformer is a deep learning architecture developed by Google, built around the multi-head, softmax-based attention mechanism. Before transformers, attention mechanisms were added to gated recurrent neural networks such as LSTMs and gated recurrent units (GRUs), which processed datasets sequentially. Their dependency on previous token computations prevented the attention mechanism from being parallelized.
Transformers are a revolutionary approach to natural language processing (NLP). Unlike older models, they excel at understanding long-range connections between words. This "attention" mechanism lets them grasp the context of a sentence, making them powerful for tasks like machine translation, text summarization, and question answering. Introduced in 2017, transformers are now the backbone of many large language models, including tools you might use every day. Their ability to handle complex relationships in language is fueling advancements in AI across various fields.
## Model Architecture
![Model Architecture](assets/transformer-architecture.png)
Source: [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
### Encoder
The encoder is composed of a stack of identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, positionwise fully connected feed-forward network. Each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weights their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually. These output encodings are then passed to the next encoder as its input, as well as to the decoders.
### Decoder
The decoder is also composed of a stack of identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. This mechanism can also be called the encoder-decoder attention.
### Attention
#### Scaled Dot-Product Attention
The input consists of queries and keys of dimension $d_k$ , and values of dimension $d_v$. We compute the dot products of the query with all keys, divide each by $\sqrt {d_k}$ , and apply a softmax function to obtain the weights on the values.
$$Attention(Q, K, V) = softmax(\dfrac{QK^T}{\sqrt{d_k}}) \times V$$
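This formula can be written out directly in NumPy. The sketch below uses random matrices purely to make the shapes and the softmax step concrete:
```python
import numpy as np
def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (num_queries, d_v)
Q = np.random.randn(4, 8)    # 4 query positions,  d_k = 8
K = np.random.randn(6, 8)    # 6 key positions,    d_k = 8
V = np.random.randn(6, 16)   # 6 value positions,  d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```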
#### Multi-Head Attention
Instead of performing a single attention function with $d_{model}$-dimensional keys, values and queries, it is beneficial to linearly project the queries, keys and values h times with different, learned linear projections to $d_k$ , $d_k$ and $d_v$ dimensions, respectively.
Multi-head attention allows the model to jointly attend to information from different representation
subspaces at different positions. With a single attention head, averaging inhibits this.
$$MultiHead(Q, K, V) = Concat(head_1, \dots, head_h) \times W^O$$
where,
$$head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$$
where the projections are parameter matrices.
#### Masked Attention
It may be necessary to cut out attention links between some word-pairs. For example, the decoder for token position
$t$ should not have access to token position $t+1$.
$$MaskedAttention(Q, K, V) = softmax(M + \dfrac{QK^T}{\sqrt{d_k}}) \times V$$
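In practice $M$ is a matrix containing zeros at allowed positions and $-\infty$ (or a very large negative number) at the disallowed ones, so the corresponding softmax weights become zero. A small sketch of a causal (look-ahead) mask:
```python
import numpy as np
seq_len = 5
# 0 where attention is allowed, -inf above the diagonal (future positions)
mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
# The mask is added to QK^T / sqrt(d_k) before the softmax, so each
# position can only attend to itself and to earlier positions.
print(mask)
```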
### Feed-Forward Network
Each of the layers in the encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This
consists of two linear transformations with a ReLU activation in between.
$$FFN(x) = max(0, xW_1 + b_1)W_2 + b_2$$
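Written out in NumPy, this is just two matrix multiplications with a ReLU in between, applied independently at every position (a minimal sketch with arbitrary dimensions):
```python
import numpy as np
d_model, d_ff = 8, 32
x = np.random.randn(10, d_model)                  # 10 positions
W1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
W2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)
ffn_out = np.maximum(0, x @ W1 + b1) @ W2 + b2    # ReLU between the two linear maps
print(ffn_out.shape)  # (10, 8)
```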
### Positional Encoding
A positional encoding is a fixed-size vector representation that encapsulates the relative positions of tokens within a target sequence: it provides the transformer model with information about where the words are in the input sequence.
The encodings use sine and cosine functions of different frequencies:
$$PE_{(pos,2i)} = \sin\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(\dfrac{pos}{10000^{2i/d_{model}}}\right)$$
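A short NumPy sketch of these sinusoidal encodings (even dimensions use the sine, odd dimensions the cosine; the sizes are chosen only for illustration):
```python
import numpy as np
def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, np.newaxis]         # (max_len, 1)
    i = np.arange(d_model // 2)[np.newaxis, :]      # (1, d_model // 2)
    angles = pos / np.power(10000, (2 * i) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dimensions
    pe[:, 1::2] = np.cos(angles)                    # odd dimensions
    return pe
print(positional_encoding(max_len=50, d_model=128).shape)  # (50, 128)
```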
## Implementation
### Theory
Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table.
At each layer, each token is then contextualized within the scope of the context window with other tokens via a parallel multi-head attention mechanism,
allowing the signal for key tokens to be amplified and less important tokens to be diminished.
The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce an output sentence. Some architectures use full encoders and decoders, autoregressive encoders and decoders, or a combination of both, depending on the usage and the context of the input.
### Tensorflow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. It was developed by the Google Brain team for Google's internal use in research and production.
TensorFlow provides transformer encoder and decoder blocks that can be configured to the user's specification. However, the transformer is not available as a standalone model that can simply be imported and executed; the user has to assemble the model first. TensorFlow also provides a tutorial on implementing the transformer from scratch for machine translation, which can be found [here](https://www.tensorflow.org/text/tutorials/transformer).
More information on the [encoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock) and [decoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerDecoderBlock) blocks used in the code below is available in the linked documentation.
Imports:
```python
import tensorflow as tf
import tensorflow_models as tfm
```
Adding word embeddings and positional encoding:
```python
class PositionalEmbedding(tf.keras.layers.Layer):
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.d_model = d_model
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model, mask_zero=True)
        self.pos_encoding = tfm.nlp.layers.RelativePositionEmbedding(hidden_size=d_model)

    def compute_mask(self, *args, **kwargs):
        return self.embedding.compute_mask(*args, **kwargs)

    def call(self, x):
        x = self.embedding(x)
        # The position embedding layer returns a [length, d_model] tensor of
        # sinusoidal encodings; add a batch axis so it broadcasts over the batch.
        pos = self.pos_encoding(x)
        x = x + pos[tf.newaxis, :, :]
        return x
```
Creating the encoder for the transformer:
```python
class Encoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads,
                 dff, vocab_size, dropout_rate=0.1):
        super().__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        self.pos_embedding = PositionalEmbedding(
            vocab_size=vocab_size, d_model=d_model)
        self.enc_layers = [
            tfm.nlp.layers.TransformerEncoderBlock(output_last_dim=d_model,
                                                   num_attention_heads=num_heads,
                                                   inner_dim=dff,
                                                   inner_activation="relu",
                                                   inner_dropout=dropout_rate)
            for _ in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x):
        x = self.pos_embedding(x)
        x = self.dropout(x)
        for i in range(self.num_layers):
            x = self.enc_layers[i](x)
        return x
```
Creating the decoder for the transformer:
```python
class Decoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, vocab_size,
                 dropout_rate=0.1):
        super(Decoder, self).__init__()
        self.d_model = d_model
        self.num_layers = num_layers
        self.pos_embedding = PositionalEmbedding(vocab_size=vocab_size,
                                                 d_model=d_model)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.dec_layers = [
            tfm.nlp.layers.TransformerDecoderBlock(num_attention_heads=num_heads,
                                                   intermediate_size=dff,
                                                   intermediate_activation="relu",
                                                   dropout_rate=dropout_rate)
            for _ in range(num_layers)]

    def call(self, x, context):
        x = self.pos_embedding(x)
        x = self.dropout(x)
        for i in range(self.num_layers):
            x = self.dec_layers[i](x, context)
        return x
```
Combining the encoder and decoder to create the transformer:
```python
class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff,
                 input_vocab_size, target_vocab_size, dropout_rate=0.1):
        super().__init__()
        self.encoder = Encoder(num_layers=num_layers, d_model=d_model,
                               num_heads=num_heads, dff=dff,
                               vocab_size=input_vocab_size,
                               dropout_rate=dropout_rate)
        self.decoder = Decoder(num_layers=num_layers, d_model=d_model,
                               num_heads=num_heads, dff=dff,
                               vocab_size=target_vocab_size,
                               dropout_rate=dropout_rate)
        self.final_layer = tf.keras.layers.Dense(target_vocab_size)

    def call(self, inputs):
        context, x = inputs
        context = self.encoder(context)
        x = self.decoder(x, context)
        logits = self.final_layer(x)
        return logits
```
Model initialization that can be used for training and inference (the hyperparameter values below are illustrative):
```python
# Illustrative hyperparameters
num_layers = 4
d_model = 128
num_heads = 8
dff = 512
dropout_rate = 0.1

transformer = Transformer(
    num_layers=num_layers,
    d_model=d_model,
    num_heads=num_heads,
    dff=dff,
    input_vocab_size=64,
    target_vocab_size=64,
    dropout_rate=dropout_rate
)
```
Sample:
```python
# Random token IDs in the vocabulary range (batch of 64; source length 40, target length 50)
src = tf.random.uniform((64, 40), minval=1, maxval=64, dtype=tf.int64)
tgt = tf.random.uniform((64, 50), minval=1, maxval=64, dtype=tf.int64)
output = transformer((src, tgt))
```
O/P:
```
<tf.Tensor: shape=(64, 50, 64), dtype=float32, numpy=
array([[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
...,
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]],
[[ 0.78274703, -1.2312567 , 0.7272992 , ..., 2.1805947 ,
1.3511044 , -1.275499 ],
[ 0.82658154, -1.2863302 , 0.76494133, ..., 2.39311 ,
1.0973787 , -1.3414565 ],
[ 0.57013685, -1.3958443 , 1.0213287 , ..., 2.3791933 ,
0.58439416, -0.93464035],
...,
[ 0.82214123, -0.51090807, 0.25897795, ..., 2.1979148 ,
1.4126635 , -0.5771998 ],
[ 0.6371507 , -0.36584622, 0.40954843, ..., 2.0241373 ,
1.6503414 , -0.74359566],
[ 0.6739802 , -0.39973688, 0.3338765 , ..., 1.6819229 ,
1.7505672 , -1.0763712 ]]], dtype=float32)>
```
```
>>> output.shape
TensorShape([64, 50, 64])
```
### PyTorch
PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella.
Unlike TensorFlow, PyTorch provides a full implementation of the transformer model that can be used out of the box. More information can be found [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/transformer.html#Transformer). A full implementation of the model can be found [here](https://github.com/pytorch/examples/tree/master/word_language_model).
Imports:
```python
import torch
import torch.nn as nn
```
Initializing the model:
```python
transformer = nn.Transformer(nhead=16, num_encoder_layers=8)
```
Sample:
```python
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))
output = transformer(src, tgt)
```
O/P:
```
tensor([[[ 0.2938, -0.4824, -0.7816, ..., 0.0742, 0.5162, 0.3632],
[-0.0786, -0.5241, 0.6384, ..., 0.3462, -0.0618, 0.9943],
[ 0.7827, 0.1067, -0.1637, ..., -1.7730, -0.3322, -0.0029],
...,
[-0.3202, 0.2341, -0.0896, ..., -0.9714, -0.1251, -0.0711],
[-0.1663, -0.5047, -0.0404, ..., -0.9339, 0.3963, 0.1018],
[ 1.2834, -0.4400, 0.0486, ..., -0.6876, -0.4752, 0.0180]],
[[ 0.9869, -0.7384, -1.0704, ..., -0.9417, 1.3279, -0.1665],
[ 0.3445, -0.2454, -0.3644, ..., -0.4856, -1.1004, -0.6819],
[ 0.7568, -0.3151, -0.5034, ..., -1.2081, -0.7119, 0.3775],
...,
[-0.0451, -0.7596, 0.0168, ..., -0.8267, -0.3272, 1.0457],
[ 0.3150, -0.6588, -0.1840, ..., 0.1822, -0.0653, 0.9053],
[ 0.8692, -0.3519, 0.3128, ..., -1.8446, -0.2325, -0.8662]],
[[ 0.9719, -0.3113, 0.4637, ..., -0.4422, 1.2348, 0.8274],
[ 0.3876, -0.9529, -0.7810, ..., -0.5843, -1.1439, -0.3366],
[-0.5774, 0.3789, -0.2819, ..., -1.4057, 0.4352, 0.1474],
...,
[ 0.6899, -0.1146, -0.3297, ..., -1.7059, -0.1750, 0.4203],
[ 0.3689, -0.5174, -0.1253, ..., 0.1417, 0.4159, 0.7560],
[ 0.5024, -0.7996, 0.1592, ..., -0.8344, -1.1125, 0.4736]],
...,
[[ 0.0704, -0.3971, -0.2768, ..., -1.9929, 0.8608, 1.2264],
[ 0.4013, -0.0962, -0.0965, ..., -0.4452, -0.8682, -0.4593],
[ 0.1656, 0.5224, -0.1723, ..., -1.5785, 0.3219, 1.1507],
...,
[-0.9443, 0.4653, 0.2936, ..., -0.9840, -0.0142, -0.1595],
[-0.6544, -0.3294, -0.0803, ..., 0.1623, -0.5061, 0.9824],
[-0.0978, -1.0023, -0.6915, ..., -0.2296, -0.0594, -0.4715]],
[[ 0.6531, -0.9285, -0.0331, ..., -1.1481, 0.7768, -0.7321],
[ 0.3325, -0.6683, -0.6083, ..., -0.4501, 0.2289, 0.3573],
[-0.6750, 0.4600, -0.8512, ..., -2.0097, -0.5159, 0.2773],
...,
[-1.4356, -1.0135, 0.0081, ..., -1.2985, -0.3715, -0.2678],
[ 0.0546, -0.2111, -0.0965, ..., -0.3822, -0.4612, 1.6217],
[ 0.7700, -0.5309, -0.1754, ..., -2.2807, -0.0320, -1.5551]],
[[ 0.2399, -0.9659, 0.1086, ..., -1.1756, 0.4063, 0.0615],
[-0.2202, -0.7972, -0.5024, ..., -0.9126, -1.5248, 0.2418],
[ 0.5215, 0.4540, 0.0036, ..., -0.2135, 0.2145, 0.6638],
...,
[-0.2190, -0.4967, 0.7149, ..., -0.3324, 0.3502, 1.0624],
[-0.0108, -0.9205, -0.1315, ..., -1.0153, 0.2989, 1.1415],
[ 1.1284, -0.6560, 0.6755, ..., -1.2157, 0.8580, -0.5022]]],
grad_fn=<NativeLayerNormBackward0>)
```
```
>>> output.shape
torch.Size([20, 32, 512])
```
### HuggingFace
Hugging Face, Inc. is a French-American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning.
It offers a wide range of models that can be used with TensorFlow, PyTorch, and other development backends as well. The models come already trained on a dataset and can be fine-tuned on a custom dataset for customized use. The information for training the model and loading the pretrained model can be found [here](https://huggingface.co/docs/transformers/en/training).
In HuggingFace, `pipeline` is used to run inference from a trained model available in the Hub. This is very beginner friendly. The model is downloaded to the local system when the script first runs, before the inference itself; make sure that the download does not exceed your available data plan.
Imports:
```python
from transformers import pipeline
```
Initialization:
The model used here is BART (large), fine-tuned on the MultiNLI dataset, which consists of sentence pairs annotated with textual entailment labels.
```python
classifier = pipeline(model="facebook/bart-large-mnli")
```
Sample:
The first argument is the sentence to be analyzed. The second argument, `candidate_labels`, is the list of labels the sentence most likely belongs to. The output dictionary contains a `scores` key; the label with the highest score is the one the model considers the best textual entailment for the sentence.
```python
output = classifier(
"I need to leave but later",
candidate_labels=["urgent", "not urgent", "sleep"],
)
```
O/P:
```
{'sequence': 'I need to leave but later',
'labels': ['not urgent', 'urgent', 'sleep'],
'scores': [0.8889380097389221, 0.10631518065929413, 0.00474683940410614]}
```
## Application
The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, GPT-4, Claude, BERT, XLNet, RoBERTa and ChatGPT demonstrate the ability of transformers to perform a wide variety of NLP-related tasks, and they have the potential to find real-world applications.
These may include:
- Machine translation
- Document summarization
- Text generation
- Biological sequence analysis
- Computer code generation
## Bibliography
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
- [Tensorflow Tutorial](https://www.tensorflow.org/text/tutorials/transformer)
- [Tensorflow Models Docs](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers)
- [Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
- [HuggingFace](https://huggingface.co/docs/transformers/en/index)
- [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html)