diff --git a/contrib/machine-learning/Transformers.md b/contrib/machine-learning/Transformers.md
index e69de29..7bcc102 100644
--- a/contrib/machine-learning/Transformers.md
+++ b/contrib/machine-learning/Transformers.md
@@ -0,0 +1,21 @@
+# Transformers
+## Introduction
+A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism. It is based on the softmax-based attention 
+mechanism. Before transformers, predecessors of attention mechanism were added to gated recurrent neural networks, such as LSTMs and gated recurrent units (GRUs), which 
+processed datasets sequentially. Dependency on previous token computations prevented them from being able to parallelize the attention mechanism.
+
+## Key Concepts
+
+## Architecture
+
+## Implementation
+### Theory
+Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from a word embedding table. 
+At each layer, each token is then contextualized within the scope of the context window with other tokens via a parallel multi-head attention mechanism 
+allowing the signal for key tokens to be amplified and less important tokens to be diminished.
+
+### HuggingFace
+
+### Tensorflow and Keras
+
+### PyTorch