PicoGPT

You've seen openai/gpt-2.

You've seen karpathy/minGPT.

You've even seen karpathy/nanoGPT!

But have you seen picoGPT??!?

picoGPT is an unnecessarily tiny and minimal implementation of GPT-2 in plain NumPy. The entire forward pass is 40 lines of code. I also wrote an accompanying blog post about picoGPT.
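
For a sense of what those 40 lines are doing, here is a minimal sketch of the core operation, causal self-attention, in plain NumPy. This is illustrative only, not the exact code from gpt2.py; the names and toy sizes are assumptions:

import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(q, k, v):
    # q, k, v: [n_seq, head_dim]; each position may only attend to itself and earlier positions
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones(scores.shape), k=1) * -1e10  # large negative values block future tokens
    return softmax(scores + mask) @ v

x = np.random.randn(4, 8)                    # toy input: 4 positions, 8-dim head
print(causal_self_attention(x, x, x).shape)  # (4, 8)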

picoGPT features:

  • Fast? Nah, picoGPT is megaSLOW 🐌
  • Training code? Error, 404 not found
  • Batch inference? picoGPT is civilized, single file line, one at a time only
  • top-p sampling? top-k? temperature? categorical sampling?! Nope. Greedy decoding only (see the sketch after this list)
  • Readable? gpt2.py yes, gpt2_pico.py not so much
  • Smol??? YESS!!! TEENIE TINY in fact 🤏
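
Greedy decoding just takes the highest-scoring token at the last position. A minimal sketch, assuming a logits array of shape [n_seq, n_vocab] coming out of the forward pass (the names here are illustrative):

import numpy as np

def greedy_next_token(logits):
    # pick the most likely next token from the final position's logits
    return int(np.argmax(logits[-1]))

logits = np.random.randn(5, 50257)  # toy logits: 5 positions over GPT-2's 50257-token vocabulary
print(greedy_next_token(logits))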

A quick breakdown of each of the files:

  • encoder.py contains the code for OpenAI's BPE Tokenizer, taken straight from their gpt-2 repo.
  • utils.py contains the code to download and load the GPT-2 model weights, tokenizer, and hyper-parameters.
  • gpt2.py contains the actual GPT model and generation code, which we can run as a Python script (a short usage sketch follows this list).
  • gpt2_pico.py is the same as gpt2.py, but in even fewer lines of code. Why? Because why not 😎👍.
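
Putting the pieces together looks roughly like this. This is a sketch that assumes the helper names load_encoder_hparams_and_params and generate as they appear in the repo; check utils.py and gpt2.py for the exact signatures:

from utils import load_encoder_hparams_and_params  # downloads the weights on first run
from gpt2 import generate

# load the BPE tokenizer, hyper-parameters, and model weights for the 124M model
encoder, hparams, params = load_encoder_hparams_and_params("124M", "models")

prompt = "Alan Turing theorized that computers would one day become"
input_ids = encoder.encode(prompt)  # text -> BPE token ids
output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate=8)
print(encoder.decode(output_ids))   # token ids -> text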

Dependencies

pip install -r requirements.txt

If you're using an M1 MacBook, you'll need to replace tensorflow with tensorflow-macos.

Tested on Python 3.9.10.

Usage

python gpt2.py "Alan Turing theorized that computers would one day become"

Which generates

 the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

You can also control the number of tokens to generate, the model size (one of ["124M", "355M", "774M", "1558M"]), and the directory to save the models:

python gpt2.py \
    "Alan Turing theorized that computers would one day become" \
    --n_tokens_to_generate 40 \
    --model_size "124M" \
    --models_dir "models"
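
You can also call the same entry point from Python instead of the command line. A sketch, assuming the main function in gpt2.py keeps the signature used by the CLI above:

from gpt2 import main

# equivalent to the CLI invocation above
text = main(
    "Alan Turing theorized that computers would one day become",
    n_tokens_to_generate=40,
    model_size="124M",
    models_dir="models",
)
print(text)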