GPT 10B+ params (8 GPUs)
This tutorial shows how to use Runs to train large models with PyTorch Lightning and Grid.

GPT

In this tutorial, we'll train minGPT by Andrej Karpathy across 8 GPUs. The model was adapted by Sean Naren to use PyTorch Lightning + DeepSpeed, which is what lets it scale to billions of parameters.
Time: 2 minutes
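Under the hood, the scaling comes from the Lightning Trainer configuration rather than the model code: DeepSpeed ZeRO Stage 3 shards the parameters, gradients, and optimizer state across all 8 GPUs. A minimal sketch of that wiring, assuming a PyTorch Lightning 1.x API (GPTStandIn is a hypothetical stand-in, not the module defined in train.py):

import torch
import pytorch_lightning as pl

class GPTStandIn(pl.LightningModule):
    # Hypothetical stand-in for the minGPT LightningModule in train.py.
    def __init__(self, n_embd=3072, n_head=16):
        super().__init__()
        self.layer = torch.nn.TransformerEncoderLayer(d_model=n_embd, nhead=n_head)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).pow(2).mean()  # dummy loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=3e-4)

trainer = pl.Trainer(
    gpus=8,                        # one DeepSpeed process per GPU
    strategy="deepspeed_stage_3",  # ZeRO Stage 3: shard params, grads, optimizer state
    precision=16,                  # mixed precision
)
# trainer.fit(GPTStandIn(), train_dataloader) then launches training.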

Run via the UI

The tutorial is extremely simple:
1. Find the path of the file we want to train (https://github.com/SeanNaren/minGPT/blob/stage3/train.py).
2. Paste it into the Run dialog on the UI.
3. Choose a machine with 8 GPUs (and make sure you are using all 8 GPUs per experiment).
4. Paste the script arguments below:
--n_layer 15 \
--n_head 16 \
--n_embd 3072 \
--gpus 8 \
--precision 16 \
--batch_size 1
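Those flags correspond to the ~1.7B-parameter configuration. As a sanity check, the standard rough estimate for a GPT-style transformer (ignoring embeddings) is 12 * n_layer * n_embd**2 parameters:

n_layer, n_embd = 15, 3072
print(f"{12 * n_layer * n_embd ** 2:.2e}")  # 1.70e+09, i.e. ~1.7B parameters

Raising --n_layer or --n_embd grows the model accordingly.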

Run via the CLI

Make sure you have the grid CLI installed and you've logged in.
pip install lightning-grid
grid login
First, clone the repo:

git clone https://github.com/SeanNaren/minGPT.git
cd minGPT
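Note that everything after train.py in the commands below is parsed by the script itself, not by Grid. A hedged sketch of the kind of argparse surface this implies; the flag names match the commands, but the types and defaults are assumptions rather than the repo's actual code:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--n_layer", type=int, default=15)    # transformer depth
parser.add_argument("--n_head", type=int, default=16)     # attention heads per layer
parser.add_argument("--n_embd", type=int, default=3072)   # embedding / hidden size
parser.add_argument("--gpus", type=int, default=8)        # GPUs used by the Trainer
parser.add_argument("--precision", type=int, default=16)  # 16 = mixed precision
parser.add_argument("--batch_size", type=int, default=1)  # per-GPU batch size
args = parser.parse_args()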
Then pick a model size and run the matching command.

1.7B params

grid run \
--instance_type 8_K80_12gb \
--gpus 8 \
train.py \
--n_layer 15 \
--n_head 16 \
--n_embd 3072 \
--gpus 8 \
--precision 16 \
--batch_size 1

10B params

grid run \
--instance_type 8_v100_32gb \
--gpus 8 \
train.py \
--n_layer 15 \
--n_head 16 \
--n_embd 3072 \
--gpus 8 \
--precision 16 \
--batch_size 1
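Why this size calls for 32 GB V100s: with ZeRO Stage 3 each GPU holds only a 1/8 shard of the weights, gradients, and optimizer state. A back-of-the-envelope estimate, assuming mixed-precision Adam at roughly 16 bytes per parameter and ignoring activations:

params, gpus, bytes_per_param = 10e9, 8, 16
print(params * bytes_per_param / gpus / 2**30)  # ~18.6 GiB per GPU, within a 32 GB V100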

20B params

grid run \
--instance_type 8_v100_32gb \
--gpus 8 \
train.py \
--n_layer 25 \
--n_head 16 \
--n_embd 3072 \
--gpus 8 \
--precision 16 \
--batch_size 1
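By the same estimate, 20B parameters works out to roughly 37 GiB per GPU, more than a 32 GB V100 holds, so runs at this scale typically also offload optimizer state to CPU memory. A hedged sketch using Lightning's registered DeepSpeed offload strategy (an assumption, not taken from the repo):

import pytorch_lightning as pl

# ZeRO Stage 3 sharding plus CPU offload of optimizer state,
# trading GPU memory for host RAM and PCIe traffic.
trainer = pl.Trainer(
    gpus=8,
    strategy="deepspeed_stage_3_offload",
    precision=16,
)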