Running Experiments With a Dockerfile
Use your own Dockerfile when creating experiments in Grid.
Grid supports creating Runs from a Dockerfile. A Dockerfile is a container specification that determines how an image is built. See the Docker documentation for details on Dockerfiles.

Step 1: Create a Dockerfile

Here's an example repository:
.
├── Dockerfile
└── run.py # Script we want to run
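For illustration, run.py could be as small as the sketch below. The script's contents are not part of this page; the loop that prints "Loop N" is an assumption chosen to resemble the experiment logs shown in Step 3.

```python
# run.py: hypothetical stand-in for the script being run.
# Assumption: it only prints a counter, similar to the "Loop N" lines in the logs.


def main(iterations: int = 100) -> list:
    """Print one line per iteration and return the printed lines."""
    lines = []
    for i in range(iterations):
        line = f"Loop {i}"
        print(line)
        lines.append(line)
    return lines


if __name__ == "__main__":
    main()
```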
The Dockerfile must be valid for the image build to succeed. Here's an example:
# base image you want to use
# make sure to use a CUDA image if running on GPUs
# FROM nvidia/cuda:XX.X-cudnnX-devel-ubuntuXX.XX
FROM python:3.9.6-slim

# these two lines are mandatory
WORKDIR /gridai/project
COPY . .

# any RUN commands you'd like to run
# use this to install dependencies
# refresh the package index before installing, since slim images ship without one
RUN pip install pytorch-lightning && \
    apt-get update && \
    apt-get install -y curl
Two lines are mandatory:
    WORKDIR /gridai/project : sets the working directory of the image. Grid expects your executable to be found in this directory.
    COPY . . : copies all your repository files into the image.
Everything else is up to you.
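Before submitting, you may want to check that both mandatory lines are present. The sketch below is one way to do that locally; `missing_required_lines` is a hypothetical helper written for this page, not part of Grid, and it only looks for the two exact instructions listed above.

```python
# The two instructions this page lists as mandatory.
REQUIRED_LINES = ("WORKDIR /gridai/project", "COPY . .")


def missing_required_lines(dockerfile_text: str) -> list:
    """Return the mandatory instructions that do not appear in the Dockerfile text."""
    # Collapse runs of whitespace and skip comments so "COPY  .  ." still matches.
    present = {
        " ".join(line.split())
        for line in dockerfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [required for required in REQUIRED_LINES if required not in present]


example = """FROM python:3.9.6-slim
WORKDIR /gridai/project
RUN pip install pytorch-lightning
"""
print(missing_required_lines(example))  # prints ['COPY . .']
```

To check a real file, pass its contents instead, for example `missing_required_lines(open("Dockerfile").read())`.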

Step 2: Create a Run

Create the Run with the --dockerfile flag, passing the location of your Dockerfile.
$ grid run --dockerfile Dockerfile --localdir run.py
⠙ Submitting Run divergent-piculet-508 ...
upload ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%
⠦ Making query to Grid
Run submitted!
`grid status` to list all runs
`grid status divergent-piculet-508` to see all experiments for this run

----------------------
Submission summary
----------------------
script: run.py
instance_type: t2.medium
use_spot: False
cloud_provider: aws
cloud_credentials: cc-grv4f
grid_name: divergent-piculet-508
datastore_name: None
datastore_version: None
datastore_mount_dir: None

Step 3: View Build Logs

You can then follow both the build logs and the experiment logs from the CLI or the web UI.
# shows last 10 lines
# experiment has run successfully
$ grid logs divergent-piculet-508-exp0 -l 10 --show-build-logs

⠋ Fetching logs ...GraphQL URL: https://staging.grid.ai/graphql
[build] [2021-06-30T21:55:58.139136+00:00] Stored in directory: /root/.cache/pip/wheels/2f/a0/d3/4030d9f80e6b3be787f19fc911b8e7aa462986a40ab1e4bb94
[build] [2021-06-30T21:55:58.142745+00:00] Successfully built future
[build] [2021-06-30T21:55:58.437950+00:00] Installing collected packages: urllib3, pyasn1, idna, chardet, certifi, six, rsa, requests, pyasn1-modules, oauthlib, multidict, cachetools, yarl, typing-extensions, requests-oauthlib, pyparsing, google-auth, attrs, async-timeout, werkzeug, torch, tensorboard-plugin-wit, protobuf, packaging, numpy, markdown, grpcio, google-auth-oauthlib, fsspec, aiohttp, absl-py, tqdm, torchmetrics, tensorboard, PyYAML, pyDeprecate, future, pytorch-lightning
[build] [2021-06-30T21:56:17.643380+00:00] Successfully installed PyYAML-5.4.1 absl-py-0.13.0 aiohttp-3.7.4.post0 async-timeout-3.0.1 attrs-21.2.0 cachetools-4.2.2 certifi-2021.5.30 chardet-4.0.0 fsspec-2021.6.1 future-0.18.2 google-auth-1.32.1 google-auth-oauthlib-0.4.4 grpcio-1.38.1 idna-2.10 markdown-3.3.4 multidict-5.1.0 numpy-1.21.0 oauthlib-3.1.1 packaging-20.9 protobuf-3.17.3 pyDeprecate-0.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-2.4.7 pytorch-lightning-1.3.7.post0 requests-2.25.1 requests-oauthlib-1.3.0 rsa-4.7.2 six-1.16.0 tensorboard-2.4.1 tensorboard-plugin-wit-1.8.0 torch-1.9.0 torchmetrics-0.4.0 tqdm-4.61.1 typing-extensions-3.10.0.0 urllib3-1.26.6 werkzeug-2.0.1 yarl-1.6.3
[build] [2021-06-30T21:56:17.643396+00:00] WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[build] [2021-06-30T21:56:18.023729+00:00] INFO[0046] Taking snapshot of full filesystem...
[build] [2021-06-30T21:57:08.467108+00:00] INFO[0096] Pushing layer 302180240179.dkr.ecr.us-east-1.amazonaws.com/grid-cloud-staging:e8a29581-0515-43c9-b675-4d69fd192f32:a0beb2d8d5c91e8e9c636fbf169c1b09e4e4814adab8b4ab36531c8ec69c0bd0 to cache now
[build] [2021-06-30T21:57:08.467247+00:00] WARN[0096] error uploading layer to cache: getting tag for destination: repository can only contain the runes `abcdefghijklmnopqrstuvwxyz0123456789_-./`: grid-cloud-staging:e8a29581-0515-43c9-b675-4d69fd192f32
[build] [2021-06-30T21:57:08.729481+00:00] INFO[0097] Pushing image to 302180240179.dkr.ecr.us-east-1.amazonaws.com/grid-cloud-staging:e8a29581-0515-43c9-b675-4d69fd192f32
[build] [2021-06-30T21:58:40.454430+00:00] INFO[0188] Pushed image to 1 destinations
[experiment] [2021-06-30T22:12:07.252114+00:00] Loop 90
[experiment] [2021-06-30T22:12:07.252118+00:00] Loop 91
[experiment] [2021-06-30T22:12:07.252121+00:00] Loop 92
[experiment] [2021-06-30T22:12:07.252125+00:00] Loop 93
[experiment] [2021-06-30T22:12:07.252129+00:00] Loop 94
[experiment] [2021-06-30T22:12:07.252132+00:00] Loop 95
[experiment] [2021-06-30T22:12:07.252136+00:00] Loop 96
[experiment] [2021-06-30T22:12:07.252142+00:00] Loop 97
[experiment] [2021-06-30T22:12:07.252146+00:00] Loop 98
[experiment] [2021-06-30T22:12:07.252150+00:00] Loop 99
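Because build and experiment lines are interleaved in one stream, it can help to split saved log output by its prefix. The sketch below assumes every line has the shape seen in the sample above, `[source] [timestamp] message`; `group_by_source` is a hypothetical helper, not a Grid command.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the sample output: [source] [timestamp] message
LOG_LINE = re.compile(r"^\[(?P<source>\w+)\] \[(?P<timestamp>[^\]]+)\] (?P<message>.*)$")


def group_by_source(log_text: str) -> dict:
    """Group log messages by their [build] / [experiment] prefix."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:  # skip spinner lines and anything without the prefix
            groups[match["source"]].append(match["message"])
    return dict(groups)


sample = """[build] [2021-06-30T21:58:40.454430+00:00] INFO[0188] Pushed image to 1 destinations
[experiment] [2021-06-30T22:12:07.252146+00:00] Loop 98
[experiment] [2021-06-30T22:12:07.252150+00:00] Loop 99"""

grouped = group_by_source(sample)
print(grouped["experiment"])  # prints ['Loop 98', 'Loop 99']
```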

Testing Your Dockerfile Locally

It is a good idea to test that your Dockerfile builds locally before sending it to Grid. This may allow you to iterate quickly over a set of configurations that work before submitting experiments.
You can do that by building it with Docker:
docker build --tag test-image .
If the image builds, your Dockerfile is correctly defined.
After building your image, make sure to also test that your script works as expected inside of it. For example, if your script is called model.py then you would want to test your new image with:
docker run test-image python model.py
Grid runs a similar process in the backend, so if this works locally on your machine, chances are that it will also run successfully on Grid.
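If you repeat this check often, the two commands can be wrapped in a small script. This is a minimal sketch, assuming Docker is installed and on your PATH; the function names and default arguments are hypothetical, and the commands are the same `docker build` and `docker run` calls shown above.

```python
import shlex
import subprocess


def local_test_commands(tag: str = "test-image", script: str = "model.py", context: str = ".") -> list:
    """Return the two docker commands from this section as argument lists."""
    return [
        ["docker", "build", "--tag", tag, context],
        ["docker", "run", tag, "python", script],
    ]


def run_local_test(**kwargs) -> None:
    """Build the image, then run the script; check=True stops at the first failure."""
    for command in local_test_commands(**kwargs):
        print("$", shlex.join(command))
        subprocess.run(command, check=True)


# Print the commands without needing Docker; call run_local_test() to execute them.
for command in local_test_commands(script="run.py"):
    print(shlex.join(command))
```

Keeping the command construction separate from execution makes it easy to review what will run before invoking Docker.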