Datastores (scalable datasets)

In Grid, we've introduced Datastores: high-performance, low-latency, versioned datasets.

Datastores can be attached to Runs or Sessions and preserve the file format and directory structure of the data used to create them.

note

We don't charge for data storage!

Product Tour

Upload data to Grid using Datastores. Datastores are low-latency, auto-versioned datasets.

Click here for the 1-minute product tour

Data inside the model script

Perhaps the simplest way to get data into a job is to have your model script download it.

Let's illustrate with PyTorch:

import os

from torchvision import transforms
from torchvision.datasets import MNIST

# download=True fetches the dataset into the current working directory
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())

Pros:

  • Simplest approach
  • Enables the fastest iteration

Cons:

  • Downloads every time a job starts
  • Spends compute money on data downloading

Datastore paths

Once you have created a datastore, pass its name and version to your script as a path, and Grid will auto-resolve it. For example, to use version 1 of a datastore named cats:

grid run main.py --data_dir /datastores/cats/1
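
On the script side, the --data_dir flag from the command above has to be parsed by main.py itself. The following is a minimal sketch of how that might look, not Grid's own code: it assumes the flag reaches the script as an ordinary command-line argument and that the datastore appears as a regular directory. The helper names parse_args, list_files, and main are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the flags forwarded to the script (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="Example training script")
    # --data_dir mirrors the flag used in the `grid run` command above.
    parser.add_argument(
        "--data_dir",
        type=Path,
        required=True,
        help="Directory where the datastore contents are available",
    )
    return parser.parse_args(argv)


def list_files(data_dir):
    """Return every regular file under data_dir, sorted by path.

    Hypothetical helper: it relies on the datastore keeping the original
    directory structure, so a recursive walk finds the uploaded files.
    """
    return sorted(p for p in Path(data_dir).rglob("*") if p.is_file())


def main(argv=None):
    """Entry point: read the data directory and report what it contains."""
    args = parse_args(argv)
    files = list_files(args.data_dir)
    print(f"Found {len(files)} files under {args.data_dir}")
    return files
```

In a real script, main() would be called under an `if __name__ == "__main__":` guard. Because the path arrives as a plain argument, the same script can also be run locally against any directory.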