⚡Datastores
Datastores are low-latency, high-performance, auto-versioned datasets.

Datastores (scalable datasets)

Grid Datastores are high-performance, low-latency, versioned datasets.
Attach a Datastore to a Run or Session whenever your job needs data.
We don't charge for data storage!

Product Tour

Upload data to Grid using Datastores.

Data inside the model script

Perhaps the simplest way to get data into a job is to have your model script download it.
Let's illustrate with PyTorch:
import os

from torchvision import transforms
from torchvision.datasets import MNIST

# this line automatically downloads the data into the current working directory
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
Pros:
  • Simplest approach
  • Enables the fastest iteration
Cons:
  • Downloads every time a job starts
  • Spends compute money on data downloading

Datastore paths

Once you have created a datastore, simply pass in its name to your script and Grid will auto-resolve the path. Assume you have a datastore named cats and you want to use version 1:
grid run main.py --data_dir grid:cats:1
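
On the script side, the command above only requires that main.py accept a --data_dir argument. The following is a minimal sketch of what that might look like, assuming Grid replaces grid:cats:1 with an ordinary filesystem path before the script starts (the text says the path is auto-resolved, but the exact mechanism is not described here). The function names parse_args, list_data_files, and main are hypothetical and are not part of Grid.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the --data_dir flag that is passed on the `grid run` command line."""
    parser = argparse.ArgumentParser(description="Hypothetical main.py entry point")
    parser.add_argument(
        "--data_dir",
        type=Path,
        required=True,
        help="Directory holding the data (assumed to be the resolved datastore path)",
    )
    return parser.parse_args(argv)


def list_data_files(data_dir):
    """Return every regular file under data_dir, sorted for reproducibility."""
    if not data_dir.is_dir():
        raise FileNotFoundError(f"--data_dir does not point to a directory: {data_dir}")
    return sorted(p for p in data_dir.rglob("*") if p.is_file())


def main(argv=None):
    args = parse_args(argv)
    files = list_data_files(args.data_dir)
    print(f"Found {len(files)} files under {args.data_dir}")
    # Build your dataset and start training from `files` here.
    return files


# In a real script, finish with:
#     if __name__ == "__main__":
#         main()
```

Because the script only sees a plain directory path, the same code can be tested locally by pointing --data_dir at a folder on disk.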