Datastores (scalable datasets)
Grid Datastores are high-performance, low-latency, versioned datasets.
Datastores can be attached to Runs or Sessions and preserve the file format and directory structure of the data used to create them.
We don't charge for data storage!
To upload data to Grid, create a Datastore. Each Datastore is versioned automatically.
Data inside the model script
Perhaps the simplest way to get data into a job is to have your model script download it.
Here is an example using PyTorch and torchvision:
import os

from torchvision import transforms
from torchvision.datasets import MNIST

# download=True fetches MNIST into the current working directory if it is not already there
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
Advantages:
- Simplest approach
- Enables the fastest iteration
Drawbacks:
- Downloads the data every time a job starts
- Spends compute money on downloading data
Once you have created a datastore, pass its name to your script and Grid will resolve the path automatically. For example, assume you have a datastore named cats and you want to use version 1:
grid run main.py --data_dir /datastores/cats/1
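On the script side, the datastore path arrives as an ordinary command-line argument. The following is a minimal sketch of how main.py might read it, assuming the script uses argparse and the flag is named --data_dir as in the command above. The helpers parse_args and list_files are hypothetical names chosen for illustration and are not part of Grid.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the --data_dir flag that follows the script name in `grid run`."""
    parser = argparse.ArgumentParser()
    # Assumption: fall back to the current directory when running locally.
    parser.add_argument("--data_dir", type=str, default=os.getcwd())
    return parser.parse_args(argv)


def list_files(data_dir):
    """Return the sorted relative paths of every file under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), data_dir))
    return sorted(found)


if __name__ == "__main__":
    args = parse_args()
    print(f"Reading data from {args.data_dir}")
    for path in list_files(args.data_dir):
        print(path)
```

Because a datastore preserves the file format and directory structure of the data used to create it, the relative paths listed here should match the layout of the original upload.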