Grid provides Datastores: high-performance, low-latency, auto-versioned datasets.
Upload your data to Grid as a Datastore, then attach it to Runs or Sessions whenever your job needs data.
Perhaps the simplest way to get data into a job is to have your model script download it.
Let's illustrate with PyTorch:
    import os

    from torchvision import transforms
    from torchvision.datasets import MNIST

    # this line automatically downloads the data into the current working directory
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
This approach enables the fastest iteration, but it has two drawbacks: the data is downloaded every time a job starts, and compute money is spent on downloading it.
Once you have created a datastore, pass its name to your script and Grid will resolve the path automatically. For example, assume you have a datastore named cats and you want to use version 1:
grid train main.py --data_dir grid:cats:1
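On the script side, one way to consume that argument is sketched below. It assumes that by the time main.py runs, Grid has already replaced grid:cats:1 with an ordinary directory path, so the script only needs to accept --data_dir and read files from it. The helper names parse_args and list_data_files are hypothetical and not part of Grid.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the --data_dir flag (assumed to arrive as a resolved local path)."""
    parser = argparse.ArgumentParser(description="Train on data from a datastore")
    parser.add_argument("--data_dir", type=Path, required=True,
                        help="directory containing the training data")
    return parser.parse_args(argv)


def list_data_files(data_dir):
    """Return every regular file under data_dir, sorted for reproducibility."""
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    return sorted(p for p in data_dir.rglob("*") if p.is_file())


def main(argv=None):
    args = parse_args(argv)
    files = list_data_files(args.data_dir)
    print(f"Found {len(files)} files under {args.data_dir}")
    # Hand `files` (or args.data_dir) to your dataset or data loader here.
    return files

# In main.py, call main() under the usual `if __name__ == "__main__":` guard.
```

Because the script only sees a plain directory path, the same main.py can be run locally with --data_dir pointing at any folder, which keeps local debugging and Grid runs consistent.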