What's new and improved in Grid.
Archived release notes can be found on this page in GitHub.

January 12, 2022

CLI version: 0.7.3
A maintenance release has been issued with the following fixes:
  • resolves an issue that was causing experiments to remain queued for more than an hour
  • fixes an issue where Datastores and Runs couldn’t be viewed from the UI
  • addresses an issue where Multinode Runs were not running

Cluster Contexts

For users Bringing Your Own Cloud, we've introduced the concept of cluster contexts. You can set the cluster context so that all your CLI actions (including creation of a resource such as Run or Session) are made against that cluster.
By default, the cluster context is set to the global cluster. You can change the context at any time by using the command grid user set-cluster-context or by specifying the cluster name in ~/.grid/settings.json.
Find out which cluster context is currently set by using the grid user command.
More information is available in the documentation on how to 'Run Workloads in Your New Cluster'.
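If you want to check the configured context from a script rather than the CLI, a minimal sketch is shown here. It only reads the settings file mentioned above. The key name "context" is an assumption made for illustration and is not confirmed by these notes, so open your own ~/.grid/settings.json to see which key holds the cluster name.

```python
import json
from pathlib import Path

# Assumption: "context" is a hypothetical key name used for illustration.
# Check your own ~/.grid/settings.json for the actual key.
CONTEXT_KEY = "context"


def read_cluster_context(settings_path, key=CONTEXT_KEY):
    """Return the value stored under `key` in the settings file, or None."""
    path = Path(settings_path).expanduser()
    if not path.is_file():
        # No settings file yet, so nothing has been configured locally.
        return None
    with path.open() as f:
        settings = json.load(f)
    return settings.get(key)
```

Calling read_cluster_context("~/.grid/settings.json") returns None when the file or the key is missing, which you can treat as "no context set locally".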

January 5, 2022

CLI version: 0.7.1
Hi! Welcome to 2022 :) Today we bring you a new Grid release with exciting new features, continued performance and stability improvements, and the beginnings of a very productive new year. As always, use pip install lightning-grid --upgrade to update the CLI to the new version, and hit us up in our Slack Community with any thoughts or questions.

Auto-resume Experiments

Surprise! You can now enable the auto-resume of experiments that are running on spot instances. Should your experiment be interrupted, Grid can automatically resume your experiment from the last saved checkpoint when a new instance becomes available.
And more good things:
  • Grid will recover all artifacts, including the last saved checkpoints.
  • The local filesystem will be preserved between experiment interruption and experiment resumption.
Note: Auto-resume is only available for Runs.

Enable Auto-resume in the UI

Select the “Auto-resume” option after enabling the Use Spot Instance option in a new Run.

Enable Auto-resume in the CLI

Use the --auto_resume flag to indicate that this experiment is safe to resume.
Example: grid run --use_spot --auto_resume --instance_type p3.2xlarge mnist.py
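An experiment is only "safe to resume" if the script itself picks up from its last checkpoint when it starts again. The sketch below shows one way a training script could do that. It is an illustration and not Grid's own mechanism: the checkpoint format, the function names, and the stop_after parameter (used only to simulate an interruption) are all assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then atomically move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace is atomic, so an interruption never leaves a partial file.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def train(path, num_epochs, stop_after=None):
    """Run epochs from the last checkpoint; stop_after simulates an interruption."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], num_epochs):
        if stop_after is not None and epoch >= stop_after:
            return state  # hypothetical interruption point
        state["total"] += epoch  # stand-in for a real training step
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

Running train twice against the same checkpoint path, once interrupted and once to completion, ends in the same state as a single uninterrupted run, which is the property a resumable script needs.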

Datastore Enhancements

Full S3 Datastore Support

You can now connect Grid to any publicly available S3 dataset, making it way faster to get your S3 data into Grid.
Specify a public S3 bucket, file, or path when creating a new Datastore.
Supported URL formats:
Note: Private S3 buckets are coming soon!

Datastore Mount Path

And the award for top FAQ goes to...
How do I access my data in a datastore?
With this release, accessing your data in a Session or Run is way more straightforward.
After you’ve created a datastore, you can access it at /datastores in a Session or Run.
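As a quick check from inside a Session or Run, the sketch below lists every file under the mount point. Only the /datastores path comes from these notes. The function name is made up for illustration, and the directory layout beneath /datastores is not assumed.

```python
from pathlib import Path


def list_datastore_files(root="/datastores"):
    """Return sorted relative paths of all files under root ([] if missing)."""
    base = Path(root)
    if not base.is_dir():
        # Not running inside a Session or Run, or nothing is mounted.
        return []
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())
```

An empty list means the mount point is missing or contains no files, which is a useful first thing to rule out before debugging a data loader.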
More details on how to mount datastores:

Fixes and Enhancements

  • Performance improvements to Sessions: your data is now faster to access once a resumed Session becomes active.
  • Increased observability into Session statuses and reasons for a potential Session failure.
  • Hover over the status of a Datastore, Session, or Experiment for more details on the status.
