Upgrade your CLI with
pip install lightning-grid --upgrade
❤️ Find us in our Slack Community to say hi and/or to express your thoughts/questions.
🥳 May 17, 2022
CLI version: 0.8.47
Today's release includes several bug fixes to improve the overall experience with Grid.
Fixes and Enhancements:
Experiments now fail faster when errors are encountered during build or code execution
Improves the Run-creation flow in the Web UI by fixing error messages reported due to insufficient repo access or invalid repos
Stability improvements to the UI and event reloading
Fixes the drop-down in the experiments table that lets you add hyperparameter columns
Adds support for nested requirements.txt files. For example:
# install all extra dependencies for full package testing
# install all loggers for full package testing
# extended list of dependencies for development and run lint and tests
# install all extra dependencies for running examples
🥳 May 12, 2022
CLI version: 0.8.45
New and Improved Artifacts!
Today, we release an update to Artifacts which greatly improves stability and UX in the following ways:
- Ensures syncing of artifacts for fast-running experiments
- Ensures all artifacts that are produced by experiments are copied by Grid
- When the experiment stops running, the instance will not shut down until all artifacts have been copied
Note: With this change, a portion of instance CPU and RAM will be dedicated to artifact syncing processes. For users with memory-intensive code, if your code generates artifacts of size >= 1GB, you may experience a decrease in performance. In these scenarios, we recommend using an instance with more CPU/RAM.
Learn more about Artifacts and these new improvements here.
Additional Fixes and Enhancements
- Fixes issue with calculating pricing estimate during new run creation.
- Improves handling of Sessions when a process runs out of memory. In these events, the process will be terminated but the Session will remain running.
🔧 May 3, 2022
CLI version: 0.8.37
⭐ Faster S3 Datastores!
We are happy to announce that, as of today, creating datastores from S3 buckets is almost instant!
In most cases, your S3 bucket will fit one (or both) of the following criteria:
- the bucket is continually updating with new data which you want included in a Grid datastore
- the bucket is particularly large (leading to long datastore creation times)
In both of these cases, you can pass the --no-copy flag to the grid datastore create command. This flag will prevent Grid from making a copy of the dataset, which significantly speeds up datastore creation time when working with large buckets or when you intend to make incremental changes to your bucket and do not want to re-upload the entire dataset each time you add a new file.
Here's an example:
grid datastore create S3://ruff-public-sample-data/esRedditJson --no-copy
Please note that direct access to private S3 buckets is not currently supported.
Fixes and Enhancements
[Enhancement] When specifying instance types with the grid session change-instance-type command, you can use either the instance name (ex: grid session change-instance-type splendid-banzai-981 2_CPU_4GB) or the instance nickname (ex: grid session change-instance-type splendid-banzai-981 t2.medium) interchangeably.
[Enhancement] Grid's syntax for scheduling multiple experiments with combinations of arguments (i.e. Grid Search or Random Search) can sometimes conflict with the arguments your script expects. In that case, you can use the none strategy for parameter evaluation. More details can be found here
[Fix] Resolves an issue with creating Runs from the UI using the random search strategy when the number of trials > experiments.
[Deprecated] Changing Session instance type from the UI is currently not supported.
🥳 April 13, 2022
CLI version: 0.8.26
Notable Fixes and Enhancements
- Adds a new option for skipping parameter evaluation when not using the grid search or random search HPO features. More details here
- Resolves issues with artifacts not saving correctly to experiment sub-directories
🥳 March 30, 2022
CLI version: 0.8.17
This release includes bug fixes and stability improvements.
We've deprecated the following CLI options:
grid run --description
grid stop session
🥳 March 15, 2022
CLI version: 0.8.7
🤯 GRID_SESSION_ID and GRID_SESSION_NAME environment variables
We've added two environment variables that allow you to programmatically reference a Session from within the Session itself.
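As a rough illustration, a script running inside a Session could read these variables as shown below. Only the two variable names come from this release; the helper name `current_session` and the `None` fallback for a script running outside a Session are assumptions made for this sketch.

```python
import os


def current_session(env=None):
    """Return (session_id, session_name) read from the environment.

    Hypothetical helper: either value is None when the variable is unset,
    for example when the script runs outside a Grid Session.
    """
    env = os.environ if env is None else env
    return env.get("GRID_SESSION_ID"), env.get("GRID_SESSION_NAME")


if __name__ == "__main__":
    session_id, session_name = current_session()
    if session_id is None:
        print("Not running inside a Grid Session")
    else:
        print(f"Running in Session {session_name} ({session_id})")
```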
🔧 March 10, 2022
CLI version: 0.8.4
✔️ Resolves an issue where using a relative path for the dependency_file_info property in a Run config was breaking. For example, this now works if you were operating from a subdirectory of a git repo:
```
# Dependency file specification
path: ./env/env-deepcdl-pytorch.yml
```
✔️ Support for specifying version of Julia image to use in Runs. We will support every patch release of julia from 1.6.1 up.
grid run --framework julia will use the latest Julia version available (currently 1.7.1)
grid run --framework julia:X.Y.Z will use Julia with the version X.Y.Z
✔️ Runs will fail more quickly if there is an issue with image building.
✔️ Resolves issue with the --num_trials parameter being ignored.
✔️ Logging improvements to silence noisy stacktraces.
✔️ 'pytorch' and 'torch' are now equivalent, accepted values for the --framework option.
🔧 March 1, 2022
CLI version: 0.8.1
Spring cleaning came early. This release features a lot of backend magic that improves overall stability and UX with Grid. We’re also excited to announce a dazzling set of enhancements to Datastores! You’ll notice uploading to Datastores is now at least 5x faster! More details and information on how to use the feature are below.
- Datastore upload speeds increased by 5x
- Improved stability during Datastore uploads (reduced chance of failure during upload)
- Disk space usage will no longer increase during Datastore upload
- If a Datastore gets interrupted during upload, the next time you create a Datastore, you will be prompted to resume the upload
- The --source parameter has been deprecated. It will no longer be supported in future releases. You can just use grid datastore create [filename] and the datastore will inherit the filename as its name
- Additional magical backend improvements that you can't see, but certainly will feel
Notable Fixes and Enhancements
- The grid run help menu includes additional information about the
- The following actions have been added to the YAML config:
- on_experiment_end (See the docs on Actions for more information)
- Newly created datastores with total size <1 MiB will report as 1 MiB total size
- Improvements to costs reporting for runs and experiments
⚠️ February 3, 2022
Artifacts don't sync for fast experiments
We've detected a race condition with short-running experiments that may cause artifacts not to sync properly. As a workaround, we recommend ensuring your experiments last at least a minute (to be safe), adding a sleep if needed. We're working on a long-term fix, which we expect to ship in the next release in the coming days.
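One way to apply the workaround is to pad the end of a short script so that the whole run lasts at least a minute. The sketch below is only an illustration: the helper name `pad_runtime` and its parameters are hypothetical, and the 60-second default simply mirrors the "at least a minute" guidance above.

```python
import time


def pad_runtime(start, min_seconds=60.0, now=time.monotonic, sleep=time.sleep):
    """Sleep until at least `min_seconds` have elapsed since `start`.

    Returns the number of seconds slept (0.0 if the run was already long enough).
    `now` and `sleep` are injectable so the helper can be tested without waiting.
    """
    remaining = min_seconds - (now() - start)
    if remaining > 0:
        sleep(remaining)
        return remaining
    return 0.0


# Usage in an experiment script:
#   start = time.monotonic()   # first line of the script
#   ...                        # experiment code that writes artifacts
#   pad_runtime(start)         # last line of the script
```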
🔧 January 12, 2022
CLI version: 0.7.3
A maintenance release has been issued with the following fixes:
- resolves an issue that was causing experiments to remain queued for 1 hour+
- fixes issue where Datastores and Runs couldn’t be viewed from the UI
- addresses an issue with Multinode Runs that were not running
For users Bringing Your Own Cloud, we've introduced the concept of cluster contexts. You can set the cluster context so that all your CLI actions (including creation of a resource such as Run or Session) are made against that cluster.
By default, the cluster context is set to the global cluster. You can change the context at any time by using the command:
grid user set-cluster-context or by specifying the cluster name in
Find out which cluster context is currently set by using the
More information in the documentation on how to 'Run Workloads in Your New Cluster'.
🥳 January 5, 2022
CLI version: 0.7.1
Hi! Welcome to 2022 :) Today we bring you a new Grid release with exciting new features, continued performance and stability improvements, and the beginnings of a very productive new year. As always, use pip install lightning-grid --upgrade to update the CLI to the new version and hit us up in our Slack Community with any thoughts or questions.
Surprise! You can now enable the auto-resume of experiments that are running on spot instances. Should your experiment be interrupted, Grid can automatically resume your experiment from the last saved checkpoint when a new instance becomes available.
And more good things:
- Grid will recover all artifacts, including the last saved checkpoints.
- The local filesystem will be preserved between experiment interruption and experiment resumption.
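For auto-resume to be useful, the experiment script needs to be safe to restart. The sketch below shows one common pattern, assuming the script itself is responsible for writing checkpoints to disk and reloading the most recent one when it starts. The file format, function names, and the `stop_after` parameter (which only simulates an interruption) are illustrative and not part of Grid.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def save_checkpoint(path, state):
    """Write the state atomically so an interruption cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train(path, total_steps, stop_after=None):
    """Run from the last saved step; `stop_after` simulates an interruption."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulated spot interruption
        state["step"] += 1  # stand-in for one unit of real training work
        save_checkpoint(path, state)
    return state["step"]
```

Calling `train` a second time with the same checkpoint path continues from the saved step instead of starting over, which is the property a resumed experiment relies on.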
🪄 Enable Auto-resume in the UI
Select the “Auto-resume” option after enabling the Use Spot Instance option in a new Run.
🪄 Enable Auto-resume in the CLI
Use the --auto_resume flag to indicate this experiment is safe to resume.
grid run --use_spot --auto_resume --instance_type p3.2xlarge mnist.py
⭐ Full S3 Datastore Support
You can now connect Grid to any publicly available S3 dataset, making it way faster to get your S3 data into Grid.
Specify a public S3 bucket, file, or path when creating a new Datastore.
🪄 Supported URL formats:
⭐ Datastore Mount Path
And the award for top FAQ goes to...
How do I access my data in a datastore?
With this release, accessing your data in a Session or Run is way more straightforward.
After you’ve created a datastore, you can access it at /datastores in a Session or Run.
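As a quick check from inside a Session or Run, a script could list what is available under the mount path. This is a minimal sketch: only the /datastores path comes from the text above, while the helper name `list_datastores` and the assumption that each datastore appears as an entry directly under that directory are illustrative.

```python
from pathlib import Path


def list_datastores(root="/datastores"):
    """Return the sorted names of entries directly under the mount root.

    Returns an empty list when the root does not exist, for example
    when the script runs outside a Session or Run.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(entry.name for entry in root_path.iterdir())


if __name__ == "__main__":
    for name in list_datastores():
        print(name)
```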
More details on how to mount datastores:
Fixes and Enhancements
- Performance improvements to Sessions, making your data on a Session faster to access once the Session becomes active after resuming.
- Increased observability into Session statuses and reasons for a potential Session failure.
- Hover over the status of a Datastore, Session, or Experiment for more details on the status.