GRID CLI REFERENCE

Grid CLI

Usage:

grid [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --debug | boolean | Used for logging additional information for debugging purposes. | False |
| -o, --output | choice (console \| json) | Output format | console |
| --help | boolean | Show this message and exit. | False |

grid artifacts

Downloads artifacts for the given runs or experiments.

This will download artifacts generated by the runs / experiments. Regex filtering is used to determine which artifacts to download.

Usage:

grid artifacts [OPTIONS] RUNS_OR_EXPERIMENTS...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --bucket_path, --bucket_paths | boolean | Do not download. Show the bucket URL for the experiments instead. (BYOC only) | False |
| --download_dir | directory | Download directory that will host all artifact files. | ./grid_artifacts |
| -m, --match_regex | text | Only show artifacts that match this regex filter. Best if quoted. | (empty) |
| --help | boolean | Show this message and exit. | False |
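
As a rough illustration of how a regex filter narrows a set of artifacts, the sketch below applies a pattern to a list of file paths with Python's re module. The file names are hypothetical, and the use of re.search (match anywhere in the path) is an assumption: this page does not state the exact matching semantics behind --match_regex.

```python
import re


def filter_artifacts(paths, pattern):
    """Return the paths in which the regex pattern matches anywhere."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


# Hypothetical artifact paths, for illustration only.
artifacts = [
    "checkpoints/epoch=1.ckpt",
    "checkpoints/epoch=2.ckpt",
    "logs/events.out",
    "metrics.csv",
]

# Keep only the files whose names end in ".ckpt".
selected = filter_artifacts(artifacts, r"\.ckpt$")
print(selected)
```

On the command line, quoting the pattern keeps the shell from interpreting characters such as $ or * before the CLI sees them.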

grid clusters

Usage:

grid clusters [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

aws

Create a grid compute cluster with NAME from the provided AWS account details.

Usage:

grid clusters aws [OPTIONS] NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --external-id | text | N/A | |
| --role-arn | text | AWS role ARN attached to the associated resources. | |
| --region | text | AWS region which is used to host the associated resources. | us-east-1 |
| --instance-types | text | Instance types which you desire to support for compute jobs within the cluster. | g2.8xlarge, g3.16xlarge, g3.4xlarge, g3.8xlarge, g3s.xlarge, g4dn.12xlarge, g4dn.16xlarge, g4dn.2xlarge, g4dn.4xlarge, g4dn.8xlarge, g4dn.metal, g4dn.xlarge, p2.16xlarge, p2.8xlarge, p2.xlarge, p3.16xlarge, p3.2xlarge, p3.8xlarge, p3dn.24xlarge, t2.large, t2.medium, t2.xlarge, t2.2xlarge, t3.large, t3.medium, t3.xlarge, t3.2xlarge |
| --cost-savings | boolean | Using this flag ensures that the cluster is created with a profile that is optimized for cost savings. Runs become cheaper, but start-up times may increase. | False |
| --wait | boolean | Using this flag, the CLI will wait until the cluster is running. | False |
| --edit-before-creation | boolean | Edit the created cluster spec before submitting to API server. | False |
| --help | boolean | Show this message and exit. | False |

logs

Retrieve cluster logs from the managed cluster identified by CLUSTER_ID.

These logs are streamed to stdout, and can either be tailed to view log lines as they are generated, or limited to a time range.

Usage:

grid clusters logs [OPTIONS] CLUSTER_ID

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| -t, --tail | boolean | Whether to tail log lines. | False |
| --from | text | The starting timestamp to query cluster logs from. | 24 hours ago |
| --to | text | The end timestamp / relative time increment to query logs for. This is ignored when tailing logs. | 0 seconds ago |
| --limit | integer | The max number of log lines returned. | 1000 |
| --time-format | choice (human \| iso8601) | Timestamp formatting style | iso8601 |
| --help | boolean | Show this message and exit. | False |

grid credential

Manage the credentials associated with your Grid account.

You can use these credentials to provide access to data stored in private S3 buckets.

You can find more information about using private S3 buckets with Grid at https://docs.grid.ai. Please note that this feature is only available to BYOC users.

Usage:

grid credential [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

create

Create a credential associated with your Grid account.

You can use this credential to mount a Datastore from a private S3 bucket.

Usage:

grid credential create [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --type | choice (s3) | The credential type to create. | s3 |
| --cluster | text | The cluster id where the credential will be created. | prod-2 |
| --s3-external-id | text | Must be paired with --type s3 & --s3-role-arn | None |
| --s3-role-arn | text | Must be paired with --type s3 & --s3-external-id | None |
| --help | boolean | Show this message and exit. | False |

delete

Use this command to delete a credential you have previously created.

Warning: Any resource (datastore, session, experiment) which uses this credential will become unusable after it has been deleted.

Please note that only the Grid user who created a credential will be able to delete it.

Usage:

grid credential delete [OPTIONS] ID

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --cluster | text | The cluster id where the credential will be created. | prod-2 |
| --help | boolean | Show this message and exit. | False |

list

List all credentials associated with your Grid account.

Usage:

grid credential list [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --cluster | text | The cluster id where the credential will be created. | prod-2 |
| --type | choice (s3) | Filter credentials to list by type. | None |
| --help | boolean | Show this message and exit. | False |

grid datastore

Manages Datastore workflows.

Usage:

grid datastore [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --global | boolean | Fetch sessions from everyone in the team when flag is passed | False |
| --cluster | text | The cluster id to list datastores for. | prod-2 |
| --show-incomplete | boolean | Show any datastore uploads which were started, but killed or errored before they finished uploading all data and became "viewable" on the grid datastore user interface. | False |
| --help | boolean | Show this message and exit. | False |

clearcache

Clears datastore cache which is saved on the local machine when uploading a datastore to grid.

This removes all the cached files from the local machine, meaning that resuming an incomplete upload is not possible after running this command.

Usage:

grid datastore clearcache [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

create

Creates a datastore from SOURCE.

If you want to check the status of your datastore creation, please use the grid datastore command.

Here are some examples of this command in use:

To create a datastore from a directory (or file) on the local machine (optionally specifying a name):

grid datastore create ./my_cool_file.txt

grid datastore create ./some-directory/

grid datastore create ./some-directory/ --name my-awesome-datastore

To create a datastore from an S3 bucket (private buckets are not currently supported):

grid datastore create s3://ryft-public-sample-data/esRedditJson/

If you'd like to create a datastore from an S3 bucket that will be incrementally updated, or for very large datasets:

grid datastore create s3://ryft-public-sample-data/ --no-copy

To create a datastore from a zip or tar.gz file hosted at a URL:

grid datastore create https://cs.nyu.edu/~roweis/data/EachMovieData.tar.gz

Usage:

grid datastore create [OPTIONS] [SOURCE]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --source | text | N/A | None |
| --no-copy | boolean | (beta) Use this flag when you intend to incrementally add data to the source bucket. Using this flag can also speed up datastore creation when working with large buckets. When using this flag, you cannot remove files from your bucket. If you'd like to add files, please create a new version of the datastore after you've added files to your bucket. Please note that Grid does not currently support private S3 buckets. | False |
| --name | text | Name of the datastore | None |
| --cluster | text | Cluster id to create the datastore on. (Bring Your Own Cloud Customers Only). | prod-2 |
| --hpd | boolean | (beta) Use this flag to provision a HPD datastore backed by AWS FSx for Lustre. This type of datastore is automatically updated whenever new files are added or deleted from the source S3 bucket. Please take this into account when creating workflows around such Datastores. This feature is only available to BYOC customers. | False |
| --hpd-throughput | text | (beta) Throughput setting for HPDs. Select one of [low, medium, high]. low (default): 125 MB/s of throughput per TiB of Datastore capacity. Recommended for Datastores that will be used by one or two experiments or sessions simultaneously. medium: 500 MB/s per TiB of Datastore capacity. Recommended for Datastores that will be used by multiple experiments or sessions simultaneously. high: highest possible throughput of 1000 MB/s per TiB of Datastore capacity. Recommended when maximum performance is necessary to run a very high number of experiments. Please note that for a single session or experiment, selecting the medium or high settings will yield diminishing returns. In these cases, we recommend selecting the low (default) throughput option. Because HPDs offer elevated performance, please note that they can incur higher monthly costs than regular Grid Datastores, especially for the medium or high throughput options. This feature is only available to BYOC customers. | low |
| --hpd-capacity | integer | (beta) Capacity setting for HPDs in GiB. Must be 1200, 2400 or a multiple of 2400, up to 64800. This feature is only available to BYOC customers. | 1200 |
| --hpd-preload | boolean | (beta) Use this flag when provisioning a high-performance datastore when maximum performance is needed even on the first data access. (Due to technical reasons HPDs without this flag achieve maximum performance after the first data access). Avoid using this flag when the datastore needs to be ready for use as soon as possible or when you're only working on a partition of the data in the S3 bucket. This feature is only available to BYOC customers. | False |
| --help | boolean | Show this message and exit. | False |
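
The capacity rule and the per-TiB throughput figures above lend themselves to a quick sanity check before provisioning. The sketch below is only an illustration: the function names are hypothetical, and it assumes that throughput scales linearly with capacity and that 1 TiB equals 1024 GiB.

```python
GIB_PER_TIB = 1024

# MB/s of throughput per TiB of capacity, as listed for --hpd-throughput.
THROUGHPUT_PER_TIB = {"low": 125, "medium": 500, "high": 1000}


def is_valid_hpd_capacity(capacity_gib):
    """Check the documented rule: 1200, 2400, or a multiple of 2400, up to 64800."""
    if capacity_gib == 1200:
        return True
    return 2400 <= capacity_gib <= 64800 and capacity_gib % 2400 == 0


def aggregate_throughput_mb_s(capacity_gib, setting="low"):
    """Estimate total throughput in MB/s (assumes linear scaling with capacity)."""
    if not is_valid_hpd_capacity(capacity_gib):
        raise ValueError(f"invalid HPD capacity: {capacity_gib} GiB")
    return THROUGHPUT_PER_TIB[setting] * capacity_gib / GIB_PER_TIB


print(aggregate_throughput_mb_s(1200, "low"))
print(aggregate_throughput_mb_s(2400, "medium"))
```

Under these assumptions, the default 1200 GiB datastore on the low setting works out to roughly 146 MB/s in aggregate.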

delete

Deletes a datastore with the given name and version tag.

For bring-your-own-cloud customers, the cluster id of the associated resource is required as well.

Usage:

grid datastore delete [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --name | text | Name of the datastore | |
| --version | integer | Version of the datastore | |
| --cluster | text | Cluster id to delete the datastore from. (Bring Your Own Cloud Customers Only). | prod-2 |
| --help | boolean | Show this message and exit. | False |

resume

Resume uploading an incomplete datastore upload session.

Usage:

grid datastore resume [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid delete

Allows you to delete grid resources.

Usage:

grid delete [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

cluster

Delete CLUSTER and all associated AWS resources.

Deleting a cluster also deletes all Runs and Experiments which were started on the cluster. Deletion permanently removes not only the record of all runs on a cluster, but all associated experiments, artifacts, metrics, logs, etc.

This process may take a few minutes to complete, but once started it is irreversible. Deletion not only removes the cluster from being managed by Grid, but also tears down every resource Grid managed (for that cluster id) in the host cloud. All object stores, container registries, logs, compute nodes, volumes, etc. are deleted and cannot be recovered.

Usage:

grid delete cluster [OPTIONS] CLUSTER

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --force | boolean | Force delete cluster from grid system. This does NOT delete any resources created by the cluster, it only cleans up the entry from the grid system. You should not use this under normal circumstances. | False |
| --wait | boolean | Using this flag, the CLI will wait until the cluster is deleted. | False |
| --help | boolean | Show this message and exit. | False |

experiment

Delete some set of EXPERIMENT_NAMES from grid.

This process is immediate and irreversible. Deletion permanently removes not only the record of the experiment, but all associated artifacts, metrics, logs, etc.

Usage:

grid delete experiment [OPTIONS] EXPERIMENT_NAMES...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

run

Delete some set of RUN_NAMES from grid.

Deleting a run also deletes all experiments contained within the run.

This process is immediate and irreversible. Deletion permanently removes not only the record of the run, but all associated experiments, artifacts, metrics, logs, etc.

Usage:

grid delete run [OPTIONS] RUN_NAMES...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid docs

Open the CLI docs.

Usage:

grid docs [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid edit

Edits a resource.

Usage:

grid edit [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

cluster

Edit an existing cluster.

Usage:

grid edit cluster [OPTIONS] CLUSTER

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid history

View list of historic Runs.

Usage:

grid history [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --global | boolean | Fetch history from everyone in the team when flag is passed | False |
| --help | boolean | Show this message and exit. | False |

grid instance-types

List the compute node instance types which are available for computation.

For bring-your-own-cloud customers, the instance types available are defined by the organizational administrators who created the cluster.

Usage:

grid instance-types [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --cluster | text | Cluster ID from which the instance types should be fetched. (Bring Your Own Cloud Customers Only). | prod-2 |
| --help | boolean | Show this message and exit. | False |

grid login

Authorize the CLI to access Grid AI resources for a particular user.

If no username or key is provided, the CLI will prompt for them. After providing your username, a web browser will open to your account settings page where your API key can be found.

Usage:

grid login [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --key | text | API Key from Grid | None |
| --username | text | Username used in Grid | None |
| --help | boolean | Show this message and exit. | False |

grid logs

Shows stdout logs associated with some EXPERIMENT.

This includes both build and experiment logs.

Usage:

grid logs [OPTIONS] EXPERIMENT

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --show-build-logs | boolean | Shows build logs if not shown by default. | None |
| -l, --tail-lines | integer | Number of lines to show from the end. | None |
| --help | boolean | Show this message and exit. | False |

grid run

Launch a Run from some SCRIPT with the provided SCRIPT_ARGS.

A run is a collection of experiments which run with a single set of SCRIPT_ARGS. The SCRIPT_ARGS passed to the run command can represent fixed values, or a set of values to be searched over for each option. If a set of values is passed, a sweep (grid-search or random-search) will be performed, launching the desired number of experiments in parallel, each with a unique set of input arguments.
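
To make the difference between the two sweep types concrete, the sketch below expands a small search space into one argument set per experiment. It illustrates the general idea only and is not Grid's implementation: the parameter names are hypothetical, and sampling without replacement for random search is an assumption.

```python
import itertools
import random


def grid_search(space):
    """Return one argument dict per combination of the candidate values."""
    names = list(space)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(space[n] for n in names))
    ]


def random_search(space, num_trials, seed=None):
    """Return num_trials distinct combinations sampled from the full grid."""
    rng = random.Random(seed)
    return rng.sample(grid_search(space), num_trials)


# Hypothetical search space, for illustration only.
space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

all_runs = grid_search(space)
sampled = random_search(space, num_trials=2, seed=0)
print(len(all_runs), len(sampled))
```

Grid search grows multiplicatively with each added option (3 x 2 = 6 experiments here), which is why a random search with a fixed number of trials and a seed can be the cheaper choice for large spaces.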

The script runs on the specified instance type, and Grid collects the generated artifacts, metrics, and logs, making them available for you to view in real time (or later if so desired) on either our Web UI or via this CLI.

Usage:

grid run [OPTIONS] [RUN_COMMAND]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --config | Path | Path to Grid config YML. | None |
| --name | text | Name for this run | None |
| --cluster | text | N/A | prod-2 |
| --strategy | choice (grid_search \| random_search \| none) | Hyper-parameter search strategy. Available options: grid_search (creates invocations for each combination of evaluated arguments); random_search (creates random combinations of parameters; you can add num_trials to define the exact number of them); none (script arguments will not be evaluated at all and are passed directly to the script). | grid_search |
| --num_trials | text | Number of samples from full search space that are used by the random_search strategy | None |
| --seed | text | Seed value for the random_search strategy | None |
| --instance_type | text | Instance type to start training session in | m5a.large |
| --gpus | integer | Number of GPUs to allocate per experiment | 0 |
| --cpus | integer | Number of CPUs to allocate per experiment | 0 |
| --memory | text | How much memory an experiment needs | 100 |
| --datastore_name | text | Datastore name to be mounted in training | None |
| --datastore_version | integer | Datastore version to be mounted in training | None |
| --datastore_mount_dir | text | Directory to mount Datastore in training job. The default datastore mount location is /datastores | None |
| --framework | text | Framework to use in training. Select from available options: lightning, torch, tensorflow, julia (will select the latest available version), julia:1.6.1, julia:1.6.2, julia:1.6.3, julia:1.6.4, julia:1.6.5, julia:1.7.0, julia:1.7.1, torchelastic | lightning |
| --use_spot | boolean | Use a spot instance. Spot (preemptible) instances can be shut down at any time. | False |
| --ignore_warnings | boolean | Whether to ignore warnings when executing commands | False |
| --scratch_size | integer | The size in GB of the scratch space attached to the experiment | 100 |
| --scratch_mount_path | text | The mount path to mount the scratch space | /tmp/scratch |
| -l, --localdir | boolean | Upload source code from the local directory instead of having Grid clone the repo from GitHub (default). This option is particularly useful for users that do not host their source code on GitHub. | False |
| -d, --dockerfile | text | Dockerfile for the image building | None |
| --dependency_file | text | Dependency file path. If not provided and a requirements.txt, environment.yml, or Project.toml file is present in the current working directory, then we will automatically install dependencies according to the inferred file. | None |
| --auto_resume | boolean | Mark this run as auto-resumable. If the underlying node/instance/VM is terminated, the experiment will be automatically resumed, with all artifacts restored from the last known state. The experiment code will receive a SIGTERM signal and it must exit with status code 0 upon properly dumping its state to disk. | False |
| --help | boolean | Show this message and exit. | False |
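
The --auto_resume contract (receive SIGTERM, dump state to disk, exit with status code 0) can be satisfied with a small signal handler. The following is a minimal sketch under stated assumptions: the checkpoint path, the JSON state format, and all function names are hypothetical, and how the saved file is carried over to the resumed experiment is not specified on this page.

```python
import json
import os
import signal
import sys
import tempfile

# Hypothetical checkpoint location, chosen only for this sketch.
STATE_PATH = os.path.join(tempfile.gettempdir(), "experiment_state.json")


def save_state(state, path=STATE_PATH):
    """Write to a temp file, then rename, so a partial file is never left behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_state(path=STATE_PATH):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def make_sigterm_handler(state, path=STATE_PATH):
    """Build a handler that dumps state to disk and exits with status code 0."""
    def handler(signum, frame):
        save_state(state, path)
        sys.exit(0)
    return handler


def train(state, total_steps, path=STATE_PATH):
    """Advance from the restored step to total_steps, then checkpoint."""
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one unit of real work
    save_state(state, path)
    return state


if __name__ == "__main__":
    state = load_state()
    signal.signal(signal.SIGTERM, make_sigterm_handler(state))
    train(state, total_steps=1000)
```

Because the script reloads its state on start-up, a resumed experiment continues from the last saved step instead of starting over.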

grid session

Contains a grouping of commands to manage session workflows.

Executing the grid session command without any further arguments or commands renders a list of all sessions registered to your Grid user account.

Usage:

grid session [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --global | boolean | Fetch sessions from everyone in the team when flag is passed | False |
| --help | boolean | Show this message and exit. | False |

change-instance-type

Change the instance type of a session; this allows you to upgrade or downgrade the compute capability of the session nodes while keeping all of your work in progress untouched.

The session must be PAUSED in order for this command to succeed.

Specifying --spot allows you to change the instance to an interruptible spot instance (which comes at a steep discount, but which can be interrupted and shut down at any point in time depending on cloud provider instance type demand).

Specifying --on_demand changes the instance to an on-demand type, which cannot be interrupted but is more expensive.

Usage:

grid session change-instance-type [OPTIONS] SESSION_NAME INSTANCE_TYPE

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --spot | boolean | Use a spot instance to launch the session | None |
| --on_demand, --on-demand | boolean | Use an on-demand instance to launch the session | None |
| --help | boolean | Show this message and exit. | False |

create

Creates a new interactive session with NAME.

Interactive sessions are optimized for development activities (before executing hyperparameter sweeps in a Run). Once created, sessions can be accessed via VSCode, Jupyter-lab, or SSH interfaces.

Grid manages the installation of any/all core libraries, drivers, and interfaces to the outside world. Sessions can be run on anything from a small 2 CPU core + 4GB memory instance to a monster machine with 96 CPU cores + 824 GB memory + eight V100 GPUs + 40 Gbps network bandwidth (no, those values aren't typos!); or really anything in between.

Usage:

grid session create [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --cluster | text | Cluster to run on | prod-2 |
| --instance_type | text | Instance type to start session in. | m5a.large |
| --use_spot | boolean | Use a spot instance. Spot (preemptible) instances can be shut down at any time. | False |
| --disk_size | integer | The disk size in GB to allocate to the session. | 200 |
| --datastore_name | text | Datastore name to be mounted in the session. | None |
| --datastore_version | integer | Datastore version to be mounted in the session. | None |
| --datastore_mount_dir | text | Absolute path to mount Datastore in the session (defaults to /datastores/<datastore-name>). | None |
| --config | Path | Path to Grid config YML | None |
| --name | text | Name for this session | None |
| --help | boolean | Show this message and exit. | False |

delete

Deletes a session identified by SESSION_NAME.

Deleting a session will stop the running instance (and any computations being performed on it) and billing of your account. All work done on the machine is permanently removed, including all/any saved files, code, or downloaded data (assuming the source of the data was not a grid datastore - datastore data is not deleted).

Usage:

grid session delete [OPTIONS] SESSION_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

mount

Mount a session directory to the local machine. The session is identified by SESSION and MOUNT_DIR is a path to a directory on the local machine.

To mount a filesystem use: ixNode:[dir] mountpoint

Examples:

# Mounts the home directory on the interactive node in dir data
grid session mount bluberry-122 ./data

# mounts ~/data directory on the interactive node to ./data
grid session mount bluberry-122:~/data ./data

To unmount it:

fusermount3 -u mountpoint # Linux
umount mountpoint # OS X, FreeBSD

Under the hood, this just passes data to sshfs after syncing Grid's interactive nodes; in other words, this command is a simplified sshfs.
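
The ixNode:[dir] form can be read as a node name with an optional remote directory. The sketch below shows one way such a spec could be split and turned into an sshfs argument list. Both function names are hypothetical, and this is not necessarily how the Grid CLI parses its arguments; the only outside fact relied on is the standard sshfs form host:[dir] mountpoint, where an empty dir means the remote home directory.

```python
def parse_mount_source(spec):
    """Split 'node[:dir]' into (node, remote_dir); remote_dir is None if omitted."""
    node, sep, remote_dir = spec.partition(":")
    return node, (remote_dir if sep and remote_dir else None)


def sshfs_argv(spec, mount_dir):
    """Build an sshfs argument list; an empty dir after ':' means the home directory."""
    node, remote_dir = parse_mount_source(spec)
    return ["sshfs", f"{node}:{remote_dir or ''}", mount_dir]


print(sshfs_argv("bluberry-122", "./data"))
print(sshfs_argv("bluberry-122:~/data", "./data"))
```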

Usage:

grid session mount [OPTIONS] INTERACTIVE_NODE MOUNT_DIR

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

pause

Pauses a session identified by the SESSION_NAME.

Pausing a session stops the running instance (and any computations being performed on it - be sure to save your work!) and the billing of your account for the machine. The session can be resumed at a later point with all your persisted files and saved work unchanged.

Usage:

grid session pause [OPTIONS] SESSION_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

resume

Resumes a session identified by SESSION_NAME.

Usage:

grid session resume [OPTIONS] SESSION_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

ssh

SSH into the interactive node identified by NODE_NAME.

If you'd like the full power of ssh, you can use any ssh client and run ssh <node_name>. This command is a stripped-down version of it.

Example:

1. Path to custom key:

grid session ssh satisfied-rabbit-962 -- -i ~/.ssh/my-key

2. Custom ssh option:

grid session ssh satisfied-rabbit-962 -- -o "StrictHostKeyChecking accept-new"

Usage:

grid session ssh [OPTIONS] NODE_NAME [SSH_ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid ssh-keys

Manage SSH keys.

Usage:

grid ssh-keys [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

add

Register a new SSH public key by providing a path to the KEY file and a NAME for it in Grid.

Usage:

grid ssh-keys add [OPTIONS] NAME KEY

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

authorized_keys

List all registered SSH public keys in authorized_keys format.

Usage:

grid ssh-keys authorized_keys [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --limit | integer | Maximum number of public keys to fetch | 100 |
| --help | boolean | Show this message and exit. | False |

list

List currently registered SSH public keys.

Usage:

grid ssh-keys list [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --limit | integer | Maximum number of public keys to fetch | 100 |
| --help | boolean | Show this message and exit. | False |

rm

Remove a registered SSH public key.

Usage:

grid ssh-keys rm [OPTIONS] KEY_ID

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid status

Checks the status of Runs, Experiments, and Sessions.

Usage:

grid status [OPTIONS] [RUN]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --global | boolean | Fetch status from all collaborators when flag is passed | False |
| --help | boolean | Show this message and exit. | False |

grid stop

Stop Runs or Experiments.

Usage:

grid stop [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

experiment

Stop one or more EXPERIMENT_NAMES.

This preserves progress completed up to this point, but stops further computations and any billing for the machines used.

Usage:

grid stop experiment [OPTIONS] EXPERIMENT_NAMES...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

run

Stop one or more RUN_NAMES.

This preserves progress completed up to this point, but stops further computations and any billing for the machines used.

Usage:

grid stop run [OPTIONS] RUN_NAMES...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid sync-env

Synchronize the requirements file with packages and versions from the currently active environment.

Usage:

grid sync-env [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --config | text | Path to Grid config YML | None |
| --dependency_file | text | Path to dependency file. Defaults to the requirements.txt or environment.yml found in the root | None |
| --help | boolean | Show this message and exit. | False |

grid team

Show information about a TEAM_NAME.

Usage:

grid team [OPTIONS] TEAM_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid user

Show the user information of the authorized user for this CLI instance.

Usage:

grid user [OPTIONS] COMMAND [ARGS]...

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

set-cluster-context

Specify the default CLUSTER_NAME which all operations should be run against.

Usage:

grid user set-cluster-context [OPTIONS] CLUSTER_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

set-default-cluster

Specify the default CLUSTER_ID which all operations should be run against.

Usage:

grid user set-default-cluster [OPTIONS] CLUSTER_NAME

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid version

Prints CLI version to stdout.

Usage:

grid version [OPTIONS]

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |

grid view

Opens a web UI page that details the output of some RUN_OR_EXPERIMENT.

Usage:

grid view [OPTIONS] RUN_OR_EXPERIMENT

Options:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| --help | boolean | Show this message and exit. | False |