Cloud machines are normally expensive. However, if your job can tolerate being interrupted at any time (e.g. fine-tuning, or training a model that can be restarted), you can use the spot instances feature in Grid.
grid run --use_spot pl_mnist.py
To take advantage of interruptible machines, make sure your code does a few things:
It saves checkpoints or any other state you need. Grid automatically picks these up into your artifacts.
It can be restarted from a checkpoint or state file.
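The two requirements above can be sketched in plain Python. This is a minimal illustration, not Grid's or any framework's API: the function names, the pickle format, the `save_every` interval, and the contents of the state dictionary are all assumptions made for the example. The state is written to a temporary file and then renamed, so an interruption in the middle of a write does not leave a half-written checkpoint behind.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to a temp file, then rename it over the target path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic when both paths are on one filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def train(num_steps, ck_path, save_every=10):
    """Run (or resume) a toy training loop, checkpointing periodically."""
    state = load_checkpoint(ck_path)
    for step in range(state["step"], num_steps):
        state["total"] += step  # stand-in for one real training step
        state["step"] = step + 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, ck_path)
    save_checkpoint(state, ck_path)  # always save the final state
    return state
```

Because `train` starts from `state["step"]` rather than from zero, running it again after an interruption continues from the last saved step instead of repeating work.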
Once the machine is interrupted, your job on Grid will stop. If you want to continue running your code, do the following:
Navigate to your experiment artifacts.
Copy the link to the state file (or checkpoint) that you need.
Resubmit the job with the path to that file.
For example, assume your script has an argument called --ck_path:
grid run --use_spot main.py --ck_path https://grid.ai/url/to/checkpoint.ckpt
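On the script side, handling that argument might look like the sketch below. It assumes the copied artifact link can be downloaded with a plain HTTP request, which may not hold if the link requires authentication. The helper names `resolve_checkpoint` and `parse_args` are hypothetical and only illustrate the idea.

```python
import argparse
import os
import urllib.request


def resolve_checkpoint(ck_path, download_dir="."):
    """Return a local file path for ck_path, downloading it first if it is a URL."""
    if ck_path is None:
        return None  # no checkpoint given: start from scratch
    if ck_path.startswith(("http://", "https://")):
        # Assumption: the link is directly downloadable without extra credentials.
        local_path = os.path.join(download_dir, os.path.basename(ck_path))
        urllib.request.urlretrieve(ck_path, local_path)
        return local_path
    return ck_path  # already a local path


def parse_args(argv=None):
    """Parse the command line; --ck_path is optional."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--ck_path",
        default=None,
        help="local path or URL of a checkpoint to resume from",
    )
    return parser.parse_args(argv)
```

In main.py you would call `resolve_checkpoint(parse_args().ck_path)` and pass the resulting local path to whatever loading routine your training code uses. Leaving `--ck_path` optional lets the same script serve both the first submission and every resubmission.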