Iteratively Developing a Project

This page shows a typical workflow for iteratively developing and running a project on SkyPilot.

Getting an interactive node

Interactive nodes are easy-to-spin-up VMs that enable fast development and interactive debugging.

To provision a GPU interactive node named dev, run:

$ # Provisions/reuses an interactive node with a single K80 GPU.
$ sky gpunode -c dev --gpus K80

See the CLI reference for all flags, such as those that change the GPU type and count.

Running code

To run a command or a script on the cluster, use sky exec:

$ # If the user has written a task.yaml, this directly
$ # executes the `run` section in the task YAML:
$ sky exec dev task.yaml

$ # Run a script inside the workdir.
$ # Workdir contents are synced to the cluster (~/sky_workdir/).
$ sky exec dev -- python train_cpu.py
$ sky exec dev --gpus=K80:1 -- python train_gpu.py
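
For reference, a task YAML passed to sky exec might look like the minimal sketch below. The file contents are illustrative assumptions rather than a file shipped with SkyPilot: the script name train_cpu.py is reused from the example above, and workdir is set to the current directory. Only the `run` section is what sky exec executes on the cluster.

```shell
# Write a minimal, hypothetical task.yaml into the current directory.
cat > task.yaml <<'EOF'
# Illustrative task definition; adjust to your project.
workdir: .

run: |
  python train_cpu.py
EOF

# Show what was written.
cat task.yaml

# Execute the task's `run` section on the existing cluster named dev,
# but only if the sky CLI is actually installed on this machine.
if command -v sky >/dev/null 2>&1; then
  sky exec dev task.yaml
else
  echo "sky CLI not found; skipping 'sky exec dev task.yaml'"
fi
```

Keeping the commands in a task YAML makes the same invocation repeatable across iterations, while the inline form (sky exec dev -- ...) is convenient for one-off commands.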

Alternatively, the user can SSH directly into the cluster’s nodes and run commands:

$ # SSH into head node
$ ssh dev

$ # SSH into worker nodes
$ ssh dev-worker1
$ ssh dev-worker2

SkyPilot provides passwordless SSH access by automatically creating an entry for each cluster in ~/.ssh/config. Referring to clusters by name also allows seamless integration with common tools such as scp, rsync, and Visual Studio Code Remote.
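
Because the cluster name works as an SSH host alias, file-transfer tools can use it directly. The sketch below shows the idea with rsync and scp. The remote paths (sky_workdir/outputs/ and log.txt) are hypothetical, and the maybe_run helper is an illustrative wrapper, not part of SkyPilot: it only prints each command unless DRY_RUN=0 is set.

```shell
# Print the command by default; run it for real only when DRY_RUN=0.
# (maybe_run is a hypothetical helper for this example.)
maybe_run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Pull a (hypothetical) outputs directory from the head node.
# Remote paths without a leading slash are relative to the remote home directory.
maybe_run rsync -avz dev:sky_workdir/outputs/ ./outputs/

# Copy a single (hypothetical) log file with scp.
maybe_run scp dev:sky_workdir/outputs/log.txt .
```

Run the snippet with DRY_RUN=0 in the environment to perform the transfers once the cluster is up and the paths exist.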

Note

Refer to Syncing Code and Artifacts for more details on how to upload code and download outputs from the cluster.

Ending a development session

To end a development session:

$ # Stop at the end of the work day:
$ sky stop dev

$ # Or, to terminate:
$ sky down dev

To restart a stopped cluster:

$ # Restart it the next morning:
$ sky start dev

Note

Stopping a cluster preserves the data on the attached disks. Billing for the instances stops, but the disks continue to be charged. The disks are reattached when the cluster is restarted.

Terminating a cluster will delete all associated resources (all billing stops), and any data on the attached disks will be lost. Terminated clusters cannot be restarted.
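
The choice between stopping and terminating can be wrapped in a small helper, sketched below under stated assumptions. The end_session function and the DRY_RUN variable are hypothetical names introduced for illustration; the only SkyPilot commands involved are sky stop and sky down from above. By default the helper prints the command instead of running it.

```shell
# end_session CLUSTER [stop|down]
#   stop: keeps the attached disks, so the cluster can be restarted with `sky start`.
#   down: deletes all associated resources; the cluster cannot be restarted.
# (end_session is a hypothetical helper, not a SkyPilot command.)
end_session() {
  cluster=$1
  mode=${2:-stop}   # default to the non-destructive option

  case "$mode" in
    stop|down) ;;
    *)
      echo "usage: end_session CLUSTER [stop|down]" >&2
      return 2
      ;;
  esac

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: only show what would be executed.
    echo "+ sky $mode $cluster"
  else
    sky "$mode" "$cluster"
  fi
}

end_session dev        # would stop the cluster
end_session dev down   # would terminate the cluster
```

Defaulting to stop keeps the destructive path explicit: data on the attached disks is lost only when down is requested by name.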