Iteratively Developing a Project
This page shows a typical workflow for iteratively developing and running a project on SkyPilot.
Getting an interactive node
Interactive nodes are easy-to-spin-up VMs that enable fast development and interactive debugging.
To provision a GPU interactive node named dev, run:
$ # Provisions/reuses an interactive node with a single K80 GPU.
$ sky gpunode -c dev --gpus K80
See the CLI reference for all flags, such as those that change the GPU type and count.
Running code
To run a command or a script on the cluster, use sky exec:
$ # If the user has written a task.yaml, this directly
$ # executes the `run` section in the task YAML:
$ sky exec dev task.yaml
$ # Run a script inside the workdir.
$ # Workdir contents are synced to the cluster (~/sky_workdir/).
$ sky exec dev -- python train_cpu.py
$ sky exec dev --gpus=V100:1 -- python train_gpu.py
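For reference, the task.yaml passed to sky exec above could look like the following minimal sketch. It assumes the workdir is the current directory and reuses the train_cpu.py script name from the commands above. The file contents are illustrative, not a required layout.

```shell
# Write a minimal, hypothetical task.yaml into the current directory.
# The script name comes from the commands above and is assumed to exist
# in the workdir.
cat > task.yaml <<'EOF'
# Local directory whose contents are synced to ~/sky_workdir/ on the cluster.
workdir: .

# Commands executed on the cluster by `sky exec dev task.yaml`.
run: |
  python train_cpu.py
EOF
```

With this file in place, sky exec dev task.yaml executes the commands under run on the dev cluster.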
Alternatively, the user can ssh directly into the cluster’s nodes and run commands:
$ # SSH into head node
$ ssh dev
$ # SSH into worker nodes
$ ssh dev-worker1
$ ssh dev-worker2
SkyPilot provides easy password-less SSH access by automatically creating entries for each cluster in ~/.ssh/config.
Referring to clusters by name also allows seamless integration with common tools such as scp, rsync, and Visual Studio Code Remote.
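Because the cluster name works as an SSH host alias, tools like rsync can address it directly. The sketch below wraps one such call in a hypothetical helper, pull_outputs, which is not part of SkyPilot. It assumes results are written to a subdirectory of the synced workdir (~/sky_workdir/) on the head node.

```shell
# Hypothetical helper (not part of SkyPilot): copy a subdirectory of the
# cluster's synced workdir to a local directory using rsync over SSH.
# Usage: pull_outputs CLUSTER REMOTE_SUBDIR LOCAL_DIR
pull_outputs() {
  cluster="$1"
  remote_subdir="$2"
  local_dir="$3"
  mkdir -p "$local_dir"
  # The remote path is relative to the remote home directory, so
  # sky_workdir/... refers to ~/sky_workdir/... on the head node.
  # -a preserves file attributes, -v is verbose, -z compresses in transit.
  rsync -avz "${cluster}:sky_workdir/${remote_subdir}/" "${local_dir}/"
}
```

For example, pull_outputs dev outputs ./outputs would fetch a hypothetical outputs directory from the dev cluster's head node.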
Note
Refer to Syncing Code and Artifacts for more details on how to upload code and download outputs from the cluster.
Ending a development session
To end a development session:
$ # Stop at the end of the work day:
$ sky stop dev
$ # Or, to terminate:
$ sky down dev
To restart a stopped cluster:
$ # Restart it the next morning:
$ sky start dev
Note
Stopping a cluster preserves the data on its attached disks. Billing for the instances stops, but the disks continue to be charged. The disks are reattached when the cluster is restarted.
Terminating a cluster will delete all associated resources (all billing stops), and any data on the attached disks will be lost. Terminated clusters cannot be restarted.
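The choice between stopping and terminating can be made explicit in a small wrapper. The following is a sketch only: end_session and its keep/discard modes are hypothetical names, and the function simply dispatches to the sky stop and sky down commands shown above.

```shell
# Hypothetical wrapper (not part of SkyPilot) around `sky stop` and `sky down`.
# Usage: end_session CLUSTER [keep|discard]
end_session() {
  cluster="$1"
  mode="${2:-keep}"   # default to the non-destructive option
  case "$mode" in
    keep)
      # Disk data is preserved; the cluster can be restarted with `sky start`.
      sky stop "$cluster"
      ;;
    discard)
      # All associated resources are deleted; the cluster cannot be restarted.
      sky down "$cluster"
      ;;
    *)
      echo "usage: end_session CLUSTER [keep|discard]" >&2
      return 2
      ;;
  esac
}
```

Defaulting to keep means a mistyped or omitted mode never deletes disk data. Terminating requires passing discard explicitly.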