==========
Quickstart
==========

This guide will walk you through:

- defining a task in a simple YAML format
- provisioning a cluster and running a task
- using the core SkyPilot CLI commands

Be sure to complete the :ref:`installation instructions ` first before continuing with this guide.

Hello, SkyPilot!
------------------

Let's define our very first task, a simple Hello, SkyPilot! program.

Create a directory from anywhere on your machine:

.. code-block:: console

  $ mkdir hello-sky
  $ cd hello-sky

Copy the following YAML into a ``hello_sky.yaml`` file:

.. code-block:: yaml

  resources:
    # Optional; if left out, automatically pick the cheapest cloud.
    cloud: aws
    # 1x NVIDIA V100 GPU
    accelerators: V100:1

  # Working directory (optional) containing the project codebase.
  # Its contents are synced to ~/sky_workdir/ on the cluster.
  workdir: .

  # Typical use: pip install -r requirements.txt
  # Invoked under the workdir (i.e., can use its files).
  setup: |
    echo "Running setup."

  # Typical use: make use of resources, such as running training.
  # Invoked under the workdir (i.e., can use its files).
  run: |
    echo "Hello, SkyPilot!"
    conda env list

This defines a task with the following components:

- :code:`resources`: cloud resources the task must be run on (e.g., accelerators, instance type, etc.)
- :code:`workdir`: the working directory containing project code that will be synced to the provisioned instance(s)
- :code:`setup`: commands that must be run before the task is executed (invoked under workdir)
- :code:`run`: commands that run the actual task (invoked under workdir)

All these fields are optional.

To launch a cluster and run a task, use :code:`sky launch`:

.. code-block:: console

  $ sky launch -c mycluster hello_sky.yaml

.. tip::

  This may take a few minutes for the first run. Feel free to read ahead on this guide.

.. tip::

  You can use the ``-c`` flag to give the cluster an easy-to-remember name. If not specified, a name is autogenerated.
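
Because every field of the task YAML is optional, a pared-down task file should also be valid. The following is a minimal sketch, not an official example: the file name ``hello_sky_minimal.yaml`` is hypothetical, and the task omits ``cloud``, ``workdir``, and ``setup``. With ``cloud`` left out, the cheapest cloud that satisfies the accelerator request is picked automatically.

.. code-block:: yaml

  # hello_sky_minimal.yaml (hypothetical file name, for illustration only)
  resources:
    # No cloud is given, so the cheapest one is picked automatically.
    accelerators: V100:1

  # No workdir or setup: nothing is synced and no setup commands run.
  run: |
    echo "Hello, SkyPilot!"

Such a file could presumably be passed to :code:`sky launch` in the same way as ``hello_sky.yaml``.
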
The ``sky launch`` command does a lot of heavy lifting:

- selects an appropriate cloud and VM based on the specified resource constraints;
- provisions (or reuses) a cluster on that cloud;
- syncs up the :code:`workdir`;
- executes the :code:`setup` commands; and
- executes the :code:`run` commands.

In a few minutes, the cluster will finish provisioning and the task will be executed. The output will show ``Hello, SkyPilot!`` and the list of installed Conda environments.

Execute a task on an existing cluster
=====================================

Once you have an existing cluster, use :code:`sky exec` to execute a task on it:

.. code-block:: console

  $ sky exec mycluster hello_sky.yaml

The ``sky exec`` command is more lightweight; it

- syncs up the :code:`workdir` (so that the task may use updated code); and
- executes the :code:`run` commands.

Provisioning and ``setup`` commands are skipped.

Bash commands are also supported, such as:

.. code-block:: console

  $ sky exec mycluster python train_cpu.py
  $ sky exec mycluster --gpus=V100:1 python train_gpu.py

For interactive/monitoring commands, such as ``htop`` or ``gpustat -i``, use ``ssh`` instead (see below) to avoid job submission overheads.

View all clusters
=================

Use :code:`sky status` to see all clusters (across regions and clouds) in a single table:

.. code-block:: console

  $ sky status

This may show multiple clusters, if you have created several:

.. code-block::

  NAME       LAUNCHED    RESOURCES             COMMAND                            STATUS
  gcp        1 day ago   1x GCP(n1-highmem-8)  sky cpunode -c gcp --cloud gcp     STOPPED
  mycluster  4 mins ago  1x AWS(p3.2xlarge)    sky exec mycluster hello_sky.yaml  UP

SSH into clusters
=================

Simply run :code:`ssh` followed by the cluster name to log into a cluster:

.. code-block:: console

  $ ssh mycluster

:ref:`Multi-node clusters ` work too:

.. code-block:: console

  # Assuming 3 nodes.

  # Head node.
  $ ssh mycluster

  # Worker nodes.
  $ ssh mycluster-worker1
  $ ssh mycluster-worker2

These host names work because appropriate entries are added to ``~/.ssh/config``.

Transfer files
===============

After a task's execution, use :code:`rsync` or :code:`scp` to download files (e.g., checkpoints):

.. code-block:: console

  $ rsync -Pavz mycluster:/remote/source /local/dest  # copy from remote VM

For uploading files to the cluster, see :ref:`Syncing Code and Artifacts`.

Stop/terminate a cluster
=========================

When you are done, run :code:`sky stop mycluster` to stop the cluster. To terminate a cluster instead, run :code:`sky down mycluster`.

Find more commands that manage the lifecycle of clusters :ref:`here `.

Next steps
-----------

Congratulations! In this quickstart, you have launched a cluster, run a task, and interacted with SkyPilot's CLI.

Next steps:

- Adapt :ref:`Tutorial: DNN Training` to start running your own project on SkyPilot!
- See the :ref:`Task YAML reference `, :ref:`CLI reference `, and `more examples `_
- To learn more, try out `SkyPilot Tutorials `_ in Jupyter notebooks
- Try :ref:`Interactive Nodes` -- launch VMs in one command without a YAML file
- Explore SkyPilot's unique features in the rest of the documentation