Command Line Interface

Core CLI

sky launch

Launch a task from a YAML or a command (rerun setup if cluster exists).

If ENTRYPOINT points to a valid YAML file, it is read in as the task specification. Otherwise, it is interpreted as a bash command.

In both cases, the commands are run under the task’s workdir (if specified) and they undergo job queue scheduling.
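Illustrative examples (the cluster name and YAML file below are placeholders):

# Launch the task defined in a YAML file.
sky launch -c mycluster task.yaml

# Launch a bash command on the same cluster.
sky launch -c mycluster 'echo hello'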

sky launch [OPTIONS] [ENTRYPOINT]...

Options

-c, --cluster <cluster>

A cluster name. If provided, either reuse an existing cluster with that name or provision a new cluster with that name. Otherwise provision a new cluster with an autogenerated name.

--dryrun

If True, do not actually run the job.

-s, --detach-setup

If True, run setup in non-interactive mode as part of the job itself. You can safely ctrl-c to detach from logging, and it will not interrupt the setup process. To see the logs again after detaching, use sky logs. To cancel setup, cancel the job via sky cancel. Useful for long-running setup commands.
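A minimal sketch of this workflow (cluster and YAML names are placeholders):

# Run setup as part of the job; ctrl-c only detaches from log streaming.
sky launch -s -c mycluster task.yaml

# Re-attach to the logs of the latest job later.
sky logs mycluster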

-d, --detach-run

If True, as soon as a job is submitted, return from this call and do not stream execution logs.

--docker

If used, runs locally inside a docker container.

-n, --name <name>

Task name. Overrides the “name” config in the YAML if both are supplied.

--workdir <workdir>

If specified, sync this dir to the remote working directory, where the task will be invoked. Overrides the “workdir” config in the YAML if both are supplied.

--cloud <cloud>

The cloud to use. If specified, overrides the “resources.cloud” config. Passing “none” resets the config.

--region <region>

The region to use. If specified, overrides the “resources.region” config. Passing “none” resets the config.

--zone <zone>

The zone to use. If specified, overrides the “resources.zone” config. Passing “none” resets the config.

--num-nodes <num_nodes>

Number of nodes to execute the task on. Overrides the “num_nodes” config in the YAML if both are supplied.

--use-spot, --no-use-spot

Whether to request spot instances. If specified, overrides the “resources.use_spot” config.

--image-id <image_id>

Custom image id for launching the instances. Passing “none” resets the config.

--env <env>

Environment variable to set on the remote node. It can be specified multiple times. Examples:

  1. --env MY_ENV=1: set $MY_ENV on the cluster to 1.

  2. --env MY_ENV2=$HOME: set $MY_ENV2 on the cluster to the same value as $HOME in the local environment where the CLI command is run.

  3. --env MY_ENV3: set $MY_ENV3 on the cluster to the same value as $MY_ENV3 in the local environment.

--gpus <gpus>

Type and number of GPUs to use. Example values: “V100:8”, “V100” (short for a count of 1), or “V100:0.5” (fractional counts are supported by the scheduling framework). If a new cluster is being launched by this command, this is the resources to provision. If an existing cluster is being reused, this is seen as the task demand, which must fit the cluster’s total resources and is used for scheduling the task. Overrides the “accelerators” config in the YAML if both are supplied. Passing “none” resets the config.
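For example (cluster and YAML names are placeholders):

# Provision 8 V100s for a new cluster, or declare the task's GPU demand
# when reusing an existing cluster.
sky launch -c mycluster --gpus V100:8 task.yaml

# Fractional counts are accepted for scheduling purposes.
sky launch -c mycluster --gpus V100:0.5 task.yaml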

-t, --instance-type <instance_type>

The instance type to use. If specified, overrides the “resources.instance_type” config. Passing “none” resets the config.

--disk-size <disk_size>

OS disk size in GBs.

-i, --idle-minutes-to-autostop <idle_minutes_to_autostop>

Automatically stop the cluster after this many minutes of idleness, i.e., no running or pending jobs in the cluster’s job queue. Idleness gets reset whenever setting-up/running/pending jobs are found in the job queue. Setting this flag is equivalent to running sky launch -d ... and then sky autostop -i <minutes>. If not set, the cluster will not be autostopped.
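An illustration of the equivalence described above (names are placeholders):

# This single command ...
sky launch -c mycluster -i 10 task.yaml

# ... behaves like:
sky launch -d -c mycluster task.yaml
sky autostop mycluster -i 10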

--down

Autodown the cluster: tear down the cluster after all jobs finish (successfully or abnormally). If --idle-minutes-to-autostop is also set, the cluster will be torn down after the specified idle time. Note that if errors occur during provisioning/data syncing/setting up, the cluster will not be torn down for debugging purposes.

-r, --retry-until-up

Whether to retry provisioning infinitely until the cluster is up, if we fail to launch the cluster on any possible region/cloud due to unavailability errors.

-y, --yes

Skip confirmation prompt.

--no-setup

Skip setup phase when (re-)launching cluster.

Arguments

ENTRYPOINT

Optional argument(s)

sky exec

Execute a task or a command on a cluster (skip setup).

If ENTRYPOINT points to a valid YAML file, it is read in as the task specification. Otherwise, it is interpreted as a bash command.

Actions performed by sky exec:

  1. Workdir syncing, if:

    • ENTRYPOINT is a YAML with the workdir field specified; or

    • Flag --workdir=<local_dir> is set.

  2. Executing the specified task’s run commands / the bash command.

sky exec is thus typically faster than sky launch, provided a cluster already exists.

All setup steps (provisioning, setup commands, file mounts syncing) are skipped. If any of those specifications changed, this command will not reflect those changes. To ensure a cluster’s setup is up to date, use sky launch instead.

Execution and scheduling behavior:

  • The task/command will undergo job queue scheduling, respecting any specified resource requirement. It can be executed on any node of the cluster with enough resources.

  • The task/command is run under the workdir (if specified).

  • The task/command is run non-interactively (without a pseudo-terminal or pty), so interactive commands such as htop do not work. Use ssh my_cluster instead.

Typical workflow:

# First command: set up the cluster once.
sky launch -c mycluster app.yaml

# For iterative development, simply execute the task on the launched
# cluster.
sky exec mycluster app.yaml

# Do "sky launch" again if anything other than Task.run is modified:
sky launch -c mycluster app.yaml

# Pass in commands for execution.
sky exec mycluster python train_cpu.py
sky exec mycluster --gpus=V100:1 python train_gpu.py

# Pass environment variables to the task.
sky exec mycluster --env WANDB_API_KEY python train_gpu.py
sky exec [OPTIONS] CLUSTER ENTRYPOINT...

Options

-d, --detach-run

If True, as soon as a job is submitted, return from this call and do not stream execution logs.

-n, --name <name>

Task name. Overrides the “name” config in the YAML if both are supplied.

--workdir <workdir>

If specified, sync this dir to the remote working directory, where the task will be invoked. Overrides the “workdir” config in the YAML if both are supplied.

--cloud <cloud>

The cloud to use. If specified, overrides the “resources.cloud” config. Passing “none” resets the config.

--region <region>

The region to use. If specified, overrides the “resources.region” config. Passing “none” resets the config.

--zone <zone>

The zone to use. If specified, overrides the “resources.zone” config. Passing “none” resets the config.

--num-nodes <num_nodes>

Number of nodes to execute the task on. Overrides the “num_nodes” config in the YAML if both are supplied.

--use-spot, --no-use-spot

Whether to request spot instances. If specified, overrides the “resources.use_spot” config.

--image-id <image_id>

Custom image id for launching the instances. Passing “none” resets the config.

--env <env>

Environment variable to set on the remote node. It can be specified multiple times. Examples:

  1. --env MY_ENV=1: set $MY_ENV on the cluster to 1.

  2. --env MY_ENV2=$HOME: set $MY_ENV2 on the cluster to the same value as $HOME in the local environment where the CLI command is run.

  3. --env MY_ENV3: set $MY_ENV3 on the cluster to the same value as $MY_ENV3 in the local environment.

--gpus <gpus>

Type and number of GPUs to use. Example values: “V100:8”, “V100” (short for a count of 1), or “V100:0.5” (fractional counts are supported by the scheduling framework). If a new cluster is being launched by this command, this is the resources to provision. If an existing cluster is being reused, this is seen as the task demand, which must fit the cluster’s total resources and is used for scheduling the task. Overrides the “accelerators” config in the YAML if both are supplied. Passing “none” resets the config.

-t, --instance-type <instance_type>

The instance type to use. If specified, overrides the “resources.instance_type” config. Passing “none” resets the config.

Arguments

CLUSTER

Required argument

ENTRYPOINT

Required argument(s)

sky stop

Stop cluster(s).

CLUSTER is the name (or glob pattern) of the cluster to stop. If both CLUSTER and --all are supplied, the latter takes precedence.

Data on attached disks is not lost when a cluster is stopped. Billing for the instances will stop, while the disks will still be charged. Those disks will be reattached when restarting the cluster.

Currently, spot instance clusters cannot be stopped.

Examples:

# Stop a specific cluster.
sky stop cluster_name

# Stop multiple clusters.
sky stop cluster1 cluster2

# Stop all clusters matching glob pattern 'cluster*'.
sky stop "cluster*"

# Stop all existing clusters.
sky stop -a
sky stop [OPTIONS] [CLUSTERS]...

Options

-a, --all

Stop all existing clusters.

-y, --yes

Skip confirmation prompt.

Arguments

CLUSTERS

Optional argument(s)

sky start

Restart cluster(s).

If a cluster is previously stopped (status is STOPPED) or failed in provisioning/runtime installation (status is INIT), this command will attempt to start the cluster. In the latter case, provisioning and runtime installation will be retried.

Auto-failover provisioning is not used when restarting a stopped cluster. It will be started on the same cloud, region, and zone that were chosen before.

If a cluster is already in the UP status, this command has no effect.

Examples:

# Restart a specific cluster.
sky start cluster_name

# Restart multiple clusters.
sky start cluster1 cluster2

# Restart all clusters.
sky start -a
sky start [OPTIONS] [CLUSTERS]...

Options

-a, --all

Start all existing clusters.

-y, --yes

Skip confirmation prompt.

-i, --idle-minutes-to-autostop <idle_minutes_to_autostop>

Automatically stop the cluster after this many minutes of idleness, i.e., no running or pending jobs in the cluster’s job queue. Idleness gets reset whenever setting-up/running/pending jobs are found in the job queue. Setting this flag is equivalent to running sky launch -d ... and then sky autostop -i <minutes>. If not set, the cluster will not be autostopped.

--down

Autodown the cluster: tear down the cluster after the specified minutes of idle time after all jobs finish (successfully or abnormally). Requires --idle-minutes-to-autostop to be set.

-r, --retry-until-up

Retry provisioning infinitely until the cluster is up, if we fail to start the cluster due to unavailability errors.

-f, --force

Force start the cluster even if it is already UP. Useful for upgrading the SkyPilot runtime on the cluster.

Arguments

CLUSTERS

Optional argument(s)

sky down

Tear down cluster(s).

CLUSTER is the name of the cluster (or glob pattern) to tear down. If both CLUSTER and --all are supplied, the latter takes precedence.

Tearing down a cluster will delete all associated resources (all billing stops), and any data on the attached disks will be lost. Accelerators (e.g., TPUs) that are part of the cluster will be deleted too.

For local on-prem clusters, this command does not terminate the local cluster, but instead removes the cluster from the status table and terminates the calling user’s running jobs.

Examples:

# Tear down a specific cluster.
sky down cluster_name

# Tear down multiple clusters.
sky down cluster1 cluster2

# Tear down all clusters matching glob pattern 'cluster*'.
sky down "cluster*"

# Tear down all existing clusters.
sky down -a
sky down [OPTIONS] [CLUSTERS]...

Options

-a, --all

Tear down all existing clusters.

-y, --yes

Skip confirmation prompt.

-p, --purge

Ignore cloud provider errors (if any). Useful for cleaning up manually deleted cluster(s).

Arguments

CLUSTERS

Optional argument(s)

sky status

Show clusters.

The following fields for each cluster are recorded: cluster name, time since last launch, resources, region, zone, hourly price, status, autostop, command.

Display all fields using sky status -a.

Each cluster can have one of the following statuses:

  • INIT: The cluster may be live or down. This can happen in the following cases:

    • Ongoing provisioning or runtime setup. (A sky launch has started but has not completed.)

    • Or, the cluster is in an abnormal state, e.g., some cluster nodes are down, or the SkyPilot runtime is unhealthy. (To recover the cluster, try sky launch again on it.)

  • UP: Provisioning and runtime setup have succeeded and the cluster is live. (The most recent sky launch has completed successfully.)

  • STOPPED: The cluster is stopped and the storage is persisted. Use sky start to restart the cluster.

Autostop column:

  • Indicates after how many minutes of idleness (no in-progress jobs) the cluster will be autostopped. ‘-’ means disabled.

  • If the time is followed by ‘(down)’, e.g., ‘1m (down)’, the cluster will be autodowned, rather than autostopped.

Getting up-to-date cluster statuses:

  • In normal cases where clusters are entirely managed by SkyPilot (i.e., no manual operations in cloud consoles) and no autostopping is used, the table returned by this command will accurately reflect the cluster statuses.

  • In cases where clusters are changed outside of SkyPilot (e.g., manual operations in cloud consoles; unmanaged spot clusters getting preempted) or for autostop-enabled clusters, use --refresh to query the latest cluster statuses from the cloud providers.
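Illustrative examples:

# Show a summary of all clusters.
sky status

# Show all fields, and query the latest statuses from the clouds.
sky status -a
sky status --refresh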

sky status [OPTIONS]

Options

-a, --all

Show all information in full.

-r, --refresh

Query the latest cluster statuses from the cloud provider(s).

sky autostop

Schedule an autostop or autodown for cluster(s).

Autostop/autodown will automatically stop or teardown a cluster when it becomes idle for a specified duration. Idleness means there are no in-progress (pending/running) jobs in a cluster’s job queue.

CLUSTERS are the names (or glob patterns) of the clusters to stop. If both CLUSTERS and --all are supplied, the latter takes precedence.

The idleness time of a cluster is reset to zero when any of the following happens:

  • A job is submitted (sky launch or sky exec).

  • The cluster has restarted.

  • An autostop is set when there is no active setting. (That is, either no autostop setting has ever been set, or the previous setting was canceled.) This is useful for restarting the autostop timer.

Example: say a cluster without any autostop setting has been idle for 1 hour, and then an autostop of 30 minutes is set. The cluster will not be autostopped immediately. Instead, the idleness timer starts counting only after the autostop setting is set.

When multiple autostop settings are specified for the same cluster, the last setting takes precedence.

Typical usage:

# Autostop this cluster after 60 minutes of idleness.
sky autostop cluster_name -i 60

# Cancel autostop for a specific cluster.
sky autostop cluster_name --cancel

# Since autostop was canceled in the last command, idleness will
# restart counting after this command.
sky autostop cluster_name -i 60
sky autostop [OPTIONS] [CLUSTERS]...

Options

-a, --all

Apply this command to all existing clusters.

-i, --idle-minutes <idle_minutes>

Set the idle minutes before autostopping the cluster. See the doc above for detailed semantics.

--cancel

Cancel any currently active auto{stop,down} setting for the cluster. No-op if there is no active setting.

--down

Use autodown (tear down the cluster; non-restartable), instead of autostop (restartable).

-y, --yes

Skip confirmation prompt.

Arguments

CLUSTERS

Optional argument(s)

Job Queue CLI

sky queue

Show the job queue for cluster(s).
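Illustrative examples (the cluster name is a placeholder):

# Show the job queue of a specific cluster.
sky queue mycluster

# Show only pending/running jobs.
sky queue -s mycluster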

sky queue [OPTIONS] [CLUSTERS]...

Options

-a, --all-users

Show all users’ information in full.

-s, --skip-finished

Show only pending/running jobs’ information.

Arguments

CLUSTERS

Optional argument(s)

sky logs

Tail the log of a job.

If JOB_ID is not provided, the latest job on the cluster will be used.

  1. If no flags are provided, tail the logs of the specified job_id. At most one job_id can be provided.

  2. If --status is specified, print the status of the job and exit with return code 0 if the job succeeded, or 1 otherwise. At most one job_id can be specified.

  3. If --sync-down is specified, the logs of the job will be downloaded from the cluster and saved to the local machine under ~/sky_logs. Multiple job_ids can be specified.
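Illustrative examples (the cluster name and job IDs are placeholders):

# Tail the logs of job 1 on the cluster.
sky logs mycluster 1

# Exit with return code 0 if job 1 succeeded, 1 otherwise.
sky logs mycluster 1 --status

# Download the logs of jobs 1 and 2 to ~/sky_logs on the local machine.
sky logs mycluster 1 2 --sync-down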

sky logs [OPTIONS] CLUSTER [JOB_IDS]...

Options

-s, --sync-down

Sync down the logs of the job (this is useful for distributed jobs, to download a separate log for each job from all the workers).

--status

If specified, do not show logs but exit with a status code for the job’s status: 0 for succeeded, or 1 for all other statuses.

--follow, --no-follow

Follow the logs of the job. [default: --follow] If --no-follow is specified, print the log so far and exit.

Arguments

CLUSTER

Required argument

JOB_IDS

Optional argument(s)

sky cancel

Cancel job(s).
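Illustrative examples (the cluster name and job ID are placeholders):

# Cancel a specific job on a cluster.
sky cancel mycluster 1

# Cancel all jobs on the cluster.
sky cancel mycluster -a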

sky cancel [OPTIONS] CLUSTER [JOBS]...

Options

-a, --all

Cancel all jobs on the specified cluster.

Arguments

CLUSTER

Required argument

JOBS

Optional argument(s)

Managed Spot Jobs CLI

sky spot launch

Launch a managed spot job from a YAML or a command.

If ENTRYPOINT points to a valid YAML file, it is read in as the task specification. Otherwise, it is interpreted as a bash command.

Examples:

# You can use normal task YAMLs.
sky spot launch task.yaml

sky spot launch 'echo hello!'
sky spot launch [OPTIONS] ENTRYPOINT...

Options

-n, --name <name>

Task name. Overrides the “name” config in the YAML if both are supplied.

--workdir <workdir>

If specified, sync this dir to the remote working directory, where the task will be invoked. Overrides the “workdir” config in the YAML if both are supplied.

--cloud <cloud>

The cloud to use. If specified, overrides the “resources.cloud” config. Passing “none” resets the config.

--region <region>

The region to use. If specified, overrides the “resources.region” config. Passing “none” resets the config.

--zone <zone>

The zone to use. If specified, overrides the “resources.zone” config. Passing “none” resets the config.

--num-nodes <num_nodes>

Number of nodes to execute the task on. Overrides the “num_nodes” config in the YAML if both are supplied.

--use-spot, --no-use-spot

Whether to request spot instances. If specified, overrides the “resources.use_spot” config.

--image-id <image_id>

Custom image id for launching the instances. Passing “none” resets the config.

--env <env>

Environment variable to set on the remote node. It can be specified multiple times. Examples:

  1. --env MY_ENV=1: set $MY_ENV on the cluster to 1.

  2. --env MY_ENV2=$HOME: set $MY_ENV2 on the cluster to the same value as $HOME in the local environment where the CLI command is run.

  3. --env MY_ENV3: set $MY_ENV3 on the cluster to the same value as $MY_ENV3 in the local environment.

--gpus <gpus>

Type and number of GPUs to use. Example values: “V100:8”, “V100” (short for a count of 1), or “V100:0.5” (fractional counts are supported by the scheduling framework). If a new cluster is being launched by this command, this is the resources to provision. If an existing cluster is being reused, this is seen as the task demand, which must fit the cluster’s total resources and is used for scheduling the task. Overrides the “accelerators” config in the YAML if both are supplied. Passing “none” resets the config.

-t, --instance-type <instance_type>

The instance type to use. If specified, overrides the “resources.instance_type” config. Passing “none” resets the config.

--spot-recovery <spot_recovery>

Spot recovery strategy to use for the managed spot task.

--disk-size <disk_size>

OS disk size in GBs.

-d, --detach-run

If True, as soon as a job is submitted, return from this call and do not stream execution logs.

-r, --retry-until-up

Whether to retry provisioning infinitely until the cluster is up, if we fail to launch the cluster on any possible region/cloud due to unavailability errors. This applies to launching the spot clusters (both initial and recovery attempts).

-y, --yes

Skip confirmation prompt.

Arguments

ENTRYPOINT

Required argument(s)

sky spot queue

Show statuses of managed spot jobs.

Each spot job can have one of the following statuses:

  • SUBMITTED: The job is submitted to the spot controller.

  • STARTING: The job is starting (starting a spot cluster).

  • RUNNING: The job is running.

  • RECOVERING: The spot cluster is recovering from a preemption.

  • SUCCEEDED: The job succeeded.

  • FAILED: The job failed due to an error from the job itself.

  • FAILED_NO_RESOURCES: The job failed due to resources being unavailable after a maximum number of retry attempts.

  • FAILED_CONTROLLER: The job failed due to an unexpected error in the spot controller.

  • CANCELLED: The job was cancelled by the user.

If the job failed, either due to user code or spot unavailability, the error log can be found with sky logs sky-spot-controller-<user_hash> job_id. Please find your exact spot controller name with sky status.
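For instance (the job ID below is illustrative; <user_hash> is the hash shown in your controller's name):

# Locate the spot controller's exact name.
sky status

# Tail the error log of spot job 1 from the controller.
sky logs sky-spot-controller-<user_hash> 1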

(Tip) To fetch job statuses every 60 seconds, use watch:

watch -n60 sky spot queue
sky spot queue [OPTIONS]

Options

-a, --all

Show all information in full.

-r, --refresh

Query the latest statuses, restarting the spot controller if stopped.

sky spot cancel

Cancel managed spot jobs.

You can provide either a job name or a list of job IDs to be cancelled. The two options are mutually exclusive. Examples:

# Cancel managed spot job with name 'my-job'
$ sky spot cancel -n my-job

# Cancel managed spot jobs with IDs 1, 2, 3
$ sky spot cancel 1 2 3
sky spot cancel [OPTIONS] [JOB_IDS]...

Options

-n, --name <name>

Managed spot job name to cancel.

-a, --all

Cancel all managed spot jobs.

-y, --yes

Skip confirmation prompt.

Arguments

JOB_IDS

Optional argument(s)

sky spot logs

Tail the log of a managed spot job.
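Illustrative examples (the job ID and name are placeholders):

# Tail the logs of spot job 1.
sky spot logs 1

# Tail the logs of the spot job named 'my-job'.
sky spot logs -n my-job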

sky spot logs [OPTIONS] [JOB_ID]

Options

-n, --name <name>

Managed spot job name.

--follow, --no-follow

Follow the logs of the job. [default: --follow] If --no-follow is specified, print the log so far and exit.

Arguments

JOB_ID

Optional argument

Interactive Node CLI

sky cpunode

Launch or attach to an interactive CPU node.

Examples:

# Launch a default cpunode.
sky cpunode

# Do work, then log out. The node is kept running. Attach back to the
# same node and do more work.
sky cpunode

# Create many interactive nodes by assigning names via --cluster (-c).
sky cpunode -c node0
sky cpunode -c node1

# Port forward.
sky cpunode --port-forward 8080 --port-forward 4650 -c cluster_name
sky cpunode -p 8080 -p 4650 -c cluster_name

# Sync current working directory to ~/workdir on the node.
rsync -r . cluster_name:~/workdir
sky cpunode [OPTIONS]

Options

-c, --cluster <cluster>

A cluster name. If provided, either reuse an existing cluster with that name or provision a new cluster with that name. Otherwise provision a new cluster with an autogenerated name.

-y, --yes

Skip confirmation prompt.

-p, --port-forward <port_forward>

Port to be forwarded. To forward multiple ports, use this option multiple times.

-i, --idle-minutes-to-autostop <idle_minutes_to_autostop>

Automatically stop the cluster after this many minutes of idleness, i.e. no running or pending jobs in the cluster’s job queue. Idleness gets reset whenever setting-up/running/pending jobs are found in the job queue. If not set, the cluster will not be auto-stopped.

--down

Autodown the cluster: tear down the cluster after all jobs finish (successfully or abnormally). If --idle-minutes-to-autostop is also set, the cluster will be torn down after the specified idle time. Note that if errors occur during provisioning/data syncing/setting up, the cluster will not be torn down for debugging purposes.

-r, --retry-until-up

Whether to retry provisioning infinitely until the cluster is up if we fail to launch the cluster on any possible region/cloud due to unavailability errors.

--cloud <cloud>

Cloud provider to use.

--region <region>

The region to use.

--zone <zone>

The zone to use.

-t, --instance-type <instance_type>

Instance type to use.

--use-spot

If true, use spot instances.

--screen

If true, attach using screen.

--tmux

If true, attach using tmux.

--disk-size <disk_size>

OS disk size in GBs.

sky gpunode

Launch or attach to an interactive GPU node.

Examples:

# Launch a default gpunode.
sky gpunode

# Do work, then log out. The node is kept running. Attach back to the
# same node and do more work.
sky gpunode

# Create many interactive nodes by assigning names via --cluster (-c).
sky gpunode -c node0
sky gpunode -c node1

# Port forward.
sky gpunode --port-forward 8080 --port-forward 4650 -c cluster_name
sky gpunode -p 8080 -p 4650 -c cluster_name

# Sync current working directory to ~/workdir on the node.
rsync -r . cluster_name:~/workdir
sky gpunode [OPTIONS]

Options

-c, --cluster <cluster>

A cluster name. If provided, either reuse an existing cluster with that name or provision a new cluster with that name. Otherwise provision a new cluster with an autogenerated name.

-y, --yes

Skip confirmation prompt.

-p, --port-forward <port_forward>

Port to be forwarded. To forward multiple ports, use this option multiple times.

-i, --idle-minutes-to-autostop <idle_minutes_to_autostop>

Automatically stop the cluster after this many minutes of idleness, i.e. no running or pending jobs in the cluster’s job queue. Idleness gets reset whenever setting-up/running/pending jobs are found in the job queue. If not set, the cluster will not be auto-stopped.

--down

Autodown the cluster: tear down the cluster after all jobs finish (successfully or abnormally). If --idle-minutes-to-autostop is also set, the cluster will be torn down after the specified idle time. Note that if errors occur during provisioning/data syncing/setting up, the cluster will not be torn down for debugging purposes.

-r, --retry-until-up

Whether to retry provisioning infinitely until the cluster is up if we fail to launch the cluster on any possible region/cloud due to unavailability errors.

--cloud <cloud>

Cloud provider to use.

--region <region>

The region to use.

--zone <zone>

The zone to use.

-t, --instance-type <instance_type>

Instance type to use.

--gpus <gpus>

Type and number of GPUs to use (e.g., --gpus=V100:8 or --gpus=V100).

--use-spot

If true, use spot instances.

--screen

If true, attach using screen.

--tmux

If true, attach using tmux.

--disk-size <disk_size>

OS disk size in GBs.

sky tpunode

Launch or attach to an interactive TPU node.

Examples:

# Launch a default tpunode.
sky tpunode

# Do work, then log out. The node is kept running. Attach back to the
# same node and do more work.
sky tpunode

# Create many interactive nodes by assigning names via --cluster (-c).
sky tpunode -c node0
sky tpunode -c node1

# Port forward.
sky tpunode --port-forward 8080 --port-forward 4650 -c cluster_name
sky tpunode -p 8080 -p 4650 -c cluster_name

# Sync current working directory to ~/workdir on the node.
rsync -r . cluster_name:~/workdir
sky tpunode [OPTIONS]

Options

-c, --cluster <cluster>

A cluster name. If provided, either reuse an existing cluster with that name or provision a new cluster with that name. Otherwise provision a new cluster with an autogenerated name.

-y, --yes

Skip confirmation prompt.

-p, --port-forward <port_forward>

Port to be forwarded. To forward multiple ports, use this option multiple times.

-i, --idle-minutes-to-autostop <idle_minutes_to_autostop>

Automatically stop the cluster after this many minutes of idleness, i.e. no running or pending jobs in the cluster’s job queue. Idleness gets reset whenever setting-up/running/pending jobs are found in the job queue. If not set, the cluster will not be auto-stopped.

--down

Autodown the cluster: tear down the cluster after all jobs finish (successfully or abnormally). If --idle-minutes-to-autostop is also set, the cluster will be torn down after the specified idle time. Note that if errors occur during provisioning/data syncing/setting up, the cluster will not be torn down for debugging purposes.

-r, --retry-until-up

Whether to retry provisioning infinitely until the cluster is up if we fail to launch the cluster on any possible region/cloud due to unavailability errors.

--region <region>

The region to use.

--zone <zone>

The zone to use.

-t, --instance-type <instance_type>

Instance type to use.

--tpus <tpus>

Type and number of TPUs to use (e.g., --tpus=tpu-v3-8:4 or --tpus=tpu-v3-8).

--use-spot

If true, use spot instances.

--tpu-vm

If true, use TPU VMs.

--screen

If true, attach using screen.

--tmux

If true, attach using tmux.

--disk-size <disk_size>

OS disk size in GBs.

Storage CLI

sky storage ls

List storage objects created.

sky storage ls [OPTIONS]

sky storage delete

Delete storage objects.

Examples:

# Delete two storage objects.
sky storage delete imagenet cifar10

# Delete all storage objects matching glob pattern 'imagenet*'.
sky storage delete "imagenet*"

# Delete all storage objects.
sky storage delete -a
sky storage delete [OPTIONS] [NAMES]...

Options

-a, --all

Delete all storage objects.

Arguments

NAMES

Optional argument(s)

Utils: show-gpus, check

sky show-gpus

Show supported GPU/TPU/accelerators.

The names and counts shown can be set in the accelerators field in task YAMLs, or in the --gpus flag in CLI commands. For example, if this table shows 8x V100s are supported, then the string V100:8 will be accepted by the above.

To show the detailed information of a GPU/TPU type (which clouds offer it, the quantity in each VM type, etc.), use sky show-gpus <gpu>.

To show all accelerators, including less common ones and their detailed information, use sky show-gpus --all.

NOTE: The price displayed for each instance type is the lowest across all regions for both on-demand and spot instances.
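Illustrative examples (the accelerator and cloud names below are just for illustration):

# List common GPUs/TPUs and their supported counts.
sky show-gpus

# Show detailed offerings of a specific accelerator.
sky show-gpus V100

# Show all accelerators, optionally restricted to one cloud.
sky show-gpus --all
sky show-gpus --cloud aws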

sky show-gpus [OPTIONS] [GPU_NAME]

Options

-a, --all

Show details of all GPU/TPU/accelerator offerings.

--cloud <cloud>

Cloud provider to query.

Arguments

GPU_NAME

Optional argument

sky check

Determine the set of clouds available to use.

This checks access credentials for AWS, Azure and GCP; on failure, it shows the reason and suggests correction steps. Tasks will only run on clouds that you have access to.

sky check [OPTIONS]