.. _benchmark-skycallback: SkyCallback =========== SkyCallback is a simple Python library that works in conjunction with SkyPilot Benchmark. It enables SkyPilot to provide a more detailed benchmark report without the need to wait until the task finishes. What SkyCallback is for -------------------------------------------- SkyCallback is designed for **machine learning tasks** which have a loop iterating many `steps`. SkyCallback measures the average time taken by each step, and extrapolates it to the total execution time of the task. Installing SkyCallback -------------------------------------------- Unlike SkyPilot, SkyCallback must be installed and imported `in your program`. To install it, add the following line in the ``setup`` section of your task YAML. .. code-block:: yaml setup: # Activate conda or virtualenv if you use one # Then, install SkyCallback pip install "git+https://github.com/skypilot-org/skypilot.git#egg=sky-callback&subdirectory=sky/callbacks/" Using SkyCallback generic APIs -------------------------------------------- The SkyCallback generic APIs are for **PyTorch, TensorFlow, and JAX** programs where training loops are exposed to the users. Below we provide the instructions for using the APIs. First, import the SkyCallback package and initialize it using ``init``. .. code-block:: python import sky_callback sky_callback.init() Next, mark the beginning and end of each step using one of the three equivalent methods. .. code-block:: python # Method 1: wrap your iterable (e.g., dataloader) with `step_iterator`. from sky_callback import step_iterator for batch in step_iterator(train_dataloader): ... # Method 2: wrap your loop body with the `step` context manager. for batch in train_dataloader: with sky_callback.step(): ... # Method 3: call `step_begin` and `step_end` directly. for batch in train_dataloader: sky_callback.step_begin() ... sky_callback.step_end() That's it. Now you can launch your task and get a detailed benchmark report using SkyPilot Benchmark CLI. `Here `_ we provide an example of applying SkyCallback to Pytorch ImageNet training. .. note:: Optionally in ``sky_callback.init``, you can specify the total number of steps that the task will iterate through. This information is needed to estimate the total execution time/cost of your task. .. code-block:: python sky_callback.init( total_steps=num_epochs * len(train_dataloader), # Optional ) .. note:: In distributed training, ``global_rank`` should be additionally passed to ``sky_callback.init`` as follows: .. code-block:: python # PyTorch DDP users global_rank = torch.distributed.get_rank() # Horovod users global_rank = hvd.rank() sky_callback.init( global_rank=global_rank, total_steps=num_epochs * len(train_dataloader), # Optional ) Integrations with ML frameworks ---------------------------------------------------------- Using SkyCallback is even easier for **Keras, PytorchLightning, and HuggingFace Transformers** programs where trainer APIs are used. SkyCallback natively supports these frameworks with simple interface. * Keras example .. code-block:: python from sky_callback import SkyKerasCallback # Add the callback to your Keras model. model.fit(..., callbacks=[SkyKerasCallback()]) `Here `_ you can find an example of applying SkyCallback to Keras ASR model training. * PytorchLightning example .. code-block:: python from sky_callback import SkyLightningCallback # Add the callback to your trainer. trainer = pl.Trainer(..., callbacks=[SkyLightningCallback()]) `Here `_ you can find an example of applying SkyCallback to PyTorchLightning GAN model training. * HuggingFace Transformers example .. code-block:: python from sky_callback import SkyTransformersCallback # Add the callback to your trainer. trainer = transformers.Trainer(..., callbacks=[SkyTransformersCallback()]) `Here `_ you can find an example of applying SkyCallback to HuggingFace BERT fine-tuning. .. note:: When using the framework-integrated callbacks, do not call ``sky_callback.init`` for initialization. The callbacks will do it for you.