About Spot Instances

Overview

In addition to Standard (On-Demand) instances, which are billed at a fixed rate, the Platform can also run task executions on the cloud provider's spare capacity. On Amazon Web Services (AWS), these are EC2 Spot Instances. They are offered at a significant discount, and their availability varies with overall demand.

Learn how the Platform uses these instance types and how this strategy could help you reduce the cost of running your tasks.

About AWS Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud, which makes them suitable for time-flexible workloads. They are available for all projects whose location is set to an AWS region.

With Spot Instances, you pay the Spot price that is in effect for the period your instances are running. Spot Instance prices are set by Amazon EC2 and adjust gradually based on long-term trends in supply and demand for Spot Instance capacity.

Spot Instances are available at up to a 90% discount compared to On-Demand prices. To compare the current Spot prices against standard On-Demand rates, visit the Spot Instance Advisor.
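
As a rough illustration of the pricing comparison above, the following Python sketch computes the cost of the same run on an On-Demand and on a Spot Instance, along with the resulting discount. The function name and the hourly rates are hypothetical placeholders, not actual AWS or Platform prices, and the sketch assumes the Spot price stays constant for the whole run.

```python
def spot_savings(on_demand_rate, spot_rate, hours):
    """Return (on_demand_cost, spot_cost, discount_fraction) for one run.

    Rates are in currency units per hour. The Spot rate is assumed to be
    constant for the duration of the run, which is a simplification.
    """
    on_demand_cost = on_demand_rate * hours
    spot_cost = spot_rate * hours
    discount = 1 - spot_rate / on_demand_rate
    return on_demand_cost, spot_cost, discount


# Hypothetical rates in USD per hour; real prices vary by instance type and region.
od_cost, sp_cost, discount = spot_savings(on_demand_rate=1.00, spot_rate=0.25, hours=8)
print(f"On-Demand: ${od_cost:.2f}, Spot: ${sp_cost:.2f}, discount: {discount:.0%}")
# prints "On-Demand: $8.00, Spot: $2.00, discount: 75%"
```

Current rates for a given instance type and region should be taken from the Spot Instance Advisor rather than from the placeholder values used here.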

Handling Instance Interruption

Spot Instances draw on excess compute capacity, so their availability varies with usage. The cloud provider might terminate these instances at any time if it needs those resources back because of high demand.

Any jobs running on the instance at the time of termination are interrupted and have to be run again from the beginning. To minimize the time needed to complete your task, the jobs are restarted on an equivalent On-Demand instance, unless the cause of termination is the "instance stopped responding" error. In that case, the jobs are restarted on a Spot Instance, because the error stems from a network or hardware malfunction, or from a tool-related error, rather than from a Spot Instance interruption.
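
The restart rule described above can be sketched as a small decision function. This is only an illustration of the stated rule, not the Platform's actual implementation; the enum and function names are hypothetical.

```python
from enum import Enum


class InstanceType(Enum):
    SPOT = "spot"
    ON_DEMAND = "on-demand"


class TerminationCause(Enum):
    # Hypothetical labels for the two cases described in the text.
    SPOT_INTERRUPTION = "spot interruption"
    INSTANCE_STOPPED_RESPONDING = "instance stopped responding"


def restart_instance_type(cause):
    """Pick the instance type on which interrupted jobs are re-run."""
    if cause is TerminationCause.INSTANCE_STOPPED_RESPONDING:
        # Network, hardware, or tool-related failure: Spot capacity itself
        # was not reclaimed, so the jobs are retried on a Spot Instance.
        return InstanceType.SPOT
    # The provider reclaimed the capacity: fall back to On-Demand so the
    # re-run is not interrupted for the same reason.
    return InstanceType.ON_DEMAND


print(restart_instance_type(TerminationCause.SPOT_INTERRUPTION).value)
print(restart_instance_type(TerminationCause.INSTANCE_STOPPED_RESPONDING).value)
```

In both cases the jobs start again from the beginning; only the type of instance they are restarted on differs.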

Restarting jobs on another instance inevitably prolongs task execution time and adds to the cost of running those jobs. The cost of re-running is greatest for long jobs that get interrupted close to completion. The possibility of interruption is why these instances are not recommended for running long, time-critical jobs.
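
To see why a late interruption is the most expensive case, consider the following Python sketch. It assumes a simplified billing model in which the time spent on the Spot Instance before the interruption is paid at the Spot rate and the full re-run is paid at the On-Demand rate. The function name and the rates are hypothetical, and actual billing on the Platform may differ.

```python
def cost_with_interruption(job_hours, interrupted_after_hours, spot_rate, on_demand_rate):
    """Total cost of a job that is interrupted on Spot and re-run on On-Demand.

    Assumes the job restarts from the beginning, so all Spot time before
    the interruption is lost work that is still paid for.
    """
    if not 0 <= interrupted_after_hours <= job_hours:
        raise ValueError("interruption must happen within the job's duration")
    wasted_spot_cost = spot_rate * interrupted_after_hours
    rerun_cost = on_demand_rate * job_hours
    return wasted_spot_cost + rerun_cost


# Hypothetical 10-hour job, Spot at 0.25 and On-Demand at 1.00 per hour.
early = cost_with_interruption(10, 1, spot_rate=0.25, on_demand_rate=1.00)
late = cost_with_interruption(10, 9, spot_rate=0.25, on_demand_rate=1.00)
print(f"Interrupted after 1 hour: {early:.2f}")   # prints 10.25
print(f"Interrupted after 9 hours: {late:.2f}")   # prints 12.25
```

Under these assumptions, an uninterrupted Spot run would cost 2.50 and an On-Demand run 10.00, so a late interruption costs more than using On-Demand from the start and also takes nearly twice as long (19 hours instead of 10).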
