Saving Thousands by Running Scylla on EC2 Spot Instances


Spot instances can save you a lot of cash, but what if you are running a stateful service such as a NoSQL database? The main challenge is that every node in the cluster must maintain its entire state (IP address, data, and other configuration) across instance replacements.

This blog describes how a Scylla cluster can run on AWS EC2 Spot instances without losing consistency, with the help of Spotinst’s prediction technology and advanced stateful features.

What is Scylla?

Scylla is an open-source distributed NoSQL database. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughput and lower latency, and it supports the same protocols and file formats as Apache Cassandra. However, Scylla is a complete rewrite in C++ rather than Java. It is built on the Seastar framework, an asynchronous programming library that replaces threads, shared memory, mapped files, and other classic Linux programming techniques, and it provides its own disk I/O scheduler, which further boosts performance.

Benchmarks conducted by engineers at both ScyllaDB and third parties have demonstrated that Scylla outperforms Apache Cassandra by up to 10x!


How Scylla replicates its data between nodes

Scylla provides always-on availability: automatic failover and replication across multiple nodes and data centers deliver reliable fault tolerance.

Scylla, like Cassandra, uses a protocol called “gossip” to exchange metadata about the identities of the nodes in a cluster and whether they are up or down. Since there is no single point of failure, there can be no single registry of node state, so the nodes must share this information among themselves.
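To see the cluster state that gossip has propagated, you can ask any node directly. Both commands below ship with Scylla’s standard tooling:

nodetool status        # per-node up/down state, load, and token ownership
nodetool gossipinfo    # the raw gossip metadata each endpoint advertises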


How to run Scylla on Spotinst

When architecting a new Scylla cluster, spot instances probably won’t be the architect’s first choice: their behavior is unpredictable, and the fact that they can be terminated with only two minutes’ notice makes a stable cluster hard to manage. This is exactly the kind of environment Elastigroup was built for.
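The two-minute warning itself is visible from inside the instance: EC2 publishes a spot interruption notice through the instance metadata service, which a watchdog can poll. A minimal sketch (the endpoint returns 404 until an interruption is actually scheduled):

# Poll the EC2 instance metadata service for a pending spot interruption
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' http://169.254.169.254/latest/meta-data/spot/instance-action)
  if [ "$code" = "200" ]; then
    echo "Spot interruption scheduled; draining node..."
    nodetool drain
    break
  fi
  sleep 5
done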

Elastigroup provides 100% availability of a service on top of the spot market. To choose the right bid for the right instance, it analyzes historical and real-time data and selects the spot instances that offer the best combination of low price and long expected lifetime. Using a predictive algorithm, changes in the spot market are identified up to 15 minutes in advance, and a spot replacement is triggered seamlessly, without service interruption.

As part of the stateful feature, Elastigroup can retain the data volumes of the machine. Any EBS volume attached to the instance is continuously snapshotted while the machine is running and is used as the block device mapping configuration for the replacement instance.
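The same persistence options can also be set through the Elastigroup API rather than the console. The sketch below is illustrative only: the endpoint and field names follow the Spotinst API documentation as best understood here, and should be verified against the current API reference before use.

# Illustrative sketch: enable private IP and volume persistence on an Elastigroup
curl -X PUT "https://api.spotinst.io/aws/ec2/group/${GROUP_ID}" \
  -H "Authorization: Bearer ${SPOTINST_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "group": {
      "strategy": {
        "persistence": {
          "shouldPersistPrivateIp": true,
          "shouldPersistRootDevice": true,
          "shouldPersistBlockDevices": true,
          "blockDevicesMode": "reattach"
        }
      }
    }
  }'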


In order to keep the same machine running in a failure scenario, you need to keep several things in mind:

Private IP – Make sure the replacement machine has the same private IP, so gossip can keep communicating with the node.

Volume – The node must be attached to the same storage, with the same volumes it had before; otherwise the service will not be available.

Config file – scylla.yaml is located by default at /etc/scylla/scylla.yaml. It must be configured so each node knows its cluster, network, and seed information.

In the config file, you will need to configure a few key values (a quick way to inspect them on an existing node is shown after the list):

  • cluster_name – The name of the cluster. This setting prevents nodes in one logical cluster from joining another; all nodes in a cluster must have the same value
  • listen_interface – The interface that Scylla binds to for connecting to other nodes
  • seeds – Seed nodes are used during startup to bootstrap the gossip process and join the cluster
  • rpc_address – IP address of the interface for client connections (Thrift, CQL)
  • broadcast_address – IP address of the interface for inter-node connections, as seen from other nodes in the cluster
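To review what a node currently has configured before editing, grep the file for the keys above:

# Show the current values of the relevant keys (with line numbers)
grep -nE 'cluster_name|seeds|endpoint_snitch|listen_|rpc_address|broadcast_address' /etc/scylla/scylla.yaml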

Rack Considerations

For better availability of your data, it’s recommended to spread the nodes across Availability Zones (AZs). This is configured by setting endpoint_snitch: Ec2Snitch in scylla.yaml, together with the matching rack and data center values in cassandra-rackdc.properties.

Let’s assume that you have a cluster set up in the us-east-1 region. If Node1 is in us-east-1a and Node2 is in us-east-1b, Scylla considers these nodes to be in two different racks within the same data center: Node1 in rack 1a and Node2 in rack 1b.

We will now show how to set up a six-node cluster. Each data center consists of three nodes, two of which are seed nodes. The IPs are as follows:

U.S. Data Center 1 (US-DC1)

Node1 – 192.168.1.1 (seed)
Node2 – 192.168.1.2 (seed)
Node3 – 192.168.1.3

U.S. Data Center 2 (US-DC2)

Node4 – 192.168.1.4 (seed)
Node5 – 192.168.1.5 (seed)
Node6 – 192.168.1.6

On each Scylla node, edit the scylla.yaml file. The following is an example of one node per DC:

U.S. Data Center 1 – Node1 (192.168.1.1)

cluster_name: 'ScyllaDB_Cluster'

seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"

endpoint_snitch: Ec2Snitch

rpc_address: "192.168.1.1"

listen_address: "192.168.1.1"

U.S. Data Center 2 – Node4 (192.168.1.4)

cluster_name: 'ScyllaDB_Cluster'

seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"

endpoint_snitch: Ec2Snitch

rpc_address: "192.168.1.4"

listen_address: "192.168.1.4"

On each Scylla node, edit the cassandra-rackdc.properties file with the relevant rack and data center information:

Nodes 1-3

dc=us-east-1a

rack=RACK1

Nodes 4-6

dc=us-east-1b

rack=RACK2
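After both files have been edited, restart each node and confirm it reports the expected data center and rack. Once the cluster is up, keyspaces can be replicated across the two data centers; the keyspace below is a hypothetical example, and the data center names must match what the snitch reports:

# Restart Scylla so the new configuration takes effect, then verify placement
sudo systemctl restart scylla-server
nodetool status    # each node should appear under its data center with the expected rack

# Hypothetical keyspace replicated twice to each data center
cqlsh 192.168.1.1 -e "CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1a': 2, 'us-east-1b': 2};"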

Spotinst Console configuration

When configuring the Elastigroup, it’s important to enable the stateful feature so that your data and network configuration persist when an instance is replaced after a spot interruption. Open the Compute tab, then the Stateful section, and enable persistence for the private IP and the root and data volumes.


It’s also recommended to run the nodetool drain command in the shutdown script section, so the node flushes its memtables to disk and gracefully stops accepting new connections before the instance goes away.
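A minimal shutdown script along those lines (scylla-server is the default systemd unit name; it may differ on custom installations):

#!/bin/bash
# Drain the node before the instance is reclaimed: flush memtables to disk
# and stop accepting client and inter-node connections, then stop the service.
nodetool drain
sudo systemctl stop scylla-server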

Let’s see how this works in real life!

Consider a Scylla cluster with three nodes, all running on spot instances with the stateful feature configured.

When one of the instances is interrupted, the stateful feature launches a replacement with the same private IP and the same root and data volumes, and the node returns to the cluster.

Together, Scylla and Spotinst provide a strong combination of extreme performance and cost reduction.

Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

 

This blog post was also published on the Scylla Blog.