
Five Signs You Have Outgrown Cassandra

An Aerospike White Paper

November 2019


Contents

Executive Summary

Five Signs

Exploring the Five Signs in Depth

    Sign #1: You’re struggling with “server sprawl” and worried about TCO

    Sign #2: Peak loads are disrupting SLAs

    Sign #3: You’re tired of living with cascading failures

    Sign #4: Your operations team keeps growing and you’re worried about costs

    Sign #5: You’re struggling to acquire and retain Cassandra expertise

Aerospike: The Enterprise-Grade NoSQL Database

    Reduced TCO

    Ultra Fast, Predictable Performance at Scale

    Proven Durability and Availability

    Self-Managing Features

    Staffing

    Summary

Appendix A: Comparative benchmark details

    Configuration

    Cassandra TCO and sizing assumptions

About Aerospike


Executive Summary

Many firms find Cassandra to be an obvious choice for their initial NoSQL projects. After all, it’s a popular

open source project of the Apache Software Foundation, and several commercial offerings are available.

Getting started usually isn’t too difficult, either. But as your needs evolve, you may find it hard to achieve your

goals with Cassandra. Indeed, you may face high operational costs and service-level agreement (SLA)

violations as you attempt to manage growing data volumes, cope with more complex workloads and deliver

lower and lower data access latencies. This can impact Line of Business (LOB) budgets, operational stability

and, ultimately, your customers’ experience. As a result, you may find that your Cassandra infrastructure

hinders your organization's ability to be agile, to compete and to bring new products and services to market.

Aerospike has helped many firms overcome such challenges. Like Cassandra, Aerospike is a NoSQL key-value database system. Unlike Cassandra, Aerospike features a patented Hybrid Memory Architecture (HMA)

that delivers exceptional runtime performance for mixed read/write workloads at scale, often at a savings of

60% or more in operational expenses (opex) when compared to Cassandra and other alternatives. High

availability, strong data consistency and self-management features are just a few additional reasons why

Cassandra users frequently find Aerospike so appealing. Indeed, firms in financial services,

telecommunications, technology, retail and other industries now rely on Aerospike to support mission-critical

operational applications that require ultra-fast, predictable access to billions of records in databases of 10s-100s of TBs or more.

Skeptical? Don’t take our word for it. Listen to our customers. Consider the experience of an identity

resolution firm1 that replaced its open source Cassandra environment with Aerospike; the firm reduced its

server footprint from 450 to 60 nodes and expects to save $6.9 million in TCO over three years. Explore how

a mobile marketing and advertising platform provider2 using Cassandra turned to Aerospike to minimize the

work necessary to achieve its SLAs. Learn how an online retailer moved from Cassandra to Aerospike after

experiencing latency issues, substantial operational overhead, and difficulty hiring Cassandra experts; with

Aerospike, they decreased their server footprint from 60 to 7 nodes and lowered their cost per usable TB.3

Perhaps you’re wondering how that’s possible. You’ll have a chance to learn more about Aerospike shortly.

But for now, let’s explore the main topic of this paper: five tell-tale signs that you may have outgrown

Cassandra.

1 https://www.aerospike.com/lp/debunking-free-open-source-myth/

2 https://aero-media.aerospike.com/2018/10/InMobi-Case-Study.pdf

3 https://youtu.be/IjWnG8dWTms?t=255


Five Signs

What are the five signs you may have outgrown Cassandra?

Sign #1: You’re struggling with “server sprawl” and worried about TCO

● Are growing business demands driving ever-larger clusters?

● Can your budget accommodate the resulting cost increases?

● Does TCO concern you? Do you need to stretch your IT budget further?

Sign #2: Peak loads are disrupting SLAs

● Are you missing application service-level agreements (SLAs)? Is this impacting your bottom line?

● Are you provisioning more hardware or caches to meet your SLAs?

Sign #3: You’re tired of living with cascading failures

● Are you experiencing memory pressure or other computing resource pressures?

● Are you fighting compactions?

Sign #4: Your operations team keeps growing and you’re concerned about costs

● Are you fighting garbage collection, tombstones, and JVM tuning?

● Do you have to re-tune for each new workload, major software release, or hardware refresh?

Sign #5: You’re struggling to acquire and retain Cassandra expertise

● Can you find people with the right skills and experience to operate your clusters?

● Can you find people capable of committing changes back to Apache?

● Can you find people who truly understand data modeling in Cassandra?

Exploring the Five Signs in Depth

If your organization is like most, you’re probably facing demands to deliver new applications faster, reduce

costs, provide a reliable and engaging customer experience, drive digital transformations, and more. How do

these map to the signs that you have outgrown your Cassandra cluster?

Sign #1: You’re struggling with “server sprawl” and worried about

TCO

It's a dirty little secret: the NoSQL community and vendors have encouraged you to build big -- really big --

database clusters. It’s become a matter of honor to be running thousands of nodes (and, in at least one firm’s

case, 100,000 nodes4). But what are the consequences of such expansive clusters? In a large cluster, a node failure goes from a mere possibility to a frequent occurrence. Such

environments employ dozens of hard drives or SSDs per node over hundreds or thousands of nodes.

Vendors benefit from your big clusters. Ever notice how they price by the node? There’s no incentive for them

to minimize cluster size; in fact, they’re motivated to do quite the opposite.

4 https://en.wikipedia.org/wiki/Apache_Cassandra#cite_note-36


The larger the cluster, the more components you have; hence, hardware failures may occur weekly, daily, or

even hourly. And such failures can lead to performance problems, SLA violations, and even data loss, as

you’ll see in the next section. By contrast, smaller clusters mean fewer components and, hence, less frequent

failures.

Cassandra does a great job of horizontal scaling: you simply add more nodes. But can you fully utilize each

node before you need to buy, provision and manage another . . . and another . . . and another? You know the

answer: Cassandra can’t fully utilize a database node. Thus, when you hit a resource limit -- CPU, storage

IOPs, or DRAM for the JVM heap -- your only alternative is to scale out. Yet each node you add creates more

complexity. Each node you add also results in greater cost because of Cassandra vendors’ node-based

pricing model. And that’s not all. Because Cassandra can’t fully exploit the performance of your servers, you may be forced to take extreme measures. As one Cassandra expert noted,

Cassandra is a beast to run, taking considerable resources to function at peak performance. To help, learn

how you can run multiple Cassandra clusters on the same hosts.5

Indeed, running multiple Cassandra instances per box is so common that DataStax, a Cassandra vendor,

created a Multi-Instance feature as part of its Enterprise version to automate this deployment topology.

However, running multiple instances per server just adds further operational complexity and compounds

failure modes when a server goes down.

Why is Cassandra so inefficient with compute resources? The use of Java -- with multiple JVM providers and

garbage collection (GC) strategies (e.g., ParNew, CMS, G1GC) -- creates many variables that need to be

optimized and tuned. As one Cassandra consultant put it:

“. . . (W)hy bother tuning the JVM? Don’t the defaults work well enough out of the box? Unfortunately, if your

team cares at all about meeting an SLA, keeping costs to a minimum, or simply getting decent performance

out of Cassandra, it’s absolutely necessary.” 6

Furthermore, tuning the Java heap isn’t always straightforward. You can’t just continue to increase the heap

size to solve performance problems. After the optimal heap size is exceeded (a condition that will vary

depending on your hardware configuration and workload), you’ll face increased garbage collection and risk

lengthy “stop the world” operations. One detailed presentation7 on tuning JVMs and garbage collection

strategies with Cassandra concluded:

Super easy to get things wrong. Change 1 param at a time & test for > 1 day . . . . Don’t trust what you’ve

read! Check your GC logs!

To get the most out of a node, you need to carefully read the logs, adjust JVM parameters, debug, look at

thread dumps, monitor compactions, etc. Naturally, you need to do so for each workload and cluster

configuration, especially if the hardware is different. And when you upgrade the hardware on your existing

cluster? Yes, you need to tune all over again. What if you add a new workload? You’ve guessed it -- you need

to re-tune.

5 https://dzone.com/articles/how-to-run-multiple-cassandra-nodes-on-the-same-ho

6 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

7 https://www.slideshare.net/QuentinAmbard/jvm-tuning-for-low-latency-application-cassandra-95381363


This tuning is difficult and requires detailed knowledge spread across user blogs8 9, presentations10, and

product documentation.11 12 Most operations teams find it easier to expand the cluster using the same

configurations and hardware profile. It’s a practical -- though costly -- approach for the Line of Business

owner, IT budget owner, or whoever has to pay the bill.

Think back on how you initially sized your cluster. You probably followed the best practices from numerous

community blogs, the Apache wiki, or documentation from one or more of the vendors. Hopefully, you took

into account the additional storage space needed to support your compaction strategy. You may have taken

into account space for snapshots. Were you later surprised when the application was deployed and used a

huge amount of additional space? This is where you needed to know with precision which features the

application team used when the application was constructed, as noted by one community user:

“. . . (When) we attempted to use a CQL Map to store analytics data, we saw 30X data size overhead vs.

using a simpler storage format and Cassandra’s old storage format, now called COMPACT STORAGE. Ah,

that’s where the name comes from: COMPACT, as in small, lightweight. Put another way, Cassandra and

CQL’s new default storage format is NOT COMPACT, that is, large and heavyweight.”13
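To make the modeling choice behind that observation concrete, here is a minimal, hypothetical sketch (DataStax Java driver 3.x; the keyspace, table, and column names are invented for illustration) of two schema shapes a team might compare -- a CQL map collection versus one row per entry. It is not the schema from the cited post, only an illustration of the trade-off it describes.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class StorageFormatSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("analytics")) {   // hypothetical keyspace

                // One wide row per page, with every metric packed into a CQL map collection.
                session.execute(
                    "CREATE TABLE IF NOT EXISTS page_counts_map ("
                    + "page text PRIMARY KEY, counts map<text, bigint>)");

                // The same logical data as one row per (page, metric) pair; each cell is a
                // plain bigint, which carries less per-entry overhead than a collection cell.
                session.execute(
                    "CREATE TABLE IF NOT EXISTS page_counts_flat ("
                    + "page text, metric text, hits bigint, PRIMARY KEY (page, metric))");
            }
        }
    }

Either shape can be functionally correct; the point is that their on-disk footprints can differ dramatically, and the difference only shows up once real data volumes arrive.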

Finally, your compaction and backup strategy can also have a huge impact on your capital expenditures

(capex). Because you rely on compaction to reduce storage requirements, you may end up backing up older

generations of the data again and again until the compactions can catch up. As noted in one DZone article,

“. . . (C)ompaction can cause a multi-fold increase in secondary storage requirements thus leading to a huge

CAPEX burden on customers. Case in point: . . . secondary storage was as high as 12 times the primary

storage for level compaction.”14

A different article explained,

Running low on disk space has always been problematic with Cassandra. . . . With compaction, we need

enough space while we’re performing a compaction for the original files as well as the new ones. This can add

up to a significant amount on very dense nodes.15

Indeed, increasingly high TCO is one of the most common reasons why Cassandra users turn to Aerospike.

Consider this conclusion from a former Cassandra user that needed to reduce its operational footprint and

infrastructure expenditures:

Compared to the other solutions that we were evaluating, the main drivers that made Aerospike so attractive

was its total cost of ownership performance and scale were all superior . . . . 16

Just how superior? By migrating from Cassandra to Aerospike, this firm cut its server footprint from 450 to 60

nodes and expects to save $6.9 million in TCO over three years.

8 https://thelastpickle.com/blog/2018/08/08/compression_performance.html

9 https://medium.com/linagora-engineering/tunning-cassandra-performances-7d8fa31627e3

10 https://www.slideshare.net/DataStax/lessons-learned-on-java-tuning-for-our-cassandra-clusters-carlos-monroy-knewton-c-summit-2016

11 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/operations/opsTuneJVM.html

12 https://support.datastax.com/hc/en-us/articles/204226009-Taking-Thread-dumps-to-Troubleshoot-High-CPU-Utilization

13 https://blog.parse.ly/post/1928/cass/

14 https://dzone.com/articles/backup-and-restore-challenges-cassandra-compaction

15 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

16 https://www.aerospike.com/lp/debunking-free-open-source-myth/


Sign #2: Peak loads are disrupting SLAs

Ingesting huge amounts of data can be critical to many applications. However, as the data is written,

applications often need to read and modify the same data; such mixed workloads constitute the normal

pattern for applications like activity streams, profile stores, trade stores, etc. Write-once workloads, where the

data is never modified, like log streams, are not the norm. Yet achieving strong, consistent performance when

running mixed workloads on Cassandra can be elusive, as one analyst noted:

Latency spikes with mixed workloads at scale. Cassandra handles ingest, but if you do [ingest operations]

with read and analytics, it spikes.17

If mixed reads and writes are so common, why is this pattern such a problem for Cassandra? Quite simply,

it’s due to choices made by the designers of Cassandra. For example, because Cassandra is written in Java,

users need to become proficient in tuning JVMs, garbage collectors, and other Java-unique artifacts. As one

Cassandra expert noted in a conference presentation,

[Compactions] can have . . . significant disk, memory (GC), cpu, IO overhead. [They] are often the cause of

‘unexplained’ latency or IO issues in the cluster.18

Beyond its Java code base, Cassandra’s storage model also impacts its runtime performance for mixed

workloads. Cassandra employs a log-structured merge tree (LSM tree) for storage; these structures require

certain maintenance operations (e.g., compactions) and often introduce additional latencies, particularly for

workloads that require a mix of read and write operations. As one article explained:

Because of the maintenance, LSM-Trees might result into worse latency, since both CPU and IO bandwidth is

spent re-reading and merging tables instead of just serving reads and writes. It’s also possible, under a write-heavy workload, to saturate IO by just writes and flushes, stalling the compaction process. Lagging

compaction results into slower reads, increasing CPU and IO pressure, making . . . matters worse. This is

something to watch out for. 19

Furthermore, Cassandra’s approach to data consistency requires careful thought and application design. Its

“tunable consistency” requires programmers to understand 10 consistency level settings and select what’s

appropriate for their operational needs, taking availability, latency, throughput, and data correctness into

account. Various articles attempt to explain the options20 21 22 23, although some raise concerns about critical

issues, such as whether Cassandra truly supports linearizable reads and strong guarantees against data

loss.24 25 But perhaps most significant is that Cassandra has yet to pass the Jepsen tests26, which measure

the safety of distributed systems. Aerospike, by contrast, not only passed these tests but demonstrated that

its support for strong, immediate data consistency doesn’t compromise its exceptional runtime speeds.27

17 https://dzone.com/articles/database-concerns-1

18 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters

19 https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f

20 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-1-7aee6b472fb4

21 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-2-e673a3cfea32

22 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-3-623b04cbf137

23 https://blog.yugabyte.com/apache-cassandra-lightweight-transactions-secondary-indexes-tunable-consistency/

24 https://blog.yugabyte.com/apache-cassandra-lightweight-transactions-secondary-indexes-tunable-consistency/

25 https://news.ycombinator.com/item?id=13592269

26 https://jepsen.io

27 https://www.aerospike.com/blog/aerospike-4-strong-consistency-and-jepsen/


Cassandra developers often try to balance the need for consistency and high performance by using

consistency settings28 that at least get a quorum across the copies held across the nodes of the clusters for

reads and writes. This adds unpredictable latency to any operation, as the operation can only be as fast as

the slowest node. The most common solution is to cache more of the data to avoid disk reads; this leads to

larger clusters and more DRAM, driving up ownership costs and violating many of the tuning guidelines

regarding the size of the JVM heap.
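To illustrate how an application opts into that behavior, here is a minimal sketch using the DataStax Java driver 3.x API; the keyspace, table, and key values are hypothetical. With a replication factor of 3, each QUORUM request succeeds only after two replicas respond, so its latency tracks the slower of those two.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class QuorumExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("profiles")) {   // hypothetical keyspace

                // Write at QUORUM: a majority of replicas must acknowledge before success.
                SimpleStatement write = new SimpleStatement(
                    "UPDATE user_profile SET last_seen = toTimestamp(now()) WHERE user_id = ?", 42L);
                write.setConsistencyLevel(ConsistencyLevel.QUORUM);
                session.execute(write);

                // Read at QUORUM: the request is only as fast as the slower responding replica.
                SimpleStatement read = new SimpleStatement(
                    "SELECT last_seen FROM user_profile WHERE user_id = ?", 42L);
                read.setConsistencyLevel(ConsistencyLevel.QUORUM);
                ResultSet rs = session.execute(read);
                System.out.println(rs.one());
            }
        }
    }

The driver also lets a default consistency level be configured once for the whole session; setting it per statement, as above, is common when different queries carry different correctness requirements.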

As we mentioned in the previous section, Cassandra users often find themselves forced to cope with very

large server footprints as their data volumes grow. And such large clusters can increase the probability that

you’ll suffer from data loss, even if you have a data replication of 3 (a common Cassandra configuration in

which 3 separate copies of the data are spread across the cluster). Consider this analysis from distributed

systems researcher Martin Kleppmann:

The more nodes and disks you have in your cluster, the more likely it is that you lose data. This is a counter-intuitive idea. . . . But I calculated the probabilities and drew a graph, and it looked like this:

[graph omitted: probability of permanently losing all three replicas of some data, plotted against cluster size]

To be clear, this isn’t the probability of a single node failing – this is the probability of permanently losing all

three replicas of some piece of data, so restoring from backup (if you have one) is the only remaining way to

recover that data. . . . The probability that all three replicas are dead nodes depends crucially on the

algorithm that the system uses to assign data to replicas. The graph above is calculated under the

assumption that the data is split into a number of partitions (shards), and that each partition is stored on three

randomly chosen nodes (or pseudo-randomly with a hash function). This is the case with consistent hashing,

used in Cassandra and Riak, among others (as far as I know). . . .

If the probability of permanently losing data in a 10,000 node cluster is really 0.25% per day, that would mean

a 60% chance of losing data in a year.29
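The combinatorics behind that claim can be sketched in a few lines. The following is an illustrative back-of-the-envelope calculation, not Kleppmann’s exact model: it assumes three replicas placed on distinct, randomly chosen nodes, a fixed number of nodes simultaneously dead, and (as a simplification) independence across partitions. Every input value is invented purely for illustration.

    public class ReplicaLossSketch {
        // Probability that all 3 replicas of one partition sit on dead nodes,
        // given replicas placed on 3 distinct, randomly chosen nodes.
        static double pPartitionLost(int nodes, int deadNodes) {
            return choose(deadNodes, 3) / choose(nodes, 3);
        }

        static double choose(int n, int k) {
            if (k > n) return 0;
            double r = 1;
            for (int i = 0; i < k; i++) r = r * (n - i) / (i + 1);
            return r;
        }

        public static void main(String[] args) {
            int nodes = 10_000;             // illustrative cluster size
            int deadNodes = 30;             // illustrative simultaneous node failures
            long partitions = 256L * nodes; // illustrative partition count

            double perPartition = pPartitionLost(nodes, deadNodes);
            // Chance that at least one partition loses all of its replicas
            // (treats partitions as independent -- an approximation, but it shows the trend).
            double anyLost = 1 - Math.pow(1 - perPartition, partitions);
            System.out.printf("P(one given partition lost) = %.3e%n", perPartition);
            System.out.printf("P(any partition lost)       = %.4f%n", anyLost);
        }
    }

The shape of the result matches the quoted argument: for a constant per-node failure rate, adding nodes (and therefore partitions) steadily raises the chance that some partition loses all of its replicas at once.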

But the possibility of data loss isn’t the only challenge you may face for your production applications,

particularly if you’re running mixed workloads. Consider how Cassandra processes data access requests.

Any node in the cluster can accept any read or write request even if it doesn’t manage the data involved. As

one Cassandra vendor explained,

28 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntConfigConsistency.html

29 https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html


Client read or write requests can be sent to any node in the cluster. When a client connects to a node with a

request, that node serves as the coordinator for that particular client operation. The coordinator acts as a

proxy between the client application and the nodes that own the data being requested. The coordinator

determines which nodes in the ring should get the request based on how the cluster is configured.30

This design causes extra network traffic. And excess network traffic impedes performance. Indeed, even

healthy clusters routinely undertake added network hops to service the simplest of requests, which can lead

to a wide variance in read latencies. Choosing a sensible partition key is only a partial solution: it limits the number of nodes that must be checked rather than eliminating the extra hops in the first place.

During any situation where nodes become unavailable, the coordinator node suffers further overhead31 since

it needs to keep track of any hinted handoff for writes that will need to be re-applied later. This can lead to

instability throughout the cluster, as we will see in the next sign.

Compactions add another layer of complexity. In any LSM tree system, you need to periodically prune and

compress the trees32, removing older and redundant versions of the data and cleaning tombstones (deleted

records). Cassandra has both major and minor compactions33, and tuning for these compactions is a popular

topic among the user community. In one blog, a Cassandra specialist noted that

Understanding how a compaction strategy complements your data model can have a significant impact on

your application’s performance.34

Cassandra users must choose between multiple compaction strategies, each of which is designed for certain

workloads. But understanding the intricacies of these strategies doesn’t guarantee success. As another

Cassandra consultant explained,

Unfortunately, there are many workloads which don’t necessarily fit well into any of the current compaction

strategies.35

Why do users worry about compactions? During the time compactions run, they adversely affect the read and

write latency and throughput of operations, which impacts your SLAs (again).
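For context, the compaction strategy is chosen per table through CQL table options. The sketch below (DataStax Java driver 3.x; the keyspace and table names are hypothetical) switches an imagined time-series table to TimeWindowCompactionStrategy with one-day windows. Whether that is the right choice depends entirely on the workload -- which is exactly the judgment call the quotes above describe.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CompactionStrategyExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("metrics")) {   // hypothetical keyspace

                // Group SSTables for a (hypothetical) time-series table into one-day windows.
                // Other tables might keep SizeTieredCompactionStrategy (the default) or use
                // LeveledCompactionStrategy, each with its own space and I/O trade-offs.
                session.execute(
                    "ALTER TABLE sensor_readings WITH compaction = {"
                    + "'class': 'TimeWindowCompactionStrategy',"
                    + "'compaction_window_unit': 'DAYS',"
                    + "'compaction_window_size': 1}");
            }
        }
    }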

To deal with peak workloads, you could expand your Cassandra cluster at regular intervals or on specific

occasions (e.g., in anticipation of a product promotion). But don’t wait until the last minute to expand your

cluster or you might run into some unpleasant surprises. One blog detailed some common “gotchas” with

bootstrapping Cassandra nodes to expand a cluster, including data inconsistencies:

There are a number of knobs and levers for controlling the default behaviour of bootstrapping. As demonstrated

by setting the JVM option cassandra.consistent.rangemovement=false the cluster runs the risk of over

streaming of data and worse still, it can violate consistency. For new users to Cassandra, the safest way to

add multiple nodes into a cluster is to add them one at a time.36

30 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archIntro.html

31 https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html

32 https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f

33 http://cassandra.apache.org/doc/latest/operating/compaction.html

34 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

35 https://blog.pythian.com/proposal-for-a-new-cassandra-cluster-key-compaction-strategy/

36 https://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html#gotchas


Indeed, in a recent conference presentation37, one former Cassandra user from The Trade Desk explained

how his team found itself faced with “cascading performance issues due to CPU consumption” caused by

compaction, compression, garbage collection (“even with optimized versions of GC”) and other factors. This

created a need for “a TON of CPU for a relatively small amount of data,” which translated to more machines.

Ultimately, his group opted to use Aerospike instead of Cassandra, which enabled them to reduce their

cluster size from 512 Cassandra nodes on AWS to 60 Aerospike nodes on AWS, improve performance and

availability, and reduce TCO.

Sign #3: You’re tired of living with cascading failures

For mission-critical systems, availability is the most crucial aspect of a data layer. After all, isn’t availability

why you picked AP from the CAP theorem38 and selected Apache Cassandra in the first place? You’ve

chosen a system that has distribution and replication of data, so when a node becomes unavailable --

momentarily or permanently -- the data lives elsewhere. Right?

Wrong, actually. Data distribution sounds all well and good (and is the correct solution), but if a single node outage can cause a cascading failure across your cluster, then every node is effectively a single point of failure for the cluster. And cascading failures are common.39 As one Cassandra user noted,

Cassandra seems to have two modes: fine and catastrophic.40

What, then, are some typical sources of failures? They include:

● Memory pressure caused by hinted handoff during failover

● Compactions trashing the cache

● Compactions causing I/O, and thus, CPU pressure

● Memory use causing frequent garbage collection, and thus, CPU pressure

● A large number of tables causing memory pressure

● Inappropriately sized partitions causing pressures for the Java heap and garbage collection

Let’s explore each in turn.

Memory pressure caused by hinted handoff during failover - As noted by one Cassandra vendor in its

documentation, the cause of this is clear. If you rely on Cassandra’s ability to store writes on a coordinator

node to replay later when the designated node returns to the cluster, this can consume large quantities of

memory and potentially result in failures:

If several nodes experience brief outages simultaneously, substantial memory pressure can build up on the

coordinator. The coordinator tracks how many hints it is currently writing. If the number of hints increases too

much, the coordinator refuses writes and throws the OverloadedException error.41

37 http://pages.aerospike.com/rs/229-XUE-318/images/The-Trade-Desk1-Aerospike-Summit_19.pdf

38 https://en.wikipedia.org/wiki/CAP_theorem

39 https://stackoverflow.com/questions/38249206/simultaneous-repairs-cause-repair-to-hang

40 https://www.slideshare.net/planetcassandra/pd-melting-cass/12?src=clipshare

41 https://docs.datastax.com/en/dse/6.7/dse-arch/datastax_enterprise/dbArch/archRepairNodesHintedHandoff.html


Cassandra users often struggle to understand the intricacies of using hinted hand-offs, leading to questions

about intermittent failures42, data restoration issues43, and data consistency44. As noted in one blog from a

Cassandra consultant,

There are many knobs to turn in Apache Cassandra. Finding the right value for all of them is hard. Yet even

with all values finely tuned unexpected things happen. In this post we will see how gc_grace_seconds can

break the promises of the Hinted Handoff.45

Another consulting firm published a Cassandra performance overview guide that included this warning:

The hinted handoff process can overload the coordinator node. If this happens, the coordinator will refuse

writes, which can result in the loss of some data replicas.46

Perhaps it’s not surprising, then, that some users recommend against using the feature:

Don’t use hinted handoffs (ANY or LOCAL_ANY quorum). In fact, just disable them in the configuration. It’s

too easy to lose data during a prolonged outage or load spike, and if a node went down because of the load

spike you’re just going to pass the problem around the ring, eventually taking multiple or all nodes down.47

Compactions trashing the cache - As noted by one Cassandra vendor, compactions, which increase during a

single node outage as greater load is applied on surviving nodes, also increase pressure on the I/O, memory,

and CPU of those nodes:

Apache Cassandra row cache caches only the head of a partition, where the number of rows cached is

configurable. . . . Because Cassandra row cache is ineffective for many workloads, some users introduce the

additional complexity of running with the row cache disabled, and rely on the operating system page cache

for serving reads. . . . Apache Cassandra compaction thrashes the page cache, because it reads and writes

everything, and after compaction the most frequently used data is likely to no longer be in the cache.48

A Cassandra cluster is often sized using assumptions about the effectiveness of the row cache; an ineffective

row cache leads to a greater number of connections and transactions in flight. This causes difficulty (at the

very least, performance issues) for surviving nodes. However, the effects of the cache are negated when the

blocks being cached are paged out by another database feature.

Compactions causing I/O, and thus, CPU pressure - Choosing between different compaction strategies can

dramatically influence I/O and CPU pressure. This is a function of the reads and writes at a given moment: a

predominantly read-heavy application will get the adverse effects of a data load job, causing pressure on both

I/O and CPU. Certain articles49 50 and conference presentations51 are dedicated to helping Cassandra users

understand the subtleties -- and potential performance pitfalls -- of various compaction strategies. As one

presenter who benchmarked size-tiered and leveled compaction strategies for a write-only workload noted,

42 https://stackoverflow.com/questions/48664110/why-does-hinted-handoff-not-work-sometimes

43 https://dba.stackexchange.com/questions/174889/help-in-understanding-hinted-handoffs-and-data-replication-in-cassandra

44 https://stackoverflow.com/questions/43445671/whats-the-point-of-using-hinted-handoff-in-cassandra-especially-for-consistenc

45 https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html

46 https://www.scnsoft.com/blog/cassandra-performance

47 https://www.threatstack.com/blog/scaling-cassandra-lessons-learned

48 https://www.scylladb.com/product/technology/memory-management/

49 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

50 https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

51 https://www.scylladb.com/tech-talk/ruin-performance-choosing-wrong-compaction-strategy-scylla-summit-2017/


. . . [With] the two existing compaction strategies which I mentioned, we find ourselves between a rock and a

hard place. Both of them give us problems. So for size-tiered compaction, . . . at some point we need twice

the disk space. This is a space amplification problem. For level-tiered compaction, we’ll do more than double

the disk I/O than size-tiered compaction.52

Another presentation on “lessons learned” from running a large number of Cassandra clusters included this

observation:

A single node doing compactions can cause latency issues across the whole cluster, as it will become slow to

respond to queries.53

Garbage collection occurring frequently, creating resource pressures - With a Java code base, garbage

collection is inevitable and uncontrollable. Tuning is possible, but it’s often described as a “dark art.” To

develop skills in this area, your staff may need to read through a number of detailed papers54 55 56 that cover

seemingly arcane topics such as old/tenured generation, young generation (with Eden and survivor heaps),

multi-phase concurrent marking cycle (MPCMC), etc. Maybe that’s not how you’d like your staff to spend

their time.

Unfortunately, you can’t just stick your head in the sand when it comes to garbage collection because its side

effects are real and potentially devastating. Consider this user’s experience:

We have been dealing a lot with garbage collector stop the world event, which can take up to 50 seconds in

our nodes, in the meantime Cassandra Node is unresponsive, not even accepting new logins.57

One Cassandra consultant cautioned:

Full GC pauses are awful, they can lock up the JVM for several minutes, more than enough time for a node to

appear as offline. We want to do our best to avoid Full GC pauses. In many cases it’s faster to reboot a node

than let it finish a pause lasting several minutes.58

Careful JVM tuning is one recommended approach59 60 because relying on defaults simply doesn’t cut it. As

one consultant noted,

The default Cassandra JVM argument is, for the most part, unoptimized for virtually every workload.61

But even that may not be sufficient. So what else might you try? How about rewriting your application?

If your application’s object creation rate is very high, then to keep up with it, the garbage collection rate will

also be very high. A high garbage collection rate will increase the GC pause time as well. Thus, optimizing the

52 https://youtu.be/MLbQXQ-c2vY?t=1356

53 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters

54 https://www.azul.com/files/Understanding_Java_Garbage_Collection_v4.pdf

55 https://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection

56 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

57 https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

58 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

59 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

60 https://dzone.com/articles/how-to-reduce-long-gc-pause

61 https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html


application to create fewer objects is THE EFFECTIVE strategy to reduce long GC pauses. This might be a

time-consuming exercise . . . . 62

Later, the same author further observes that users may also need to increase RAM, decrease the number of

other running processes (to free up RAM), minimize I/O from other processes, or possibly increase the

number of threads allocated to garbage collection, although the latter is tricky because “(a)dding too many

GC threads will consume a lot of CPU and take away resources from your application.” 63

A large number of tables, causing memory pressure - The way in which you construct the data schema can

also impact memory pressure, and changes to application design and use can radically change hardware

requirements.

While there are legitimate use cases for having lots of tables in Cassandra, they are rare. . . . Many tables will

require more resources, obviously. How much? That depends on the settings, and the usage. For example, if

you have a thousand tables and write to all of them at the same time there will be contention for RAM since

there will be memtables for each of them, and there is a certain overhead for each memtable (how much

depends on which version of Cassandra, your settings, etc.).64

Inappropriately sized partitions causing Java heap and garbage collection pressures - Determining the

optimal partition size for your data and workload can be challenging. Recommendations on maximum size

vary depending on the Cassandra release involved; users often target less than 100MB65 66 or, in some

cases, less than 400MB.67 Since partition size is related to the number of columns per row, this impacts your

table design, often imposing restrictions you might prefer to avoid. But going “too wide” can impose its own

challenges:

Wide partitions in Cassandra can put tremendous pressure on the java heap and garbage collector, impact

read latencies, and cause issues ranging from load shedding and dropped messages to crashed and downed

node. 68

While recent changes in Cassandra 3.11 were designed to mitigate these issues, how to size your partitions

isn’t entirely straightforward. As one Cassandra code contributor noted:

After #11206 [a Cassandra improvement for supporting large partitions], what’s the recommended partition

size? It still depends - sorry. IMO, we moved the ‘barrier’ [but] test with your data model and workload.69

Or, as another Cassandra specialist said,

. . . (T)here are practical limitations tied to the impacts large partitions have on the JVM and

read times. These practical limitations are constantly increasing from version to version. This

62 https://dzone.com/articles/how-to-reduce-long-gc-pause

63 https://dzone.com/articles/how-to-reduce-long-gc-pause

64 https://stackoverflow.com/questions/33212414/cassandra-what-is-the-reasonable-maximum-number-of-tables

65 https://medium.com/@johnny_width/calculating-cassandra-partition-disk-size-43276d1dcb34

66 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters

67 https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/

68 https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/

69 https://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016


practical limitation is not fixed but variable with data model, query patterns, heap size, and

configurations which makes it hard to . . . give a straight answer on what’s too large.70

Partition sizing is somewhat related to server sprawl, too. Maintaining a small partition size (say, 100-400 MB) provides one way to cope with Java heap pressures, as the key cache resides in the heap. A smaller

amount of data in a partition allows a larger number of keys to reside in the key cache; this reduces the

likelihood of Java heap thrashing and triggering high levels of garbage collection. But small partition sizes can

force you to add more and more nodes as your data volumes grow, contributing to server sprawl and high

TCO.
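A rough sizing sketch shows the arithmetic. The numbers below are illustrative assumptions, not recommendations: a partition-size ceiling and an average row size imply a per-partition row budget, and the total row count then dictates how many partitions the cluster must hold and distribute across its nodes.

    public class PartitionSizingSketch {
        public static void main(String[] args) {
            // Illustrative assumptions only:
            double targetPartitionBytes = 100L * 1024 * 1024; // ~100 MB ceiling per partition
            double avgRowBytes = 2 * 1024;                    // ~2 KB per row
            long totalRows = 20_000_000_000L;                 // 20 billion rows overall

            double rowsPerPartition = targetPartitionBytes / avgRowBytes; // ~51,200 rows
            double partitionCount = totalRows / rowsPerPartition;         // ~390,000 partitions

            System.out.printf("Row budget per partition: %.0f%n", rowsPerPartition);
            System.out.printf("Partition keys to track : %.0f%n", partitionCount);
            // Keeping partitions small keeps any one partition (and the heap structures that
            // track it) manageable, but it multiplies the number of partitions the cluster
            // must hold and spread across nodes as total data grows.
        }
    }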

So what’s a user to do? If you’re tired of trying to prevent one failure after another -- or, worse yet, tired of

fixing one failure only to encounter another -- it’s a good time to do what other Cassandra users have already

done: evaluate your options. As you’ll soon learn, Aerospike’s architecture and design characteristics avoid

the types of problems we’ve just described. Customer experiences prove that. But before we turn to

Aerospike, let’s explore the fourth sign that you may have outgrown Cassandra.

Sign #4: Your operations team keeps growing and you’re worried

about costs

An operations team must weigh a large number of factors at cluster provisioning time, which makes the process time-consuming. As a data architect at Wayfair explained,

. . . (W)ith Cassandra there’s a lot more configuration and tuning out of the box. Aerospike? Pretty much

change a few things and you’re good to go.71

Moreover, with Cassandra your operations team must continually monitor clusters for changes in application

patterns and re-tune frequently. Failure to do so leads not only to poor performance but also (eventually) to

CPU pressure, which limits garbage collection capabilities; this causes memory pressure, and in turn,

outages.

Because individual nodes can’t be fully utilized, clusters are forced to get wider. Consider the conclusion reached by the

director of data engineering at The Trade Desk, a firm that chose Aerospike over Cassandra for several

critical applications:

. . . (I)n order to get the throughput that we needed [with Cassandra], we needed to scale the number of

machines to a high number of machines with a lot of CPU compared to the disk they had. Aerospike gave us

another alternative . . . .72

Managing very large clusters increases operational complexity, requiring more time from your operations staff

to plan and deploy as well as more time to diagnose and fix increasingly complex and compounding failures.

The same situation is true for firms that run multiple Cassandra nodes on the same compute node to achieve

their goals. They’re faced with more operational complexity to plan, deploy, and maintain the environment;

failures naturally become more difficult to diagnose and fix, too.

70 https://stackoverflow.com/questions/46272571/why-is-it-so-bad-to-have-large-partitions-in-cassandra

71 https://youtu.be/IjWnG8dWTms?t=203

72 https://www.aerospike.com/resources/videos/thetradedesk-integrated-hyper-scale-hot-and-cold-store/


The complexity of managing Cassandra when faced with demanding and growing workloads hasn’t been lost

on customers. And the business impact of such operational overhead can be substantial. As the CTO of

Signal explained,

Before Aerospike, we were spending more and more of our time on the care and feeding of Cassandra and

less and less time on the building of new product offerings. With Aerospike, we’ve now cleared the roadmap

and we’re just focused on adding new functionality to our platform for our customers.73

Signal’s experience wasn’t an isolated case, either. Performance tuning with Cassandra is an ongoing

challenge that operational teams must confront. Consider the conclusions reached by InMobi’s CTO and co-founder after comparing Cassandra and Aerospike:

Cassandra required extensive database tuning in order to get performance and meet our SLA. We had to

reduce overhead and risk of errors and downtime . . . . Aerospike was the best choice across the board.

Aerospike was easy to deploy, with out-of-the-box functionality, and it’s been easy to use every step of the

way. Operational overhead was nearly zero, and we shaved three months off our launch schedule as a

result.74

Another engineer with Cassandra experience put it this way:

. . . you can't run Cassandra with almost any of its default settings (maybe besides ports). In most tools you'll

need to tweak a few defaults, with Cassandra you need to thoroughly read the docs on every little

configuration in the server and schema definitions (especially if you're working in multiple DCs), you'll always

find a little surprise if you skim through it.

It's also almost mandatory to read the internal design docs of cassandra even if you're not the admin but just

working with it. And modelling data is a lot less trivial than it looks - and almost always not what you assume.75

Indeed, data modeling is a well-recognized challenge in the Cassandra community. As one DataStax blog

noted:

One of the hardest parts of using Cassandra is picking the right data model.76

Designing both the logical and physical data models correctly is critical to a successful deployment. Get one

or both of them wrong, and you may find yourself facing performance, scalability, and availability issues. Why?

As one Cassandra advocate noted at the beginning of a detailed talk on data modeling,

The data model is the only thing you can’t change once in production . . . . Well, not easily, anyway. 77

So how hard is it to get right? One Big Data architect at WalmartLabs blogged that it “has not been easy” to

achieve performance goals, explaining that:

We had to . . . create schema that looked ‘weird’ but turned out to be more performant, and consider various

strategies to ‘delete’ the data. . . . Of all the tunings, strategies, tricks that can be applied to Cassandra to

73 https://youtu.be/tGPqvl3BwzY?t=126

74 https://aero-media.aerospike.com/2018/10/InMobi-Case-Study.pdf

75 https://news.ycombinator.com/item?id=9664203

76 https://www.datastax.com/2019/02/how-to-get-the-most-out-of-apache-cassandra

77 https://youtu.be/GoPS98hU5UQ?t=182


gain performance, designing the most optimal schema had the biggest impact in terms of latency, stability,

and availability of the cluster.78

Indeed, how to avoid data modeling pitfalls is a popular topic among Cassandra users and vendors. For

example, one blog79 advises readers to avoid the “three stooges” of Cassandra data modeling (wide

partitions, tombstones, and data skew) while another80 focuses exclusively on tombstones (which “can cause

significant performance problems if not investigated, monitored, and dealt with in a timely manner”). As one

Cassandra consultant put it:

An incorrect data model can turn a single query into hundreds of queries, resulting in increased latency,

decreased throughput, and missed SLAs.81

Sign #5: You’re struggling to acquire and retain Cassandra expertise

Technical resources are hard to acquire and retain, and finding staff with deep Cassandra skills -- or

developing such skills in-house -- is no exception. Yet, you often need to do just that to effectively operate

Cassandra clusters. Indeed, a recent DataStax blog on “How to Get the Most out of Apache Cassandra” cites

the need to “thoroughly” train your staff as its top recommendation:

Getting the most out of Cassandra starts with making sure your team knows the ins and outs of the

technology and is comfortable using it.82

The blog recommends that training begin “a few weeks or even months” prior to a rollout, and further

suggests directing IT staff to additional resources beyond regular training exercises to help them get up to

speed.

Perhaps you don’t have the time or budget for that. Unfortunately, hiring skilled Cassandra technical staff can

prove challenging. A 2019 CIO survey by Harvey Nash/KPMG83 reported Big Data skills to be the scarcest.

Of course, Big Data is Cassandra’s target market.

But let’s say you’re one of the lucky organizations that manages to attract the interest of technical

professionals with Cassandra skills. Expect to pay a premium to get them to work for you. Recent salary

surveys84 85 indicate that senior software engineers and DBAs with Cassandra skills often earn 15%-20%

more than their peers skilled in other NoSQL or SQL DBMSs (e.g., MongoDB or Oracle). One 2017 analysis

from a Cassandra-as-a-service provider estimated that the burdened expense for recruiting and employing a

Cassandra expert would exceed $275,000 for the first year; additional expenses for support, equipment, and

other needs would drive initial total costs to $400,000 - $450,000 or more.86 And then there’s the issue of

staff retention:

78 https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04

79 https://blog.anant.us/common-problems-cassandra-data-models/

80 https://opencredo.com/blogs/cassandra-tombstones-common-issues/

81 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

82 https://www.datastax.com/2019/02/how-to-get-the-most-out-of-apache-cassandra

83 https://advisory.kpmg.us/articles/2019/cio-survey-harvey-nash-2019.html

84 https://www.payscale.com/research/US/Job=Senior_Software_Engineer/Salary/cd797527/Apache-Cassandra

85 https://neuvoo.com/salary/?job=Cassandra+DBA

86 https://www.instaclustr.com/wp-content/uploads/2017/12/Avoiding-the-challenges-and-pitfalls-of-Cassandra-implementation_White-Paper.pdf


At present, the demand for Cassandra expertise far outweighs the supply. . . . Each year, you can expect the

salaries for your core Cassandra team to grow by 10% or more just to keep them in your company as

recruiters call on them with higher and higher financial rewards if they leave your company.87

All too often, organizations using Cassandra learn how tough it can be to staff their projects. As a data

architect at Wayfair explained,

It’s very difficult to find people skilled in Cassandra who know all the ins and outs.88

Adding to your staffing challenges is the fact that your operations team must understand the differences

between the multiple versions of Cassandra and be able to tune effectively for them. Will your staff choose the

generic Apache Cassandra distribution or a vendored version created by your Cassandra distributor?

Differences between the open source and vendored versions aren’t simply a matter of a few features. For

example, one vendor’s Cassandra implementation (DataStax Enterprise) employs a thread-per-core (TPC)

architecture,89 while Apache Cassandra uses a staged event-driven architecture (SEDA).90 Performance

characteristics differ between the two, which can impact tuning requirements.

Yet there’s more for your staff to consider beyond tuning. If your team selects a Cassandra distributor, will the

distributor provide current releases? Will the distributor keep up with the community for features and bug fixes

or will your operations staff need to patch a vendor distribution with fixes from the open source version?

Historically, even core committers like DataStax have deferred migrating to a new major release from Apache

Cassandra by 8 months or more. You’ll have to decide if you want to wait until a patch is vendored in, move

to a newer Apache release yourself (and abandon your support subscription), or fix the code base you have.

Committing changes between distributions is just one more complex task for your operations staff to

undertake. As noted earlier, scaling Cassandra to support your growing data volumes can force your staff to

manage increasingly large clusters, run multiple instances per node, and become proficient in additional

software disciplines, such as JVM tuning, containers, virtual machines, etc. Expect to allocate more time for

your staff to become proficient in such skills, or expect to pay more to hire expert consultants to do the work

for you.

As mentioned earlier, a less-than-ideal data model can lead to performance and other problems in your

production environment. This has significant implications for your operations and development staff. Consider

the implications of using distributed counters in your application. To many developers, distributed counters

might seem like a reasonable feature for a distributed database, especially when performing real-time

analytics. Perhaps your application team chose them. That could prove troublesome. As Andrew Montalenti,

CTO of Parse.ly, noted:

When I’m in a good mood, I sometimes ask questions about Counters in the Cassandra IRC channel, and if

I’m lucky, long-time developers don’t laugh me out of the room. Sometimes, they just call me a “brave soul”...

All of this is to say: Cassandra Counters — it’s a trap! Run! 91

Again, this isn’t an isolated case. Indeed, one database architect and consultant with 8 years of Cassandra

experience explored a number of situations in which users “go wrong”:

87 https://www.instaclustr.com/wp-content/uploads/2017/12/Avoiding-the-challenges-and-pitfalls-of-Cassandra-implementation_White-Paper.pdf

88 https://youtu.be/IjWnG8dWTms?t=226

89 https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/newFeatures.html

90 https://issues.apache.org/jira/browse/CASSANDRA-10989

91 https://blog.parse.ly/post/1928/cass/


Cassandra has a bunch of features that probably shouldn’t be there. . . .

● Counters: They work most of the time, but they are very expensive and should not be used very

often.

● Light weight transactions: They are not transactions nor are they light weight.

● Batches: Sending a bunch of operations to the server at one time is usually good, saves network

time, right? Well in the case of Cassandra not so much.

● Materialized views: I got taken in on this one. It looked like it made so much sense. Because of

course it does. But then you look at how it has to work, and you go…Oh no!

. . . 92

Your operations staff can’t rest by just creating a checklist of features used; they must know how the data is

used and what its life cycle is. Consider something simple like a queue, where you need to maintain order and

also expunge data as soon as it’s processed. That simple data model design leads to operational problems

down the road, as Cassandra then needs to manage large numbers of tombstones (deleted records). As

Ryan Svihla, a Solution Architect at DataStax, remarked:

You realize that based on your queue workflow instead of 5 records you’ll end up with millions and millions per

day for your short lived queue, your query times end up missing SLA and you realize this won’t work for your

tiny cluster. 93

Challenges that can arise from tombstones aren’t limited to queues (a use case often acknowledged as one

of several Cassandra “anti-patterns”). One author commented on his experience with lists:

Too many tombstones and cassandra goes kaput. Well, more accurately, if you scan too many tombstones in

a query (100k by default) in later cassandra versions, it will cause your query to fail. This is a safe guard to

prevent against OOM and poor performance. 94
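Here is a minimal sketch of that queue-style access pattern, using the DataStax Java driver 3.x with hypothetical keyspace, table, and column names. The point is the access pattern, not the code: every consumed item is deleted, and every delete leaves a tombstone that later reads of the same partition must scan past until compaction eventually removes it.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.UUID;

    public class QueueAntiPatternSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("jobs")) {   // hypothetical keyspace

                // Producer: enqueue an item.
                UUID id = UUID.randomUUID();
                session.execute(
                    "INSERT INTO work_queue (shard, id, payload) VALUES (?, ?, ?)",
                    1, id, "do-something");

                // Consumer: read the oldest items, process them, then delete each one.
                for (Row row : session.execute(
                        "SELECT id FROM work_queue WHERE shard = ? LIMIT 100", 1)) {
                    // ... process the item ...
                    // Each DELETE writes a tombstone; reads of this partition now have to
                    // step over every accumulated tombstone until compaction purges them.
                    session.execute("DELETE FROM work_queue WHERE shard = ? AND id = ?",
                            1, row.getUUID("id"));
                }
            }
        }
    }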

The use of TTL (Time-To-Live) as a mechanism to remove data after a specified time period is another

potential trap. As one Cassandra user observed:

... [TTL auto-expire] was able to effect a denial of service for all logins through creating a large amount of

garbage [tombstone] records. Once the records for these failed logins had expired, all queries to this table

started timing out.95

After exploring several examples involving Cassandra “TTL intricacies,” another user concluded,

In Cassandra, TTL should be used with caution otherwise it can lead to erratic behavior. 96
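A TTL is easy to attach to a write, which is part of why the trap is easy to fall into. The sketch below (DataStax Java driver 3.x; the keyspace, table, column names, and the 24-hour TTL are all illustrative) writes a record that silently becomes a tombstone when it expires, so a burst of such writes later surfaces as a burst of tombstones in the read path.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TtlExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("auth")) {   // hypothetical keyspace

                // Keep each failed-login record for 24 hours (86,400 seconds). When the TTL
                // fires, the expired cells linger as tombstones until compaction removes them,
                // and queries over the partition must scan past them in the meantime.
                session.execute(
                    "INSERT INTO failed_logins (username, attempted_at, source_ip) "
                    + "VALUES (?, toTimestamp(now()), ?) USING TTL 86400",
                    "alice", "203.0.113.7");
            }
        }
    }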

Furthermore, your operations staff must absorb the consequences of your development staff picking the

wrong primary or partition key. A poor choice inevitably ends up with hotspot nodes, which can cause

memory, CPU, and/or I/O pressure. This leads to the kind of cascading failures described earlier.

Unfortunately, as one Cassandra consultant noted, it’s not always easy to develop the right data model:

Evaluating a seemingly valid and straightforward data model for potential pitfalls requires a level of expertise

in Cassandra internals, the storage format in particular. . . . Performance and smooth operations of a

92 https://blog.pythian.com/cassandra-use-cases/

93 https://medium.com/@foundev/domain-modeling-around-deletes-1cc9b6da0d24#.goi7cxibs

94 https://jsravn.com/2015/05/13/cassandra-tombstones-collections/#lists

95 https://www.tildedave.com/2014/03/01/application-failure-scenarios-with-cassandra.html

96 https://medium.com/@27.rahul.k/cassandra-ttl-intricacies-and-usage-by-examples-d54248f2853c


Cassandra cluster depend in large part on the quality of the data model and how well it suits the application.

There are many factors that shape the suitable data model and its design involves many complex decisions

and tradeoffs. 97

Moreover, at least one seemingly promising feature -- materialized views -- turned out to be more trouble than

it’s worth, according to many users.

When materialized views (MVs) were added to Cassandra 3 everyone, including me, was excited. It was one

of the most interesting features added in a long time and the possibility of avoiding manual denormalization

was very exciting. . . . Since the release of the feature, materialized views have retroactively been marked as

experimental, and we don’t recommend them for normal use. Their use makes it very difficult (or impossible)

to repair and bootstrap new nodes into the cluster. 98

And that’s not all. Another user lamented about synchronization issues99 while a third described functional

restrictions and performance issues:

Most importantly the serious restrictions on the possible primary keys of the Materialized Views limit their

usefulness a great deal. In addition any Views will have to have a well-chosen partition key and extra

consideration needs to be given to unexpected tombstone generation in the Materialized Views.

And, there is a definite performance hit compared to simple writes. 100

To be sure, the need for Cassandra expertise101 hasn’t gone unnoticed, prompting vendors to offer training

(sometimes free, sometimes fee-based), consulting services (fee-based), and other forms of support. And,

of course, there’s a lot of documentation, blogs, presentations, and videos online to help you build up in-

house expertise. But do you have the luxury of time -- and the budget -- for that? Increasingly, many firms that

use Cassandra are finding that they don’t. Aerospike offers an attractive alternative.

Aerospike: The Enterprise-Grade NoSQL Database

As noted earlier, Aerospike was designed and built from the ground up to take advantage of modern

computing architectures -- a distinction many customers have come to appreciate. As a vice president at

Nielsen noted,

One thing I love about Aerospike is that it’s very capable of taking advantage of any hardware you throw at it.

This might be your network devices, this might be your SSDs or your NVMe storage, it might be your RAM.102

Aerospike's intellectual property and technology stack have enabled its customers to enjoy an unprecedentedly low TCO; ultra-fast, predictable performance at scale; unparalleled uptime and reliability; a

self-managing environment; and low operational overhead. Let's explore how Aerospike has achieved this.

97 https://opencredo.com/blogs/cassandra-data-modelling-patterns/
98 https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html
99 https://techblog.fexcofts.com/2018/05/08/cassandra-materialized-views-ready-for-production/
100 https://opencredo.com/blogs/everything-need-know-cassandra-materialized-views/
101 https://adtmag.com/articles/2016/03/15/cassandra-nosql-skills.aspx
102 https://www.aerospike.com/resources/videos/nielsen-hurdling-operational-issues/

Reduced TCO

Aerospike is written in C by a development team with deep expertise in networking, storage, and databases. By removing the layers of file system, block, and page caches, and instead building a proprietary log-structured file system designed for the way flash devices work, Aerospike can deliver unprecedented

resource utilization. The bottom line is that Aerospike clusters have far fewer nodes than the equivalent

Cassandra cluster. Indeed, as we mentioned earlier, The Trade Desk103 and Signal104 chose Aerospike over

Cassandra in part because of its resource efficiency and smaller footprint. In doing so, these firms cut

hardware costs and simplified operations. And they’re not alone. For example, AdForm decreased its server

footprint from 32 nodes with Cassandra to 3 with Aerospike and achieved a fourfold expansion of data.105

Aerospike enabled AdForm to sustain the same throughput as its old Cassandra cluster, but with lower -- and consistent -- latency and unmatched availability.

Lower cost. Predictable performance. Higher availability. You get all three with Aerospike.

Let’s explore the cost issues more closely. Aerospike recently updated its 2016 comparative benchmark106

with Cassandra, running the standard Yahoo! Cloud Serving Benchmark (YCSB) on Dell servers using

Aerospike Community Edition 4.6.0.4 and Apache Cassandra 3.11.4. (See Appendix A for full configuration

and benchmark details of Aerospike’s most recent test.) The workload involved a balanced mix of read and

write operations on 2.65 TB of unique user data. Both DBMSs were configured with a replication factor of 2,

so the total user data was 5.3 TB.

103 https://www.aerospike.com/resources/videos/thetradedesk-integrated-hyper-scale-hot-and-cold-store/

104 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/

105 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/

106 https://www.aerospike.com/blog/comparing-nosql-databases-aerospike-vs-cassandra/


Aerospike’s throughput and latency advantages were striking. Aerospike demonstrated 15.4x better

throughput, 14.8x better 99th percentile read latency and 12.8x better 99th percentile update latency than

Apache Cassandra. Even at the 95th percentile, Aerospike’s read/write latencies were markedly better than

Cassandra’s. See Table 1.

                           Read Results                                Update Results
                           Throughput     95th pct       99th pct      Throughput     95th pct       99th pct
                           (txns/sec)     latency (µs)   latency (µs)  (txns/sec)     latency (µs)   latency (µs)
Apache Cassandra 3.11.4    23,999         8,487          17,187        23,999         8,319          20,109
Aerospike CE 4.6.0.4       369,998        744            1,163         369,998        1,108          1,574
Aerospike advantage        15.4x better   11.4x better   14.8x better  15.4x better   7.51x better   12.8x better

Table 1: YCSB comparative results summary

Although we didn’t focus on the insert or loading process, it’s worth noting that we found it difficult to populate

Cassandra with the 2 billion objects required for the tests. When we ran YCSB clients without throttling, we

achieved an insert rate of 83K operations per second (OPS) but found that Cassandra dropped data. That’s

right. Subsequent reads showed that data we thought we had inserted into Cassandra was missing. The data

was loaded with 10 different instances of YCSB inserting different key ranges. We tried populating data at a rate of 60K OPS, but data was still being dropped. Finally, we dropped the insert rate to 40K OPS. After

nearly 14 hours, the load completed successfully -- all data was present. Unfortunately, we found each of the

nodes had over 4000 pending compactions. We hadn’t expected that. We had to wait 5 days for the

compactions to complete before we could proceed with our read/write benchmark tests.

What about Aerospike? By contrast, we successfully populated Aerospike with all 2 billion keys in less than 1

hour. Moreover, Aerospike was ready for the read/write benchmark tests immediately. Reliable and ready to

go.

Let’s consider for a moment how the benchmark results for our read/write workload (shown in Table 1) can

help us calculate TCO. We could take the 15.4x throughput improvement ratio for Aerospike, multiply that by

the server costs, and compare the results. But that would put Cassandra at an unfair disadvantage because

its performance profile will change as servers are added, particularly if multiple instances are run on a single

server. Furthermore, TCO involves more than hardware costs; labor, electricity, cooling, and other costs need

to be considered.

To arrive at a more realistic comparison, we used Aerospike’s benchmark results on a 3-node cluster as the

target SLA and attempted to size Cassandra’s environment to deliver the same throughput and latencies.

(See Appendix A for configuration and sizing details.) Unfortunately, even with different caching assumptions

and hit ratios, Cassandra failed to achieve the desired SLA for 99th percentile reads and writes due to JVM

garbage collection issues. Ultimately, we settled on a 95% page cache hit ratio for Cassandra, which yielded

latencies closest to (yet still higher than) Aerospike’s 95th percentile latencies.


If we configured Cassandra with one instance running on each node, we would need 73 servers to come

close to the target SLA achieved by Aerospike on a 3-server cluster. However, many Cassandra users bundle

multiple instances of Cassandra on a single server through virtualization or containers,107 so we used a more

realistic configuration of 3 Cassandra instances per server to lower Cassandra’s cluster size to 25 servers.

(To run multiple instances on a single server, we added 128GiB of memory per instance for each server.) The

differences in initial hardware costs are substantial. As Table 2 illustrates, the Aerospike cluster costs a mere

8% of the Cassandra cluster.

                           Cassandra (3 instances/server)        Aerospike (1 instance/server)       % cost savings
                           Quantity   Cost ($)   Subtotal        Quantity   Cost ($)   Subtotal      with Aerospike
Servers (Dell R730xd)      25         10,842     271,050         3          10,842     32,526
Memory (32GB DIMM)         292        671        195,932         4          671        2,684
Storage (Micron 9200)      50         1,053      52,650          6          1,053      6,318
Total                                            519,632                               41,528        92%

Table 2: Initial hardware cost comparison for target SLA

But that’s only part of the story. TCO is more than hardware costs -- a lot more, particularly for a scaled-up

production environment. Table 3 details hardware, power, cooling, and support costs over a three-year

period for a sample production scenario. Using the capacity and performance data from the benchmark, we

increased the data size from 2 billion keys of unique data to 20 billion keys of unique data for Year 1, which

translates to 18 TB of unique data and 36 TB of replicated data. Further, we modeled a high-growth business

scenario in which the number of unique keys grows to 30 billion in Year 2, forcing the cluster size to grow by

50%. Year 3 models further growth, with the number of unique keys reaching 39.9 billion, driving the cluster

size to grow by 33% over the previous year. Once again, the results are striking. Aerospike reduced costs by

90%, delivering nearly $9.27 million in savings over the Apache Cassandra cluster for the three-year period.

107 https://robin.io/blog/lessons-learned-cassandra/


                                   Cassandra                                            Aerospike
                                   Year 1      Year 2      Year 3      Total           Year 1     Year 2     Year 3     Total
# of servers                       244         365         486                         30         45         60
Single server cost over 3 yrs ($)  7,000                                               5,211
Server cost ($)                    1,708,000   2,555,000   3,402,000   7,665,000       156,330    234,495    312,660    703,485
Power cost ($)                     170,567     255,152     339,737     765,456         20,971     31,457     41,942     94,370
Cooling cost ($)                   236,899     354,378     471,856     1,063,133       29,127     43,690     58,252     131,069
Space cost ($)                     6,000       6,000       6,000       18,000          770        770        770        2,310
Support cost ($)                   174,268     260,689     347,109     782,066         21,426     32,139     42,853     96,418
Total costs ($)                    2,295,734   3,431,219   4,566,702   10,293,655      228,624    342,551    456,477    1,027,652

Table 3: Comparative 3-year TCO scenario

How does Aerospike achieve so much on such a comparatively small footprint? As Aerospike is a native C

implementation, there are no inefficiencies from a Java runtime. The primary key index is stored as a

parentless red-black tree108, enabling ultra-fast key lookups in DRAM. The data is then retrieved from the

proprietary log-structured file system that supports parallelization across all the devices on the chassis. A

typical Aerospike node will have 8-12 SSDs and sometimes as many as 16. By reading and writing in parallel

to all devices and removing the block and page caches, Aerospike can fully utilize all the IOPS and disk slots

available before running out of CPU.

Aerospike’s implementation is not faster simply because it uses C. It uses performance-tuned libraries, such

as re-implementations of msgpack, and specific tested versions of JEMalloc. The code is a highly

optimized, reference-counted multithreaded implementation, requiring certain developer skills in which

Aerospike specializes.

Perhaps you’re still unconvinced about Aerospike’s cost advantages even after reviewing our comparative

scenario. Aerospike has helped many customers convert their Cassandra clusters, delivering tangible savings to

users as a result. Read Debunking the Free Open Source Myth to learn how one Cassandra customer

reduced its costs, server count, and latencies with Aerospike.

108 https://en.wikipedia.org/wiki/Red–black_tree


Ultra Fast, Predictable Performance at Scale

Firms unfamiliar with Aerospike frequently ask how it can achieve ultra-fast and predictable performance at

scale. Quite simply, it was designed with that objective in mind. For example, Aerospike uses master-based

replication so operations are forwarded to the primary node for the record. This is equivalent to having a

specific coordinator node for each portion of the data. Each record's digest is generated by a cryptographic hash algorithm, RIPEMD-160 (also used by Bitcoin), which has had no detected hash collisions in practice. The first twelve bits of this generated hash identify the partition where the record resides.
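
As a rough illustration of that mapping -- not Aerospike's actual code; the byte encoding of the set name and key, and exactly which twelve bits are taken, are assumptions in this sketch -- a key can be hashed and assigned to one of 4,096 partitions like this:

```python
import hashlib

def partition_id(set_name: str, user_key: str) -> int:
    """Illustrative only: derive a 12-bit partition ID from a RIPEMD-160 digest.

    Aerospike computes a 20-byte RIPEMD-160 digest over the set name and user
    key; how the bytes are concatenated and which 12 bits are used is an
    assumption here. Note: 'ripemd160' may be absent from some OpenSSL builds
    of hashlib.
    """
    digest = hashlib.new("ripemd160", set_name.encode() + user_key.encode()).digest()
    return int.from_bytes(digest[:2], "little") & 0x0FFF   # 4,096 partitions

print(partition_id("users", "user:42"))   # a value in 0..4095
```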

Aerospike supports all the popular languages (C/C++, C#, Java, Python, Go, Node.js, PHP, Ruby, Perl, and

Erlang) with a vendor-supported language client. These high-performance native clients provide translation of

native data types to and from Aerospike, as well as interoperability between languages, greatly improving

developer productivity.

Unlike Cassandra's proxy model, in which the coordinator node reroutes client requests, Aerospike's

smart client layer maintains a dynamic partition map that identifies the master node for each partition. This

enables the client layer to route the read or write request directly to the correct node without any additional

network hops. Furthermore, Aerospike writes synchronously to all copies of the data so there is no need to do

any form of quorum read across the cluster to get a consistent version of the data. As AdForm’s CTO noted:

Even more important is the super fast key-value store and extraordinary predictability we get with Aerospike,

providing the responsiveness our clients require to compete in the crowded Internet and mobile markets.109
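
The smart-client model described above is visible in ordinary application code. The sketch below uses the Aerospike Python client with hypothetical host, namespace, and set names; the client downloads the partition map at connect time, so each put or get goes straight to the master node for that record's partition with no coordinator hop:

```python
import aerospike

# Hypothetical seed host and namespace; any cluster node can be the seed.
config = {"hosts": [("10.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# Key = (namespace, set, user key); the client hashes it, looks up the owning
# master in its cached partition map, and sends the request directly there.
key = ("test", "users", "user:42")
client.put(key, {"name": "Ada", "visits": 1})
_, meta, bins = client.get(key)
print(bins)

client.close()
```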

Yet Aerospike is about more than maintaining predictable performance for a short burst of time or shining in

5-minute benchmarks. Aerospike enables predictable performance over days and months -- the type of

performance that allows your business and applications to grow seamlessly as you broaden your presence in

local and international markets. Kayak’s SVP of Technology highlighted the importance of this:

As we continue to rapidly expand into international markets, we needed a solution that was reliable and could

scale to serve offers across our network. Aerospike enabled us to achieve multi-key gets in less than 3

milliseconds, deploy with ease and scale with very low jitter. 110

Finally, consider the conclusions presented by the chief architect of Hewlett Packard Enterprise’s AI-Driven

Big Data Solutions team, Theresa Melvin, in 2019 when she discussed “AI at Hyperscale - How to Go Faster

with a Smaller Footprint.”111 Faced with the need to plan for emerging “exascale” (extreme scale)

requirements -- e.g., ingesting more than 85PB (petabytes) of data per day, storing 2.6EB (exabytes) of raw

data monthly for 5 - 10 years -- Melvin evaluated various NoSQL databases. Her team ultimately

benchmarked several “pmem-aware” (persistent memory aware) offerings: Aerospike, Redis, RocksDB, and

Cassandra. Aerospike successfully completed the load test. Cassandra didn’t. Melvin observed that after the

4-hour mark was reached, the job was automatically killed with only about half the target records loaded.

After evaluating the full suite of test results, she offered this “take away”:

▪ Aerospike's PMEM-aware performance for a single node
   ▪ 300% increase over RocksDB
   ▪ Between 270% and 1667% over Cassandra's
      ▪ 270% if Cassandra was able to sustain its 92K ops/sec
      ▪ 1667% per Cassandra's current 15K sustained r/u performance
▪ Aerospike's Cluster Footprint
   ▪ 2-nodes required for Strong Consistency
   ▪ 33% footprint reduction over nearly any other K/V store 112

109 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/
110 https://www.aerospike.com/blog/how-to-travel-smart-replacing-cache/
111 https://www.aerospike.com/resources/videos/summit19/ty-hpe/

Predictability is another critical aspect of Aerospike’s performance. The comparative benchmark results that

we discussed earlier yielded compelling information about the latency variances for Aerospike vs. Cassandra.

Comparing the two systems based on their ranges of response times and throughput -- rather than a simple

average, as we did earlier -- reveals how Aerospike offers far greater performance predictability than

Cassandra.

Let’s start with the variances for reads. Figs. 1-2 depict the latency variances and the throughput variances,

respectively. Note how few spikes -- variances -- in read latencies were observed with Aerospike (shown in

red) compared to Cassandra (shown in blue). Not only did Aerospike service reads far more quickly, it did so

with far greater predictability, as Fig. 1 shows. The same situation held true for throughput. As Fig. 2 shows,

Aerospike not only serviced a much higher volume of read requests during the same time period as

Cassandra, it did so with more consistent performance.

Fig. 1: Read latency variances observed for Cassandra and Aerospike

112 https://www.aerospike.com/resources/videos/summit19/company/hpe/


Fig. 2: Read throughput variances observed for Cassandra and Aerospike

What about writes? The situation is quite similar. Fig. 3 illustrates how Aerospike processed writes far more

quickly and with far more consistent performance than Cassandra. Furthermore, Aerospike’s throughput was

much more predictable -- and much higher -- than Cassandra’s, as shown in Fig. 4.

Fig. 3: Write latency variances observed for Cassandra and Aerospike


Fig. 4: Write throughput variances observed for Cassandra and Aerospike

Simply put, Aerospike delivers highly consistent response times and throughput to its users. It’s very

predictable, which makes it easy for firms to meet their application SLAs even as their business demands

grow.

Proven Durability and Availability

Aerospike uses a shared-nothing architecture. All nodes serve as peers; none has a specific role. Aerospike is built with a unique master-based cluster algorithm: a single node is the owner, or primary, for each data partition. If a replication factor is defined, then N nodes hold a copy of each record for reliability; if you lose a node, another copy is still available. And unlike other systems, Aerospike

writes synchronously across all copies of the data.

In Aerospike’s default configuration, when a node fails or is removed from the cluster, any node that has a

secondary copy of the partition can be instantly promoted to be the master of that partition without the typical

delays imposed by a consensus algorithm. Aerospike uses the Paxos algorithm to ensure consensus across

the cluster, but since each node is equal in its role and equal with the current state of the data, any of the

secondary nodes can be promoted. This has allowed an organization like AppNexus to minimize downtime,

as expressed by its CTO, Geir Magnusson:

2.5 million impressions a second at peak, although we can go much higher, and we see north of 90 Billion

impressions per day and this is a 24×7 business with 100% uptime with Aerospike.113

Beyond “fair-weather” performance and availability, you also need a system that can deal with human errors

(e.g., “fat fingering”) and natural disasters (e.g., the weather). Aerospike has the ability to ensure availability

across regions using Cross Datacenter Replication (XDR), an essential feature for many global firms. As

Henry Snow, VP of infrastructure for the Nielsen Marketing Cloud, explained:

Cross Datacenter Replication is vital to our organization. We need to have redundant data centers. We need our user objects to be available in multiple facilities. . . . The ability to replicate data across regions is something that Aerospike provides that very (few) other NoSQL databases do with ease.114

113 https://www.aerospike.com/customers/why-appnexus-uses-aerospike/

It’s also important to note that Aerospike supports strong, immediate data consistency. This support

precludes data loss. In particular, Aerospike guarantees that all writes to a single record will be applied in a

specific sequential order and writes will not be re-ordered or skipped. Furthermore, Aerospike ensures that

writes acknowledged as having been committed have been applied and exist in the transaction timeline even

if a network or disk failure occurs.

For reads, Aerospike supports both full linearizability, which provides a single linear view among all clients that can observe data, and a more practical session consistency, which guarantees that an individual process sees the sequential set of updates. These policies can be chosen on a read-by-read basis, allowing the few transactions that require a higher guarantee to pay the extra synchronization price.
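
Assuming the Aerospike Python client, selecting the consistency level per read looks roughly like the sketch below. The policy constant names follow the client documentation as we recall it and should be treated as assumptions to verify against your client version:

```python
import aerospike

client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()
key = ("bank", "accounts", "acct:7")   # hypothetical namespace/set/key

# Constant names are assumptions; check your client version's documentation.
linearized_read = {"read_mode_sc": aerospike.POLICY_READ_MODE_SC_LINEARIZE}
session_read    = {"read_mode_sc": aerospike.POLICY_READ_MODE_SC_SESSION}

# Pay the extra synchronization price only where a strict guarantee is needed.
_, _, balance_bins = client.get(key, linearized_read)   # strict, linearizable
_, _, profile_bins = client.get(key, session_read)      # cheaper session consistency

client.close()
```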

Indeed, Aerospike’s strong consistency, 24x7 availability, and ultra-fast performance were among the

reasons why a major European bank adopted it for managing instant payments. The firm needed a system of

record capable of processing 43 million transactions per day with very low latencies. Of course, preserving

data consistency was critical, even as the application scaled to support up to 1000 European banks. The

ability to replicate data across data centers was also important to the firm's high availability objectives.

Aerospike proved to be the solution that satisfied all their aggressive requirements.

For more information about Aerospike’s strong consistency support, see the white paper on Exploring data

consistency in Aerospike Enterprise Edition115 or read about the Jepsen report.116

Self-Managing Features

The true value of complex technology is its simplicity of use, especially in challenging production

environments. You don’t want to spend time and energy when the cluster expands, when you refresh the

hardware or when you migrate to a new Data Center. You want to use your database infrastructure like a

utility, adding capacity as and when you need to without extensive planning for maintenance windows. That’s

one of the reasons many firms prefer to use Aerospike. Indeed, at a 2019 conference IronSource recently

characterized its daily maintenance with Aerospike as “minimal to none.” As one IronSource staff member put

it,

I barely touch the cluster; it simply works. 117

Aerospike achieves this operational simplicity with self-forming and self-healing clusters that are both rack-

aware and data center-aware. No downtime is required to add or remove nodes from a cluster; Aerospike

automatically re-distributes partitions of data to ensure the new compute resources are efficiently used.

Contrast that with this advice from a Cassandra expert to fellow Cassandra users:

When making major changes to the cluster (expanding, migrating, decommissioning), GO SLOW. 118

Furthermore, Aerospike’s rack awareness also ensures that data is correctly separated to avoid

compounding failures. Using a proprietary algorithm, nodes can be restarted -- for example, to perform a software upgrade -- with memory contents preserved, allowing the process to restart in just seconds without needing to re-warm memory buffers and caches, since the data was never evicted from DRAM.

114 https://www.aerospike.com/resources/videos/nielsen-hurdling-operational-issues/
115 https://www.aerospike.com/lp/exploring-data-consistency-aerospike-enterprise-edition/
116 https://jepsen.io/analyses/aerospike-3-99-0-3
117 https://www.aerospike.com/resources/videos/summit19/company/ironsource/
118 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters

As one technology leader at Cybereason said,

So what surprises us about Aerospike is how easy it is to work with and how easy it is to operate. Using

Aerospike we are able to create a big data solution, persistent and highly available. . . . Aerospike is so easy

to work with. Aerospike does everything automatically.119

Complex technology does not have to be complex to use and operate. Period.

Staffing

Hiring and retaining staff is always a critical task in any organization. You need to ensure that the services and

applications you build and deploy meet the time-to-market needs of your organization. Yet people costs

should never be excluded from this calculation. The long-term care and maintenance of your infrastructure is

a real cost that you need to drive down if only to spend your company's energy on unique business functions

rather than database maintenance. From an operational perspective, Aerospike radically simplifies running

and maintaining a distributed database. Ken Bakunas, a data architect at Wayfair, put it simply:

Other groups [in our firm] hate us since nothing goes wrong, no downtime. 120

Indeed, Aerospike’s relatively small server footprint, combined with its self-managing and self-healing

features, means that your staffing needs are likely to be much lower -- and much easier to fulfill -- than you

budgeted for with Cassandra.

Contrast this with the typical advice from the Cassandra community, such as this blog from Threat Stack:

Don’t under staff Cassandra. This is hard as a start up, but recognize going in that it could require 1 to 2 FTEs

as you ramp up, maybe more depending on how quickly you scale up. 121

Summary

Although firms often view Cassandra as the obvious choice for their initial NoSQL projects, many find

themselves facing high operational costs and SLA violations as their data volumes and workloads grow.

Unpredictable performance, sprawling server footprints, operational and tuning challenges, and IT budget

overruns are common trouble spots. Indeed, here are five signs that your firm may have outgrown

Cassandra:

1. You’re struggling with ever-larger server footprints and worried about TCO.

2. You’re finding it harder and harder to meet SLAs.

3. You’re facing more and more cascading failures.

4. Your operations team keeps growing -- and so do costs.

5. You’re having trouble acquiring and retaining Cassandra expertise.

It doesn’t have to be that way. Aerospike offers a compelling alternative. As many former Cassandra users

have already discovered, Aerospike provides ultra-fast, predictable performance for mixed operational

workloads at scale. And it does so at a fraction of the operational cost of Cassandra and other alternatives.

119 https://www.aerospike.com/resources/videos/cybereason-implementing-high-performance-scalable-petabyte-cyber-solution/

120 https://www.aerospike.com/resources/videos/summit19/company/wayfair/

121 https://www.threatstack.com/blog/scaling-cassandra-lessons-learned


Indeed, firms that have migrated from Cassandra to Aerospike routinely enjoy substantial TCO savings thanks

to Aerospike’s highly efficient distributed database technology and its patented hybrid memory architecture.

For example, one company that migrated from Cassandra to Aerospike shrank its server footprint from 450 to 60 nodes and is projecting $6.9 million in TCO savings over three years. And they're not alone.

So why wait? If you’re struggling to meet your business needs with Cassandra, why not explore what

Aerospike can do for you? Firms in financial services, telecommunications, technology, retail, and other

industries have already realized concrete business benefits by deploying Aerospike for their mission-critical

applications. Contact Aerospike to arrange a technical briefing or discuss potential pilot projects. You might

be surprised just how much Aerospike can do for you.


Appendix A: Comparative benchmark details

In case you're curious about the YCSB results and TCO described earlier, this appendix details the configuration and

assumptions used for these efforts.

Configuration

To update Aerospike's 2016 comparative benchmark study,122 we used the following configuration:

DBMS editions: Aerospike Community Edition 4.6.0.4, Apache Cassandra 3.11.4

Operating system: CentOS 7.6 (Linux kernel 3.10-693)

YCSB: Version 0.12. We briefly tested 0.14 but observed known problems with update operations and the max_executiontime setting, so we chose the more stable version 0.12.

Java: Version 1.8.0-201

Server Hardware:

3 x Dell R730xd rack-mounted servers
NUMA nodes: 2
CPU model: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
RAM: 4 x 32GB DIMM DDR4 Synchronous Registered (Buffered) 2133 MHz
OS drive: Dell 465GB HDD
Data storage: 2 x 1.6 TB Micron 9200 Max
Network: Intel Ethernet Controller X710 for 1GbE SFP+

Client Hardware:

1 x Dell R730xd rack-mounted server
NUMA nodes: 2
CPU model: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
RAM: 4 x 32GB DIMM DDR4 Synchronous Registered (Buffered) 2133 MHz
OS drive: Dell 465GB HDD
Network: Intel Ethernet Controller X710 for 1GbE SFP+

As of this writing, the cost per server without memory is $10,842. The cost per 32 GiB DIMM is $671. And the

cost for a 1.6 TB 9200 Max SSD is $1053. These figures are reflected in Table 2 presented earlier in the

paper.

Cassandra TCO and sizing assumptions

As described earlier, we needed to scale up Cassandra's environment to meet the throughput and latencies of Aerospike that served as the SLA for our comparative TCO scenario. Consequently, we made the following assumptions for Cassandra:

1. Cassandra scales linearly.

2. Caching drives performance for Cassandra. (We sized for efficient page caching to maximize mixed

read/write performance and reduce the impact of compactions.)

3. The page cache hit ratio is defined by the available memory (64GiB for page cache).

4. The number of unique keys is 2 billion (2 x 109).

5. With a replication factor of 2, the data set contains a total of 4 billion keys.

122 https://www.aerospike.com/blog/comparing-nosql-databases-aerospike-vs-cassandra/


6. The total data size from the key set is 4 billion keys x 1,331 bytes = 5.324 TB.

7. Three nodes are used to create a scaled version of the cluster.

8. The server profile is fixed.
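
For clarity, the arithmetic behind assumptions 4 through 6 works out as follows (a trivial sketch; 1,331 bytes is the per-record size used in the study):

```python
# Sketch of the data-sizing arithmetic in assumptions 4-6 above.
unique_keys        = 2_000_000_000   # assumption 4
replication_factor = 2               # assumption 5
record_size_bytes  = 1_331           # per-record size used in the study

total_keys  = unique_keys * replication_factor     # 4 billion
total_bytes = total_keys * record_size_bytes       # 5.324e12
print(f"{total_bytes / 1e12:.3f} TB")               # -> 5.324 TB
```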

To calculate the TCO, we based the infrastructure-related costs on industry sources detailing best practices

as they existed at the time of our study (4Q 2019). Here are the additional assumptions that factored into our

TCO calculations:

1. The server cost is based on the benchmark results and sizing. The Cassandra cluster was sized with 3 nodes per server. The full depreciation for servers acquired in years 2 and 3 occurs in years 4 and 5, which are not shown in Table 3.

2. Power cost is based on 15.2¢ per kWh.

3. Space cost is based on $200 per sqft of construction cost depreciated over 39 years. The estimate

for necessary square footage is based on 30 sq ft per rack. Each rack contains 13 servers.

4. Support cost is based on an average salary of a system administrator of $85,706. Each administrator

maintains 120 servers.

5. Cooling power is estimated as 50% of a data center's total power, and IT power is assumed to account for 36% of the power used in a data center. Treating the servers' power as the IT power, we estimated the total power cost and then derived the cooling cost from it. These estimates are in line with other cooling cost models. (A rough cost sketch follows this list.)
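
The per-year infrastructure components can be sketched from these assumptions as follows. This is a rough model, not a reproduction of Table 3: per-server power draw and the data center footprint are placeholders here, since the paper does not state them explicitly.

```python
# Rough per-year cost model from the assumptions above (all figures in USD).
KWH_RATE          = 0.152          # electricity, $/kWh (assumption 2)
SPACE_PER_SQFT    = 200 / 39       # $200/sqft construction, depreciated over 39 years
SQFT_PER_RACK     = 30
SERVERS_PER_RACK  = 13
ADMIN_SALARY      = 85_706         # system administrator salary (assumption 4)
SERVERS_PER_ADMIN = 120

def yearly_costs(num_servers: int, watts_per_server: float = 500.0) -> dict:
    """watts_per_server is a placeholder; the paper does not publish this input."""
    it_kwh = num_servers * watts_per_server / 1000 * 24 * 365
    power = it_kwh * KWH_RATE
    # IT load is taken as ~36% of total data-center power; cooling as 50% of total.
    cooling = (power / 0.36) * 0.50
    racks = -(-num_servers // SERVERS_PER_RACK)      # ceiling division
    space = racks * SQFT_PER_RACK * SPACE_PER_SQFT
    support = num_servers / SERVERS_PER_ADMIN * ADMIN_SALARY
    return {"power": round(power), "cooling": round(cooling),
            "space": round(space), "support": round(support)}

print(yearly_costs(244))   # e.g., the Year 1 Cassandra cluster size from Table 3
```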


About Aerospike

Aerospike is the global leader in next-generation, real-time NoSQL data solutions for any scale. Aerospike

enterprises overcome seemingly impossible data bottlenecks to compete and win with a fraction of the

infrastructure complexity and cost of legacy NoSQL databases. Aerospike’s patented Hybrid Memory

Architecture™ delivers an unbreakable competitive advantage by unlocking the full potential of modern

hardware, delivering previously unimaginable value from vast amounts of data at the edge, to the core and in

the cloud. Aerospike empowers customers to instantly fight fraud; dramatically increase shopping cart size;

deploy global digital payment networks; and deliver instant, one-to-one personalization for millions of

customers. Aerospike customers include Airtel, Baidu, Banca d’Italia, Nielsen, PayPal, Snap, Verizon Media

and Wayfair. The company is headquartered in Mountain View, Calif., with additional locations in London;

Bengaluru, India; and Or Yehuda, Israel.

© 2019 Aerospike, Inc. All rights reserved. Aerospike and the Aerospike logo are trademarks or registered

trademarks of Aerospike. All other names and trademarks are for identification purposes and are the property

of their respective owners.

2525 E. Charleston Road, Suite 201

Mountain View, CA 94043

Tel: 408 462 2376

www.aerospike.com