Contents
Executive Summary
Five Signs
Exploring the Five Signs in Depth
    Sign #1: You’re struggling with “server sprawl” and worried about TCO
    Sign #2: Peak loads are disrupting SLAs
    Sign #3: You’re tired of living with cascading failures
    Sign #4: Your operations team keeps growing and you’re worried about costs
    Sign #5: You’re struggling to acquire and retain Cassandra expertise
Aerospike: The Enterprise-Grade NoSQL Database
    Reduced TCO
    Ultra Fast, Predictable Performance at Scale
    Proven Durability and Availability
    Self-Managing Features
    Staffing
    Summary
Appendix A: Comparative benchmark details
    Configuration
    Cassandra TCO and sizing assumptions
About Aerospike
Executive Summary
Many firms find Cassandra to be an obvious choice for their initial NoSQL projects. After all, it’s a popular
open source project of the Apache Software Foundation, and several commercial offerings are available.
Getting started usually isn’t too difficult, either. But as your needs evolve, you may find it hard to achieve your
goals with Cassandra. Indeed, you may face high operational costs and service-level agreement (SLA)
violations as you attempt to manage growing data volumes, cope with more complex workloads and deliver
lower and lower data access latencies. This can impact Line of Business (LOB) budgets, operational stability
and, ultimately, your customers’ experience. As a result, you may find that your Cassandra infrastructure
hinders your organization's ability to be agile, to compete and to bring new products and services to market.
Aerospike has helped many firms overcome such challenges. Like Cassandra, Aerospike is a NoSQL key-
value database system. Unlike Cassandra, Aerospike features a patented Hybrid Memory Architecture (HMA)
that delivers exceptional runtime performance for mixed read/write workloads at scale, often at a savings of
60% or more in operational expenses (opex) when compared to Cassandra and other alternatives. High
availability, strong data consistency and self-management features are just a few additional reasons why
Cassandra users frequently find Aerospike so appealing. Indeed, firms in financial services,
telecommunications, technology, retail and other industries now rely on Aerospike to support mission-critical
operational applications that require ultra-fast, predictable access to billions of records in databases of 10s-
100s of TBs or more.
Skeptical? Don’t take our word for it. Listen to our customers. Consider the experience of an identity
resolution firm1 that replaced its open source Cassandra environment with Aerospike; the firm reduced its
server footprint from 450 to 60 nodes and expects to save $6.9 million in TCO over three years. Explore how
a mobile marketing and advertising platform provider2 using Cassandra turned to Aerospike to minimize the
work necessary to achieve its SLAs. Learn how an online retailer moved from Cassandra to Aerospike after
experiencing latency issues, substantial operational overhead, and difficulty hiring Cassandra experts; with
Aerospike, they decreased their server footprint from 60 to 7 nodes and lowered their cost per usable TB.3
Perhaps you’re wondering how that’s possible. You’ll have a chance to learn more about Aerospike shortly.
But for now, let’s explore the main topic of this paper: five tell-tale signs that you may have outgrown
Cassandra.
1 https://www.aerospike.com/lp/debunking-free-open-source-myth/
2 https://aero-media.aerospike.com/2018/10/InMobi-Case-Study.pdf
3 https://youtu.be/IjWnG8dWTms?t=255
Five Signs
What are the five signs you may have outgrown Cassandra?
Sign #1: You’re struggling with “server sprawl” and worried about TCO
● Are growing business demands driving ever-larger clusters?
● Can your budget accommodate the resulting cost increases?
● Does TCO concern you? Do you need to stretch your IT budget further?
Sign #2: Peak loads are disrupting SLAs
● Are you missing application service-level agreements (SLAs)? Is this impacting your bottom line?
● Are you provisioning more hardware or caches to meet your SLAs?
Sign #3: You’re tired of living with cascading failures
● Are you experiencing memory pressure or other computing resource pressures?
● Are you fighting compactions?
Sign #4: Your operations team keeps growing and you’re concerned about costs
● Are you fighting garbage collection, tombstones, and JVM tuning?
● Do you have to re-tune for each new workload, major software release, or hardware refresh?
Sign #5: You’re struggling to acquire and retain Cassandra expertise
● Can you find people with the right skills and experience to operate your clusters?
● Can you find people capable of committing changes back to Apache?
● Can you find people who truly understand data modeling in Cassandra?
Exploring the Five Signs in Depth
If your organization is like most, you’re probably facing demands to deliver new applications faster, reduce
costs, provide a reliable and engaging customer experience, drive digital transformations, and more. How do
these map to the signs that you have outgrown your Cassandra cluster?
Sign #1: You’re struggling with “server sprawl” and worried about
TCO
It's a dirty little secret: the NoSQL community and vendors have encouraged you to build big -- really big --
database clusters. It’s become a matter of honor to be running thousands of nodes (and, in at least one firm’s
case, 100,000 nodes4). But what are the consequences of such expansive clusters? In a large cluster, the
probability and incidence of having a node fail goes from a mere possibility to a frequent occurrence. Such
environments employ dozens of hard drives or SSDs per node over hundreds or thousands of nodes.
Vendors benefit from your big clusters. Ever notice how they price by the node? There’s no incentive for them
to minimize cluster size; in fact, they’re motivated to do quite the opposite.
4 https://en.wikipedia.org/wiki/Apache_Cassandra#cite_note-36
The larger the cluster, the more components you have; hence, hardware failures may occur weekly, daily, or
even hourly. And such failures can lead to performance problems, SLA violations, and even data loss, as
you’ll see in the next section. By contrast, smaller clusters mean fewer components and, hence, less frequent
failures.
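The arithmetic behind this claim is simple to sketch. The figures below -- a 2% annual failure rate (AFR) per drive and 12 drives per node -- are illustrative assumptions, not measurements; substitute your own hardware statistics:

```python
# Illustrative estimate of how often a cluster of a given size sees a drive
# failure. The 2% AFR and 12 drives per node are assumed, typical figures.

def mean_days_between_failures(nodes, drives_per_node, drive_afr=0.02):
    """Expected days between drive failures across the whole cluster."""
    failures_per_year = nodes * drives_per_node * drive_afr
    return 365.0 / failures_per_year

for nodes in (10, 100, 1000):
    days = mean_days_between_failures(nodes, drives_per_node=12)
    print(f"{nodes:>5} nodes: a drive failure roughly every {days:.1f} days")
```

At 10 nodes a failure is a rare event; at 1,000 nodes it is an everyday occurrence, which is exactly the operational shift described above.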
Cassandra does a great job of horizontal scaling: you simply add more nodes. But can you fully utilize each
node before you need to buy, provision and manage another . . . and another . . . and another? You know the
answer: Cassandra can’t fully utilize a database node. Thus, when you hit a resource limit -- CPU, storage
IOPS, or DRAM for the JVM heap -- your only alternative is to scale out. Yet each node you add creates more
complexity. Each node you add also results in greater cost because of Cassandra vendors’ node-based
pricing model. And that’s not all. As Cassandra renders you unable to utilize the performance of your servers
in their entirety, you may be forced to take extreme measures. As one Cassandra expert noted,
Cassandra is a beast to run, taking considerable resources to function at peak performance. To help, learn
how you can run multiple Cassandra clusters on the same hosts.5
Indeed, running multiple Cassandra instances per box is so common that DataStax, a Cassandra vendor,
created a Multi-Instance feature as part of its Enterprise version to automate this deployment topology.
However, running multiple instances per server just adds further operational complexity and compounds
failure modes when a server goes down.
Why is Cassandra so inefficient with compute resources? The use of Java -- with multiple JVM providers and
garbage collection (GC) strategies (e.g., ParNew, CMS, G1GC) -- creates many variables that need to be
optimized and tuned. As one Cassandra consultant put it:
“. . . (W)hy bother tuning the JVM? Don’t the defaults work well enough out of the box? Unfortunately, if your
team cares at all about meeting an SLA, keeping costs to a minimum, or simply getting decent performance
out of Cassandra, it’s absolutely necessary.” 6
Furthermore, tuning the Java heap isn’t always straightforward. You can’t just continue to increase the heap
size to solve performance problems. After the optimal heap size is exceeded (a condition that will vary
depending on your hardware configuration and workload), you’ll face increased garbage collection and risk
lengthy “stop the world” operations. One detailed presentation7 on tuning JVMs and garbage collection
strategies with Cassandra concluded:
Super easy to get things wrong. Change 1 param at a time & test for > 1 day . . . . Don’t trust what you’ve
read! Check your GC logs!
To get the most out of a node, you need to carefully read the logs, adjust JVM parameters, debug, look at
thread dumps, monitor compactions, etc. Naturally, you need to do so for each workload and cluster
configuration, especially if the hardware is different. And when you upgrade the hardware on your existing
cluster? Yes, you need to tune all over again. What if you add a new workload? You’ve guessed it -- you need
to re-tune.
5 https://dzone.com/articles/how-to-run-multiple-cassandra-nodes-on-the-same-ho
6 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
7 https://www.slideshare.net/QuentinAmbard/jvm-tuning-for-low-latency-application-cassandra-95381363
This tuning is difficult and requires detailed knowledge spread across user blogs8 9, presentations10, and
product documentation.11 12 Most operations teams find it easier to expand the cluster using the same
configurations and hardware profile. It’s a practical -- though costly -- approach for the Line of Business
owner, IT budget owner, or whoever has to pay the bill.
Think back on how you initially sized your cluster. You probably followed the best practices from numerous
community blogs, the Apache wiki, or documentation from one or more of the vendors. Hopefully, you took
into account the additional storage space needed to support your compaction strategy. You may have taken
into account space for snapshots. Were you later surprised when the application was deployed and used a
huge amount of additional space? This is where you needed to know with precision which features the
application team used when the application was constructed, as noted by one community user:
“. . . (When) we attempted to use a CQL Map to store analytics data, we saw 30X data size overhead vs.
using a simpler storage format and Cassandra’s old storage format, now called COMPACT STORAGE. Ah,
that’s where the name comes from: COMPACT, as in small, lightweight. Put another way, Cassandra and
CQL’s new default storage format is NOT COMPACT, that is, large and heavyweight.”13
Finally, your compaction and backup strategy can also have a huge impact on your capital expenditures
(capex). Because you rely on compaction to reduce storage requirements, you may end up backing up older
generations of the data again and again until the compactions can catch up. As noted in one DZone article,
“. . . (C)ompaction can cause a multi-fold increase in secondary storage requirements thus leading to a huge
CAPEX burden on customers. Case in point: . . . secondary storage was as high as 12 times the primary
storage for level compaction.”14
A different article explained,
Running low on disk space has always been problematic with Cassandra. . . . With compaction, we need
enough space while we’re performing a compaction for the original files as well as the new ones. This can add
up to a significant amount on very dense nodes.15
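The capacity impact of these two effects can be put in rough numbers. The overhead factors below are illustrative assumptions taken from the worst cases cited above (size-tiered compaction temporarily needing space for both input and output SSTables, and backups capturing up to 12 generations of pre-compaction data), not Cassandra-derived constants:

```python
# Rough sizing sketch of compaction headroom and backup amplification.
# Both multipliers are assumed, worst-case illustrative figures.

def peak_disk_needed_tb(live_data_tb, compaction_overhead=1.0):
    # During a major size-tiered compaction, the input SSTables and the
    # merged output coexist, so worst case you need roughly 2x live data.
    return live_data_tb * (1 + compaction_overhead)

def backup_footprint_tb(live_data_tb, generations_backed_up=12):
    # If backups capture pre-compaction SSTable generations, secondary
    # storage can reach many multiples of primary (12x in the cited case).
    return live_data_tb * generations_backed_up

print(f"primary headroom : {peak_disk_needed_tb(10):.0f} TB for 10 TB of data")
print(f"secondary storage: {backup_footprint_tb(10):.0f} TB for 10 TB of data")
```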
Indeed, increasingly high TCO is one of the most common reasons why Cassandra users turn to Aerospike.
Consider this conclusion from a former Cassandra user that needed to reduce its operational footprint and
infrastructure expenditures:
Compared to the other solutions that we were evaluating, the main drivers that made Aerospike so attractive
was its total cost of ownership[;] performance and scale were all superior . . . . 16
Just how superior? By migrating from Cassandra to Aerospike, this firm cut its server footprint from 450 to 60
nodes and expects to save $6.9 million in TCO over three years.
8 https://thelastpickle.com/blog/2018/08/08/compression_performance.html
9 https://medium.com/linagora-engineering/tunning-cassandra-performances-7d8fa31627e3
10 https://www.slideshare.net/DataStax/lessons-learned-on-java-tuning-for-our-cassandra-clusters-carlos-monroy-knewton-c-summit-2016
11 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/operations/opsTuneJVM.html
12 https://support.datastax.com/hc/en-us/articles/204226009-Taking-Thread-dumps-to-Troubleshoot-High-CPU-Utilization
13 https://blog.parse.ly/post/1928/cass/
14 https://dzone.com/articles/backup-and-restore-challenges-cassandra-compaction
15 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
16 https://www.aerospike.com/lp/debunking-free-open-source-myth/
Sign #2: Peak loads are disrupting SLAs
Ingesting huge amounts of data can be critical to many applications. However, as the data is written,
applications often need to read and modify the same data; such mixed workloads constitute the normal
pattern for applications like activity streams, profile stores, trade stores, etc. Write-once workloads, where the
data is never modified, like log streams, are not the norm. Yet achieving strong, consistent performance when
running mixed workloads on Cassandra can be elusive, as one analyst noted:
Latency spikes with mixed workloads at scale. Cassandra handles ingest, but if you do [ingest operations]
with read and analytics, it spikes.17
If mixed reads and writes are so common, why is this pattern such a problem for Cassandra? Quite simply,
it’s due to choices made by the designers of Cassandra. For example, because Cassandra is written in Java,
users need to become proficient in tuning JVMs, garbage collectors, and other Java-unique artifacts. As one
Cassandra expert noted in a conference presentation,
[Compactions] can have . . . significant disk, memory (GC), cpu, IO overhead. [They] are often the cause of
‘unexplained’ latency or IO issues in the cluster.18
Beyond its Java code base, Cassandra’s storage model also impacts its runtime performance for mixed
workloads. Cassandra employs a log-structured merge tree (LSM tree) for storage; these structures require
certain maintenance operations (e.g., compactions) and often introduce additional latencies, particularly for
workloads that require a mix of read and write operations. As one article explained:
Because of the maintenance, LSM-Trees might result into worse latency, since both CPU and IO bandwidth is
spent re-reading and merging tables instead of just serving reads and writes. It’s also possible, under a write-
heavy workload, to saturate IO by just writes and flushes, stalling the compaction process. Lagging
compaction results into slower reads, increasing CPU and IO pressure, making . . . matters worse. This is
something to watch out for. 19
Furthermore, Cassandra’s approach to data consistency requires careful thought and application design. Its
“tunable consistency” requires programmers to understand 10 consistency level settings and select what’s
appropriate for their operational needs, taking availability, latency, throughput, and data correctness into
account. Various articles attempt to explain the options20 21 22 23, although some raise concerns about critical
issues, such as whether Cassandra truly supports linearizable reads and strong guarantees against data
loss.24 25 But perhaps most significant is that Cassandra has yet to pass the Jepsen tests26, which measure
the safety of distributed systems. Aerospike, by contrast, not only passed these tests but demonstrated that
its support for strong, immediate data consistency doesn’t compromise its exceptional runtime speeds.27
17 https://dzone.com/articles/database-concerns-1
18 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters
19 https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f
20 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-1-7aee6b472fb4
21 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-2-e673a3cfea32
22 https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-3-623b04cbf137
23 https://blog.yugabyte.com/apache-cassandra-lightweight-transactions-secondary-indexes-tunable-consistency/
24 https://blog.yugabyte.com/apache-cassandra-lightweight-transactions-secondary-indexes-tunable-consistency/
25 https://news.ycombinator.com/item?id=13592269
26 https://jepsen.io
27 https://www.aerospike.com/blog/aerospike-4-strong-consistency-and-jepsen/
Cassandra developers often try to balance the need for consistency and high performance by using
consistency settings28 that at least get a quorum across the copies held across the nodes of the clusters for
reads and writes. This adds unpredictable latency to any operation, as the operation can only be as fast as
the slowest node. The most common solution is to cache more of the data to avoid disk reads; this leads to
larger clusters and more DRAM, driving up ownership costs and violating many of the tuning guidelines
regarding the size of the JVM heap.
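The latency effect is easy to model: at CL=QUORUM with RF=3, the coordinator typically waits on two replica responses, so each operation is as slow as the slower of the two. The toy simulation below uses an assumed lognormal per-replica latency distribution and ignores speculative retry and read repair; it is a sketch of the mechanism, not a Cassandra benchmark:

```python
# Toy model: a QUORUM operation with RF=3 waits for two replica responses,
# so its latency is the max of two per-replica latencies. The lognormal
# parameters are assumptions for illustration only.
import random

random.seed(7)
N = 200_000

def replica_ms():
    return random.lognormvariate(0.5, 0.8)  # assumed per-replica latency (ms)

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

single = [replica_ms() for _ in range(N)]
quorum = [max(replica_ms(), replica_ms()) for _ in range(N)]  # wait for both

print(f"p99, one replica    : {p99(single):.1f} ms")
print(f"p99, quorum (2 of 3): {p99(quorum):.1f} ms")
```

The quorum tail is strictly worse than the single-replica tail, which is why teams end up over-provisioning caches and DRAM to claw back latency.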
As we mentioned in the previous section, Cassandra users often find themselves forced to cope with very
large server footprints as their data volumes grow. And such large clusters can increase the probability that
you’ll suffer from data loss, even if you have a data replication of 3 (a common Cassandra configuration in
which 3 separate copies of the data are spread across the cluster). Consider this analysis from distributed
systems researcher Martin Kleppmann:
The more nodes and disks you have in your cluster, the more likely it is that you lose data. This is a counter-
intuitive idea. . . . But I calculated the probabilities and drew a graph, and it looked like this: [graph not reproduced here; it shows the probability of permanently losing data rising steeply with cluster size]
To be clear, this isn’t the probability of a single node failing – this is the probability of permanently losing all
three replicas of some piece of data, so restoring from backup (if you have one) is the only remaining way to
recover that data. . . . The probability that all three replicas are dead nodes depends crucially on the
algorithm that the system uses to assign data to replicas. The graph above is calculated under the
assumption that the data is split into a number of partitions (shards), and that each partition is stored on three
randomly chosen nodes (or pseudo-randomly with a hash function). This is the case with consistent hashing,
used in Cassandra and Riak, among others (as far as I know). . . .
If the probability of permanently losing data in a 10,000 node cluster is really 0.25% per day, that would mean
a 60% chance of losing data in a year.29
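The shape of Kleppmann's calculation can be reproduced in a few lines. The per-node daily failure probability (0.1%) and partitions-per-node count (256) are assumed inputs chosen here to land in the article's ballpark; they are not taken from his post:

```python
# Back-of-envelope version of the data-loss probability for randomly placed
# replicas. The failure probability and partition count are assumptions.

def p_data_loss_per_day(nodes, p_node_fail=0.001, partitions_per_node=256):
    """P(all three randomly placed replicas of some partition die in one day)."""
    p_partition_lost = p_node_fail ** 3          # three independent replicas
    partitions = nodes * partitions_per_node
    return 1 - (1 - p_partition_lost) ** partitions

daily = p_data_loss_per_day(10_000)
yearly = 1 - (1 - daily) ** 365
print(f"daily : {daily:.2%}")   # roughly the article's ~0.25%/day figure
print(f"yearly: {yearly:.0%}")  # roughly the article's ~60%/year figure
```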
But the possibility of data loss isn’t the only challenge you may face for your production applications,
particularly if you’re running mixed workloads. Consider how Cassandra processes data access requests.
Any node in the cluster can accept any read or write request even if it doesn’t manage the data involved. As
one Cassandra vendor explained,
28 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntConfigConsistency.html
29 https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html
Client read or write requests can be sent to any node in the cluster. When a client connects to a node with a
request, that node serves as the coordinator for that particular client operation. The coordinator acts as a
proxy between the client application and the nodes that own the data being requested. The coordinator
determines which nodes in the ring should get the request based on how the cluster is configured.30
This design causes extra network traffic. And excess network traffic impedes performance. Indeed, even
healthy clusters routinely undertake added network hops to service the simplest of requests, which can lead
to a wide variance in read latencies. Choosing a sensible partition key is only a partial solution: it simply limits
the number of nodes that must be checked rather than eliminating the need in the first place.
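The odds that the node a client happens to contact already owns a replica of the requested data are roughly RF out of N, which is why extra hops dominate as clusters grow. This is an illustrative probability sketch (token-aware drivers mitigate it by routing directly to a replica):

```python
# Chance that a randomly chosen coordinator is itself a replica of the
# requested partition: roughly RF / N. Everything else needs an extra hop.

def p_coordinator_is_replica(nodes, replication_factor=3):
    return min(1.0, replication_factor / nodes)

for n in (6, 60, 600):
    p = p_coordinator_is_replica(n)
    print(f"{n:>4} nodes: {p:.1%} of requests land on a replica first try")
```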
During any situation where nodes become unavailable, the coordinator node suffers further overhead31 since
it needs to keep track of any hinted handoff for writes that will need to be re-applied later. This can lead to
instability throughout the cluster, as we will see in the next sign.
Compactions add another layer of complexity. In any LSM tree system, you need to periodically prune and
compress the trees32, removing older and redundant versions of the data and cleaning tombstones (deleted
records). Cassandra has both major and minor compactions33, and tuning for these compactions is a popular
topic among the user community. In one blog, a Cassandra specialist noted that
Understanding how a compaction strategy complements your data model can have a significant impact on
your application’s performance.34
Cassandra users must choose between multiple compaction strategies, each of which is designed for certain
workloads. But understanding the intricacies of these strategies doesn’t guarantee success. As another
Cassandra consultant explained,
Unfortunately, there are many workloads which don’t necessarily fit well into any of the current compaction
strategies.35
Why do users worry about compactions? During the time compactions run, they adversely affect the read and
write latency and throughput of operations, which impacts your SLAs (again).
To deal with peak workloads, you could expand your Cassandra cluster at regular intervals or on specific
occasions (e.g., in anticipation of a product promotion). But don’t wait until the last minute to expand your
cluster or you might run into some unpleasant surprises. One blog detailed some common “gotchas” with
bootstrapping Cassandra nodes to expand a cluster, including data inconsistencies:
There [are] a number of knobs and levers for controlling the default behaviour of bootstrapping. As demonstrated
by setting the JVM option cassandra.consistent.rangemovement=false the cluster runs the risk of over
streaming of data and worse still, it can violate consistency. For new users to Cassandra, the safest way to
add multiple nodes into a cluster is to add them one at a time.36
30 https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archIntro.html
31 https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
32 https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f
33 http://cassandra.apache.org/doc/latest/operating/compaction.html
34 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
35 https://blog.pythian.com/proposal-for-a-new-cassandra-cluster-key-compaction-strategy/
36 https://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html#gotchas
Indeed, in a recent conference presentation37, one former Cassandra user from The Trade Desk explained
how his team found itself faced with “cascading performance issues due to CPU consumption” caused by
compaction, compression, garbage collection (“even with optimized versions of GC”) and other factors. This
created a need for “a TON of CPU for a relatively small amount of data,” which translated to more machines.
Ultimately, his group opted to use Aerospike instead of Cassandra, which enabled them to reduce their
cluster size from 512 Cassandra nodes on AWS to 60 Aerospike nodes on AWS, improve performance and
availability, and reduce TCO.
Sign #3: You’re tired of living with cascading failures
For mission-critical systems, availability is the most crucial aspect of a data layer. After all, isn’t availability
why you picked AP from the CAP theorem38 and selected Apache Cassandra in the first place? You’ve
chosen a system that has distribution and replication of data, so when a node becomes unavailable --
momentarily or permanently -- the data lives elsewhere. Right?
Wrong, actually. Data distribution sounds well and good (and is the correct solution), but if a single node
outage causes a cascading failure across your cluster, every node becomes the single point of failure for the
cluster. And cascading failures are common.39 As one Cassandra user noted,
Cassandra seems to have two modes: fine and catastrophic.40
What, then, are some typical sources of failures? They include:
● Memory pressure caused by hinted handoff during failover
● Compactions thrashing the cache
● Compactions causing I/O, and thus, CPU pressure
● Memory use causing frequent garbage collection, and thus, CPU pressure
● A large number of tables causing memory pressure
● Inappropriately sized partitions causing pressures for the Java heap and garbage collection
Let’s explore each in turn.
Memory pressure caused by hinted handoff during failover - As noted by one Cassandra vendor in its
documentation, the cause of this is clear. If you rely on Cassandra’s ability to store writes on a coordinator
node to replay later when the designated node returns to the cluster, this can consume large quantities of
memory and potentially result in failures:
If several nodes experience brief outages simultaneously, substantial memory pressure can build up on the
coordinator. The coordinator tracks how many hints it is currently writing. If the number of hints increases too
much, the coordinator refuses writes and throws the OverloadedException error.41
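A quick estimate shows how fast that pressure builds. The write rate, hint size, and outage length below are assumed illustrative numbers, not Cassandra defaults; in practice the backlog is spread across the surviving coordinators and bounded by the hint window:

```python
# Back-of-envelope estimate of hint buildup while replica nodes are down.
# All inputs are assumed illustrative figures.

def hint_backlog_mb(writes_per_sec, outage_s, total_nodes, nodes_down=1,
                    rf=3, avg_hint_bytes=1024):
    # Each write targets rf replicas; the chance a given down node is among
    # them is roughly rf / total_nodes, and each such miss is stored as a hint.
    hints_per_sec = writes_per_sec * nodes_down * rf / total_nodes
    return hints_per_sec * outage_s * avg_hint_bytes / 1e6

# 100k writes/s on a 30-node cluster, 2 nodes down for 10 minutes:
mb = hint_backlog_mb(100_000, 600, total_nodes=30, nodes_down=2)
print(f"~{mb:.0f} MB of hints accumulated across coordinators")
```

Roughly 12 GB of hints from a ten-minute outage makes the OverloadedException behaviour quoted above unsurprising.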
37 http://pages.aerospike.com/rs/229-XUE-318/images/The-Trade-Desk1-Aerospike-Summit_19.pdf
38 https://en.wikipedia.org/wiki/CAP_theorem
39 https://stackoverflow.com/questions/38249206/simultaneous-repairs-cause-repair-to-hang
40 https://www.slideshare.net/planetcassandra/pd-melting-cass/12?src=clipshare
41 https://docs.datastax.com/en/dse/6.7/dse-arch/datastax_enterprise/dbArch/archRepairNodesHintedHandoff.html
Cassandra users often struggle to understand the intricacies of using hinted hand-offs, leading to questions
about intermittent failures42, data restoration issues43, and data consistency44. As noted in one blog from a
Cassandra consultant,
There are many knobs to turn in Apache Cassandra. Finding the right value for all of them is hard. Yet even
with all values finely tuned unexpected things happen. In this post we will see how gc_grace_seconds can
break the promises of the Hinted Handoff.45
Another consulting firm published a Cassandra performance overview guide that included this warning:
The hinted handoff process can overload the coordinator node. If this happens, the coordinator will refuse
writes, which can result in the loss of some data replicas.46
Perhaps it’s not surprising, then, that some users recommend against using the feature:
Don’t use hinted handoffs (ANY or LOCAL_ANY quorum). In fact, just disable them in the configuration. It’s
too easy to lose data during a prolonged outage or load spike, and if a node went down because of the load
spike you’re just going to pass the problem around the ring, eventually taking multiple or all nodes down.47
Compactions thrashing the cache - As noted by one Cassandra vendor, compactions, which increase during a
single node outage as greater load is applied on surviving nodes, also increase pressure on the I/O, memory,
and CPU of those nodes:
Apache Cassandra row cache caches only the head of a partition, where the number of rows cached is
configurable. . . . Because Cassandra row cache is ineffective for many workloads, some users introduce the
additional complexity of running with the row cache disabled, and rely on the operating system page cache
for serving reads. . . . Apache Cassandra compaction thrashes the page cache, because it reads and writes
everything, and after compaction the most frequently used data is likely to no longer be in the cache.48
A Cassandra cluster is often sized using assumptions about the effectiveness of the row cache; an ineffective
row cache leads to a greater number of connections and transactions in flight. This causes difficulty (at the
very least, performance issues) for surviving nodes. However, the effects of the cache are negated when the
blocks being cached are paged out by another database feature.
Compactions causing I/O, and thus, CPU pressure - Choosing between different compaction strategies can
dramatically influence I/O and CPU pressure. This is a function of the reads and writes at a given moment: a
predominantly read-heavy application will get the adverse effects of a data load job, causing pressure on both
I/O and CPU. Certain articles49 50 and conference presentations51 are dedicated to helping Cassandra users
understand the subtleties -- and potential performance pitfalls -- of various compaction strategies. As one
presenter who benchmarked size-tiered and leveled compaction strategies for a write-only workload noted,
42 https://stackoverflow.com/questions/48664110/why-does-hinted-handoff-not-work-sometimes
43 https://dba.stackexchange.com/questions/174889/help-in-understanding-hinted-handoffs-and-data-replication-in-cassandra
44 https://stackoverflow.com/questions/43445671/whats-the-point-of-using-hinted-handoff-in-cassandra-especially-for-consistenc
45 https://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
46 https://www.scnsoft.com/blog/cassandra-performance
47 https://www.threatstack.com/blog/scaling-cassandra-lessons-learned
48 https://www.scylladb.com/product/technology/memory-management/
49 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
50 https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
51 https://www.scylladb.com/tech-talk/ruin-performance-choosing-wrong-compaction-strategy-scylla-summit-2017/
. . . [With] the two existing compaction strategies which I mentioned, we find ourselves between a rock and a
hard place. Both of them give us problems. So for size-tiered compaction, . . . at some point we need twice
the disk space. This is a space amplification problem. For level-tiered compaction, we’ll do more than double
the disk I/O than size-tiered compaction.52
Another presentation on “lessons learned” from running a large number of Cassandra clusters included this
observation:
A single node doing compactions can cause latency issues across the whole cluster, as it will become slow to
respond to queries.53
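The strategy choice is made per table in CQL. As a hedged illustration (the keyspace, table, and window settings below are hypothetical, not recommendations), a time-series table might use time-window compaction, while switching an existing table back to the size-tiered default is a one-line change:

```sql
-- Hypothetical time-series table using TimeWindowCompactionStrategy (TWCS):
CREATE TABLE metrics.readings (
    sensor_id uuid,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

-- Switching strategies later can trigger substantial recompaction work,
-- so schedule it off-peak:
ALTER TABLE metrics.readings
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
```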
Garbage collection occurring frequently, creating resource pressures - With a Java code base, garbage
collection is inevitable and uncontrollable. Tuning is possible, but it’s often described as a “dark art.” To
develop skills in this area, your staff may need to read through a number of detailed papers54 55 56 that cover
seemingly arcane topics such as old/tenured generation, young generation (with Eden and survivor heaps),
multi-phase concurrent marking cycle (MPCMC), etc. Maybe that’s not how you’d like your staff to spend
their time.
Unfortunately, you can’t just stick your head in the sand when it comes to garbage collection because its side
effects are real and potentially devastating. Consider this user’s experience:
We have been dealing a lot with garbage collector stop the world event, which can take up to 50 seconds in
our nodes, in the meantime Cassandra Node is unresponsive, not even accepting new logins.57
One Cassandra consultant cautioned:
Full GC pauses are awful, they can lock up the JVM for several minutes, more than enough time for a node to
appear as offline. We want to do our best to avoid Full GC pauses. In many cases it’s faster to reboot a node
than let it finish a pause lasting several minutes.58
Careful JVM tuning is one recommended approach59 60 because relying on defaults simply doesn’t cut it. As
one consultant noted,
The default Cassandra JVM argument is, for the most part, unoptimized for virtually every workload.61
But even that may not be sufficient. So what else might you try? How about rewriting your application?
If your application’s object creation rate is very high, then to keep up with it, the garbage collection rate will
also be very high. A high garbage collection rate will increase the GC pause time as well. Thus, optimizing the
52 https://youtu.be/MLbQXQ-c2vY?t=1356
53 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters
54 https://www.azul.com/files/Understanding_Java_Garbage_Collection_v4.pdf
55 https://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
56 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
57 https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw
58 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
59 https://thelastpickle.com/blog/2018/04/11/gc-tuning.html
60 https://dzone.com/articles/how-to-reduce-long-gc-pause
61 https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html
application to create fewer objects is THE EFFECTIVE strategy to reduce long GC pauses. This might be a
time-consuming exercise . . . . 62
Later, the same author further observes that users may also need to increase RAM, decrease the number of
other running processes (to free up RAM), minimize I/O from other processes, or possibly increase the
number of threads allocated to garbage collection, although the latter is tricky because “(a)dding too many
GC threads will consume a lot of CPU and take away resources from your application.” 63
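In Cassandra, these knobs live in the jvm.options file shipped with the distribution. The fragment below only sketches the kinds of settings such tuning touches -- the specific values are illustrative assumptions, not recommendations for any particular workload:

```
# jvm.options fragment (illustrative values only -- tune against your own workload)
-XX:+UseG1GC                             # G1 collector, common on larger heaps
-XX:MaxGCPauseMillis=300                 # pause-time goal G1 tries to honor
-XX:InitiatingHeapOccupancyPercent=70    # when concurrent marking begins
-XX:G1RSetUpdatingPauseTimePercent=5     # cap remembered-set work inside pauses
-Xms16G                                  # fixed heap: set min and max equal
-Xmx16G                                  # to avoid heap-resize pauses
```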
A large number of tables, causing memory pressure - The way in which you construct the data schema can
also impact memory pressure, and changes to application design and use can radically change hardware
requirements.
While there are legitimate use cases for having lots of tables in Cassandra, they are rare. . . . Many tables will
require more resources, obviously. How much? That depends on the settings, and the usage. For example, if
you have a thousand tables and write to all of them at the same time there will be contention for RAM since
there will be memtables for each of them, and there is a certain overhead for each memtable (how much
depends on which version of Cassandra, your settings, etc.).64
Inappropriately sized partitions causing Java heap and garbage collection pressures - Determining the
optimal partition size for your data and workload can be challenging. Recommendations on maximum size
vary depending on the Cassandra release involved; users often target less than 100MB65 66 or, in some
cases, less than 400MB.67 Since partition size is related to the number of columns per row, this impacts your
table design, often imposing restrictions you might prefer to avoid. But going “too wide” can impose its own
challenges:
Wide partitions in Cassandra can put tremendous pressure on the java heap and garbage collector, impact
read latencies, and cause issues ranging from load shedding and dropped messages to crashed and downed
node. 68
While recent changes in Cassandra 3.11 were designed to mitigate these issues, how to size your partitions
isn’t entirely straightforward. As one Cassandra code contributor noted:
After #11206 [a Cassandra improvement for supporting large partitions], what’s the recommended partition
size? It still depends - sorry. IMO, we moved the ‘barrier’ [but] test with your data model and workload.69
Or, as another Cassandra specialist said,
. . . (T)here are practical limitations tied to the impacts large partitions have on the JVM and
read times. These practical limitations are constantly increasing from version to version. This
62 https://dzone.com/articles/how-to-reduce-long-gc-pause
63 https://dzone.com/articles/how-to-reduce-long-gc-pause
64 https://stackoverflow.com/questions/33212414/cassandra-what-is-the-reasonable-maximum-number-of-tables
65 https://medium.com/@johnny_width/calculating-cassandra-partition-disk-size-43276d1dcb34
66 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters
67 https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/
68 https://www.backblaze.com/blog/wide-partitions-in-apache-cassandra-3-11/
69 https://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
practical limitation is not fixed but variable with data model, query patterns, heap size, and
configurations which makes it hard to . . . give a straight answer on what’s too large.70
Partition sizing is also related to server sprawl. Maintaining a small partition size (say, 100-400 MB) provides one way to cope with Java heap pressures, as the key cache resides in the heap. A smaller
amount of data in a partition allows a larger number of keys to reside in the key cache; this reduces the
likelihood of Java heap thrashing and triggering high levels of garbage collection. But small partition sizes can
force you to add more and more nodes as your data volumes grow, contributing to server sprawl and high
TCO.
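A back-of-the-envelope check can tell you whether a planned partition falls inside the commonly cited range. The row count and average row size below are assumed figures, purely for illustration:

```python
# Rough partition-size estimate from assumed (hypothetical) figures.
rows_per_partition = 500_000   # rows sharing one partition key
avg_row_bytes = 300            # average serialized row size, assumed

partition_mb = rows_per_partition * avg_row_bytes / 1_000_000
print(f"~{partition_mb:.0f} MB per partition")  # ~150 MB: inside the 100-400 MB guidance
```

If the estimate lands well above a few hundred megabytes, the usual remedy is to add a bucketing component (e.g., a day or hash bucket) to the partition key -- which in turn multiplies partition count and feeds the node-growth dynamic described above.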
So what’s a user to do? If you’re tired of trying to prevent one failure after another -- or, worse yet, tired of
fixing one failure only to encounter another -- it’s a good time to do what other Cassandra users have already
done: evaluate your options. As you’ll soon learn, Aerospike’s architecture and design characteristics avoid
the types of problems we’ve just described. Customer experiences prove that. But before we turn to
Aerospike, let’s explore the fourth sign that you may have outgrown Cassandra.
Sign #4: Your operations team keeps growing and you’re worried
about costs
An operations team must weigh a large number of considerations at cluster provisioning time, which makes the process time-consuming. As a data architect at Wayfair explained,
. . . (W)ith Cassandra there’s a lot more configuration and tuning out of the box. Aerospike? Pretty much
change a few things and you’re good to go.71
Moreover, with Cassandra your operations team must continually monitor clusters for changes in application
patterns and re-tune frequently. Failure to do so leads not only to poor performance but also (eventually) to
CPU pressure, which limits garbage collection capabilities; this causes memory pressure, and in turn,
outages.
To increase utilization, clusters are forced to grow wider. Consider the conclusion reached by the
director of data engineering at The Trade Desk, a firm that chose Aerospike over Cassandra for several
critical applications:
. . . (I)n order to get the throughput that we needed [with Cassandra], we needed to scale the number of
machines to a high number of machines with a lot of CPU compared to the disk they had. Aerospike gave us
another alternative . . . .72
Managing very large clusters increases operational complexity, requiring more time from your operations staff
to plan and deploy as well as more time to diagnose and fix increasingly complex and compounding failures.
The same situation holds for firms that run multiple Cassandra instances on the same compute node to achieve their goals. They're faced with more operational complexity to plan, deploy, and maintain the environment;
failures naturally become more difficult to diagnose and fix, too.
70 https://stackoverflow.com/questions/46272571/why-is-it-so-bad-to-have-large-partitions-in-cassandra
71 https://youtu.be/IjWnG8dWTms?t=203
72 https://www.aerospike.com/resources/videos/thetradedesk-integrated-hyper-scale-hot-and-cold-store/
The complexity of managing Cassandra when faced with demanding and growing workloads hasn’t been lost
on customers. And the business impact of such operational overhead can be substantial. As the CTO of
Signal explained,
Before Aerospike, we were spending more and more of our time on the care and feeding of Cassandra and
less and less time on the building of new product offerings. With Aerospike, we’ve now cleared the roadmap
and we’re just focused on adding new functionality to our platform for our customers.73
Signal’s experience wasn’t an isolated case, either. Performance tuning with Cassandra is an ongoing
challenge that operational teams must confront. Consider the conclusions reached by InMobi's CTO and co-founder after comparing Cassandra and Aerospike:
Cassandra required extensive database tuning in order to get performance and meet our SLA. We had to
reduce overhead and risk of errors and downtime . . . . Aerospike was the best choice across the board.
Aerospike was easy to deploy, with out-of-the-box functionality, and it’s been easy to use every step of the
way. Operational overhead was nearly zero, and we shaved three months off our launch schedule as a
result.74
Another engineer with Cassandra experience put it this way:
. . . you can't run Cassandra with almost any of its default settings (maybe besides ports). In most tools you'll
need to tweak a few defaults, with Cassandra you need to thoroughly read the docs on every little
configuration in the server and schema definitions (especially if you're working in multiple DCs), you'll always
find a little surprise if you skim through it.
It's also almost mandatory to read the internal design docs of cassandra even if you're not the admin but just
working with it. And modelling data is a lot less trivial than it looks - and almost always not what you assume.75
Indeed, data modeling is a well-recognized challenge in the Cassandra community. As one DataStax blog noted:
One of the hardest parts of using Cassandra is picking the right data model.76
Designing both the logical and physical data models correctly is critical to a successful deployment. Get one
or both of them wrong, and you may find yourself facing performance, scalability, and availability issues. Why?
As one Cassandra advocate noted at the beginning of a detailed talk on data modeling,
The data model is the only thing you can’t change once in production . . . . Well, not easily, anyway. 77
So how hard is it to get right? One Big Data architect at WalmartLabs blogged that it “has not been easy” to
achieve performance goals, explaining that:
We had to . . . create schema that looked ‘weird’ but turned out to be more performant, and consider various
strategies to ‘delete’ the data. . . . Of all the tunings, strategies, tricks that can be applied to Cassandra to
73 https://youtu.be/tGPqvl3BwzY?t=126
74 https://aero-media.aerospike.com/2018/10/InMobi-Case-Study.pdf
75 https://news.ycombinator.com/item?id=9664203
76 https://www.datastax.com/2019/02/how-to-get-the-most-out-of-apache-cassandra
77 https://youtu.be/GoPS98hU5UQ?t=182
gain performance, designing the most optimal schema had the biggest impact in terms of latency, stability,
and availability of the cluster.78
Indeed, how to avoid data modeling pitfalls is a popular topic among Cassandra users and vendors. For
example, one blog79 advises readers to avoid the “three stooges” of Cassandra data modeling (wide
partitions, tombstones, and data skew) while another80 focuses exclusively on tombstones (which “can cause
significant performance problems if not investigated, monitored, and dealt with in a timely manner”). As one
Cassandra consultant put it:
An incorrect data model can turn a single query into hundreds of queries, resulting in increased latency,
decreased throughput, and missed SLAs.81
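The core discipline behind getting this right is query-first design: one table per access pattern, denormalized so that each query is a single-partition read. A hedged sketch (the keyspace, table, and columns are hypothetical):

```sql
-- "Show a user's most recent orders" becomes one partition, one query:
CREATE TABLE shop.orders_by_user (
    user_id  uuid,
    order_ts timeuuid,
    order_id uuid,
    total    decimal,
    PRIMARY KEY ((user_id), order_ts)
) WITH CLUSTERING ORDER BY (order_ts DESC);

-- Single-partition read; no cross-node fan-out:
SELECT order_id, total
FROM shop.orders_by_user
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
LIMIT 20;
```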
Sign #5: You’re struggling to acquire and retain Cassandra expertise
Technical resources are hard to acquire and retain, and finding staff with deep Cassandra skills -- or
developing such skills in-house -- is no exception. Yet, you often need to do just that to effectively operate
Cassandra clusters. Indeed, a recent DataStax blog on “How to Get the Most out of Apache Cassandra” cites
the need to “thoroughly” train your staff as its top recommendation:
Getting the most out of Cassandra starts with making sure your team knows the ins and outs of the
technology and is comfortable using it.82
The blog recommends that training begin “a few weeks or even months” prior to a rollout, and further
suggests directing IT staff to additional resources beyond regular training exercises to help them get up to
speed.
Perhaps you don’t have the time or budget for that. Unfortunately, hiring skilled Cassandra technical staff can
prove challenging. A 2019 CIO survey by Harvey Nash/KPMG83 reported Big Data skills to be the scarcest.
Of course, Big Data is Cassandra’s target market.
But let’s say you’re one of the lucky organizations that manages to attract the interest of technical
professionals with Cassandra skills. Expect to pay a premium to get them to work for you. Recent salary
surveys84 85 indicate that senior software engineers and DBAs with Cassandra skills often earn 15%-20%
more than their peers skilled in other NoSQL or SQL DBMSs (e.g., MongoDB or Oracle). One 2017 analysis
from a Cassandra-as-a-service provider estimated that the burdened expense for recruiting and employing a
Cassandra expert would exceed $275,000 for the first year; additional expenses for support, equipment, and
other needs would drive initial total costs to $400,000 - $450,000 or more.86 And then there’s the issue of
staff retention:
78 https://medium.com/walmartlabs/avoid-pitfalls-in-scaling-your-cassandra-cluster-lessons-and-remedies-a71ca01f8c04
79 https://blog.anant.us/common-problems-cassandra-data-models/
80 https://opencredo.com/blogs/cassandra-tombstones-common-issues/
81 https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html
82 https://www.datastax.com/2019/02/how-to-get-the-most-out-of-apache-cassandra
83 https://advisory.kpmg.us/articles/2019/cio-survey-harvey-nash-2019.html
84 https://www.payscale.com/research/US/Job=Senior_Software_Engineer/Salary/cd797527/Apache-Cassandra
85 https://neuvoo.com/salary/?job=Cassandra+DBA
86 https://www.instaclustr.com/wp-content/uploads/2017/12/Avoiding-the-challenges-and-pitfalls-of-Cassandra-implementation_White-Paper.pdf
At present, the demand for Cassandra expertise far outweighs the supply. . . . Each year, you can expect the
salaries for your core Cassandra team to grow by 10% or more just to keep them in your company as
recruiters call on them with higher and higher financial rewards if they leave your company.87
All too often, organizations using Cassandra learn how tough it can be to staff their projects. As a data
architect at Wayfair explained,
It’s very difficult to find people skilled in Cassandra who know all the ins and outs.88
Adding to your staffing challenges is the fact that your operations team must understand the differences
between the multiple versions of Cassandra and be able to tune effectively for them. Will your staff choose the
generic Apache Cassandra distribution or a vendored version created by your Cassandra distributor?
Differences between the open source and vendored versions aren’t simply a matter of a few features. For
example, one vendor’s Cassandra implementation (DataStax Enterprise) employs a thread-per-core (TPC)
architecture,89 while Apache Cassandra uses a staged event-driven architecture (SEDA).90 Performance
characteristics differ between the two, which can impact tuning requirements.
Yet there’s more for your staff to consider beyond tuning. If your team selects a Cassandra distributor, will the
distributor provide current releases? Will the distributor keep up with the community for features and bug fixes
or will your operations staff need to patch a vendor distribution with fixes from the open source version?
Historically, even core committers like DataStax have deferred migrating to a new major release from Apache
Cassandra by 8 months or more. You’ll have to decide if you want to wait until a patch is vendored in, move
to a newer Apache release yourself (and abandon your support subscription), or fix the code base you have.
Porting changes between distributions is just one more complex task for your operations staff to
undertake. As noted earlier, scaling Cassandra to support your growing data volumes can force your staff to
manage increasingly large clusters, run multiple instances per node, and become proficient in additional
software disciplines, such as JVM tuning, containers, virtual machines, etc. Expect to allocate more time for
your staff to become proficient in such skills, or expect to pay more to hire expert consultants to do the work
for you.
As mentioned earlier, a less-than-ideal data model can lead to performance and other problems in your
production environment. This has significant implications for your operations and development staff. Consider
the implications of using distributed counters in your application. To many developers, distributed counters
might seem like a reasonable feature for a distributed database, especially when performing real-time
analytics. Perhaps your application team chose them. That could prove troublesome. As Andrew Montalenti,
CTO of Parse.ly, noted:
When I’m in a good mood, I sometimes ask questions about Counters in the Cassandra IRC channel, and if
I’m lucky, long-time developers don’t laugh me out of the room. Sometimes, they just call me a “brave soul”...
All of this is to say: Cassandra Counters — it’s a trap! Run! 91
Again, this isn’t an isolated case. Indeed, one database architect and consultant with 8 years of Cassandra
experience explored a number of situations in which users “go wrong”:
87 https://www.instaclustr.com/wp-content/uploads/2017/12/Avoiding-the-challenges-and-pitfalls-of-Cassandra-implementation_White-Paper.pdf
88 https://youtu.be/IjWnG8dWTms?t=226
89 https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/newFeatures.html
90 https://issues.apache.org/jira/browse/CASSANDRA-10989
91 https://blog.parse.ly/post/1928/cass/
Cassandra has a bunch of features that probably shouldn’t be there. . . .
● Counters: They work most of the time, but they are very expensive and should not be used very
often.
● Light weight transactions: They are not transactions nor are they light weight.
● Batches: Sending a bunch of operations to the server at one time is usually good, saves network
time, right? Well in the case of Cassandra not so much.
● Materialized views: I got taken in on this one. It looked like it made so much sense. Because of
course it does. But then you look at how it has to work, and you go…Oh no!
. . . 92
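To see why counters in particular invite trouble, consider their CQL restrictions (the table and column names here are hypothetical):

```sql
-- A counter table may hold ONLY counter columns besides its primary key:
CREATE TABLE analytics.page_hits (
    page text PRIMARY KEY,
    hits counter
);

-- Counters cannot be INSERTed or set -- only incremented or decremented:
UPDATE analytics.page_hits SET hits = hits + 1 WHERE page = '/home';

-- If that UPDATE times out, the client cannot safely retry it:
-- counter increments are not idempotent, so a retry may double-count.
```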
Your operations staff can't stop at creating a checklist of features in use; they must know how the data is used and what its life cycle is. Consider something simple like a queue, where you need to maintain order and
also expunge data as soon as it’s processed. That simple data model design leads to operational problems
down the road, as Cassandra then needs to manage large numbers of tombstones (deleted records). As
Ryan Svihla, a Solution Architect at DataStax, remarked:
You realize that based on your queue workflow instead of 5 records you’ll end up with millions and millions per
day for your short lived queue, your query times end up missing SLA and you realize this won’t work for your
tiny cluster. 93
Challenges that can arise from tombstones aren’t limited to queues (a use case often acknowledged as one
of several Cassandra “anti-patterns”). One author commented on his experience with lists:
Too many tombstones and cassandra goes kaput. Well, more accurately, if you scan too many tombstones in
a query (100k by default) in later cassandra versions, it will cause your query to fail. This is a safe guard to
prevent against OOM and poor performance. 94
The use of TTL (Time-To-Live) as a mechanism to remove data after a specified time period is another
potential trap. As one Cassandra user observed:
... [TTL auto-expire] was able to effect a denial of service for all logins through creating a large amount of
garbage [tombstone] records. Once the records for these failed logins had expired, all queries to this table
started timing out.95
After exploring several examples involving Cassandra “TTL intricacies,” another user concluded,
In Cassandra, TTL should be used with caution otherwise it can lead to erratic behavior. 96
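The mechanics are easy to trip over because every TTL-expired cell becomes a tombstone that reads must scan until compaction removes it. A hedged sketch (the table is hypothetical; the thresholds cited are Cassandra's shipped defaults):

```sql
-- Each cell written here surfaces as a tombstone once the TTL elapses:
INSERT INTO auth.failed_logins (user_id, attempt_ts)
VALUES (uuid(), toTimestamp(now()))
USING TTL 86400;   -- auto-expire after 24 hours
```

In cassandra.yaml, `tombstone_warn_threshold` (default 1,000) and `tombstone_failure_threshold` (default 100,000) govern when a tombstone-heavy read is logged or aborted -- the 100k query failure described above.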
Furthermore, your operations staff must absorb the consequences of your development staff picking the
wrong primary or partition key. A poor choice inevitably ends up with hotspot nodes, which can cause
memory, CPU, and/or I/O pressure. This leads to the kind of cascading failures described earlier.
Unfortunately, as one Cassandra consultant noted, it’s not always easy to develop the right data model:
Evaluating a seemingly valid and straightforward data model for potential pitfalls requires a level of expertise
in Cassandra internals, the storage format in particular. . . . Performance and smooth operations of a
92 https://blog.pythian.com/cassandra-use-cases/
93 https://medium.com/@foundev/domain-modeling-around-deletes-1cc9b6da0d24#.goi7cxibs
94 https://jsravn.com/2015/05/13/cassandra-tombstones-collections/#lists
95 https://www.tildedave.com/2014/03/01/application-failure-scenarios-with-cassandra.html
96 https://medium.com/@27.rahul.k/cassandra-ttl-intricacies-and-usage-by-examples-d54248f2853c
Cassandra cluster depend in large part on the quality of the data model and how well it suits the application.
There are many factors that shape the suitable data model and its design involves many complex decisions
and tradeoffs. 97
Moreover, at least one seemingly promising feature -- materialized views -- turned out to be more trouble than
it’s worth, according to many users.
When materialized views (MVs) were added to Cassandra 3 everyone, including me, was excited. It was one
of the most interesting features added in a long time and the possibility of avoiding manual denormalization
was very exciting. . . . Since the release of the feature, materialized views have retroactively been marked as
experimental, and we don’t recommend them for normal use. Their use makes it very difficult (or impossible)
to repair and bootstrap new nodes into the cluster. 98
And that’s not all. Another user lamented about synchronization issues99 while a third described functional
restrictions and performance issues:
Most importantly the serious restrictions on the possible primary keys of the Materialized Views limit their
usefulness a great deal. In addition any Views will have to have a well-chosen partition key and extra
consideration needs to be given to unexpected tombstone generation in the Materialized Views.
And, there is a definite performance hit compared to simple writes. 100
To be sure, the need for Cassandra expertise101 hasn’t gone unnoticed, prompting vendors to offer training
(sometimes free, sometimes fee-based), consulting services (fee-based), and other manners of support. And,
of course, there's a lot of documentation, blogs, presentations, and videos online to help you build up in-house expertise. But do you have the luxury of time -- and the budget -- for that? Increasingly, many firms that
use Cassandra are finding that they don’t. Aerospike offers an attractive alternative.
Aerospike: The Enterprise-Grade NoSQL Database
As noted earlier, Aerospike was designed and built from the ground up to take advantage of modern
computing architectures -- a distinction many customers have come to appreciate. As a vice president at
Nielsen noted,
One thing I love about Aerospike is that it’s very capable of taking advantage of any hardware you throw at it.
This might be your network devices, this might be your SSDs or your NVMe storage, it might be your RAM.102
Aerospike's intellectual property and technology stack have enabled its customers to achieve unprecedentedly low TCO; ultra-fast, predictable performance at scale; unparalleled uptime and reliability; a self-managing environment; and low operational overhead. Let's explore how Aerospike has achieved this.
Reduced TCO
Aerospike is written in C by a development team with deep expertise in networking, storage, and databases. By removing the layers of file system, block, and page caches, and instead building a proprietary log-structured file system designed for the way flash devices work, Aerospike can deliver unprecedented
97 https://opencredo.com/blogs/cassandra-data-modelling-patterns/
98 https://thelastpickle.com/blog/2019/01/30/new-cluster-recommendations.html
99 https://techblog.fexcofts.com/2018/05/08/cassandra-materialized-views-ready-for-production/
100 https://opencredo.com/blogs/everything-need-know-cassandra-materialized-views/
101 https://adtmag.com/articles/2016/03/15/cassandra-nosql-skills.aspx
102 https://www.aerospike.com/resources/videos/nielsen-hurdling-operational-issues/
resource utilization. The bottom line is that Aerospike clusters have far fewer nodes than the equivalent
Cassandra cluster. Indeed, as we mentioned earlier, The Trade Desk103 and Signal104 chose Aerospike over
Cassandra in part because of its resource efficiency and smaller footprint. In doing so, these firms cut
hardware costs and simplified operations. And they’re not alone. For example, AdForm decreased its server
footprint from 32 nodes with Cassandra to 3 with Aerospike and achieved a fourfold expansion of data.105
Aerospike enabled AdForm to sustain the same throughput as its old Cassandra cluster, but with lower -- and consistent -- latency and unmatched availability.
Lower cost. Predictable performance. Higher availability. You get all three with Aerospike.
Let’s explore the cost issues more closely. Aerospike recently updated its 2016 comparative benchmark106
with Cassandra, running the standard Yahoo! Cloud Serving Benchmark (YCSB) on Dell servers using Aerospike Community Edition 4.6.0.4 and Apache Cassandra 3.11.4. (See Appendix A for full configuration
and benchmark details of Aerospike’s most recent test.) The workload involved a balanced mix of read and
write operations on 2.65 TB of unique user data. Both DBMSs were configured with a replication factor of 2, so the total stored data was 5.3 TB.
103 https://www.aerospike.com/resources/videos/thetradedesk-integrated-hyper-scale-hot-and-cold-store/
104 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/
105 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/
106 https://www.aerospike.com/blog/comparing-nosql-databases-aerospike-vs-cassandra/
Aerospike’s throughput and latency advantages were striking. Aerospike demonstrated 15.4x better
throughput, 14.8x better 99th percentile read latency and 12.8x better 99th percentile update latency than
Apache Cassandra. Even at the 95th percentile, Aerospike’s read/write latencies were markedly better than
Cassandra’s. See Table 1.
                          Read Results                              Update Results
                   Throughput   95th Pctl    99th Pctl     Throughput   95th Pctl    99th Pctl
                   (txns/sec)   Latency      Latency       (txns/sec)   Latency      Latency
                                (µsecs)      (µsecs)                    (µsecs)      (µsecs)
Apache Cassandra
3.11.4                 23,999       8,487       17,187         23,999       8,319       20,109
Aerospike CE
4.6.0.4               369,998         744        1,163        369,998       1,108        1,574
Aerospike
advantage               15.4x       11.4x        14.8x          15.4x       7.51x        12.8x
                       better      better       better         better      better       better
Table 1: YCSB comparative results summary
Although we didn’t focus on the insert or loading process, it’s worth noting that we found it difficult to populate
Cassandra with the 2 billion objects required for the tests. When we ran YCSB clients without throttling, we
achieved an insert rate of 83K operations per second (OPS) but found that Cassandra dropped data. That’s
right. Subsequent reads showed that data we thought we had inserted into Cassandra was missing. The data
was loaded with 10 different instances of YCSB inserting different key ranges. We tried populating data at a rate of 60K OPS, but data was still being dropped. Finally, we dropped the insert rate to 40K OPS. After
nearly 14 hours, the load completed successfully -- all data was present. Unfortunately, we found each of the
nodes had over 4000 pending compactions. We hadn’t expected that. We had to wait 5 days for the
compactions to complete before we could proceed with our read/write benchmark tests.
What about Aerospike? By contrast, we successfully populated Aerospike with all 2 billion keys in less than 1
hour. Moreover, Aerospike was ready for the read/write benchmark tests immediately. Reliable and ready to
go.
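The load times quoted above are easy to sanity-check with simple arithmetic on the reported figures:

```python
# 2 billion objects at the throttled 40K OPS insert rate:
objects = 2_000_000_000
load_hours = objects / 40_000 / 3600
print(f"{load_hours:.1f} hours")  # 13.9 hours -- the "nearly 14 hours" reported

# Loading the same data in under 1 hour means Aerospike sustained at least:
min_ops = objects / 3600
print(f"{min_ops:,.0f} OPS")
```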
Let’s consider for a moment how the benchmark results for our read/write workload (shown in Table 1) can
help us calculate TCO. We could take the 15.4x throughput improvement ratio for Aerospike, multiply that by
the server costs, and compare the results. But that would put Cassandra at an unfair disadvantage because
its performance profile will change as servers are added, particularly if multiple instances are run on a single
server. Furthermore, TCO involves more than hardware costs; labor, electricity, cooling, and other costs need
to be considered.
To arrive at a more realistic comparison, we used Aerospike’s benchmark results on a 3-node cluster as the
target SLA and attempted to size Cassandra’s environment to deliver the same throughput and latencies.
(See Appendix A for configuration and sizing details.) Unfortunately, even with different caching assumptions
and hit ratios, Cassandra failed to achieve the desired SLA for 99th percentile reads and writes due to JVM
garbage collection issues. Ultimately, we settled on a 95% page cache hit ratio for Cassandra, which yielded
latencies closest to (yet still higher than) Aerospike’s 95th percentile latencies.
If we configured Cassandra with one instance running on each node, we would need 73 servers to come
close to the target SLA achieved by Aerospike on a 3-server cluster. However, many Cassandra users bundle
multiple instances of Cassandra on a single server through virtualization or containers,107 so we used a more
realistic configuration of 3 Cassandra instances per server to lower Cassandra’s cluster size to 25 servers.
(To run multiple instances on a single server, we added 128GiB of memory per instance for each server.) The
differences in initial hardware costs are substantial. As Table 2 illustrates, the Aerospike cluster costs a mere
8% of the Cassandra cluster.
                        Cassandra (3 instances/server)     Aerospike (1 instance/server)
                        Quantity   Cost ($)   Subtotal     Quantity   Cost ($)   Subtotal
Servers (Dell R730xd)      25      10,842     271,050         3       10,842      32,526
Memory (32GB DIMM)        292         671     195,932         4          671       2,684
Storage (Micron 9200)      50       1,053      52,650         6        1,053       6,318
Total                                         519,632                             41,528
% cost savings with Aerospike: 92%
Table 2: Initial hardware cost comparison for target SLA
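The arithmetic behind Table 2 is simple enough to sketch; this is an illustrative check only, using the unit prices given in Appendix A:

```python
# Rough check of the Table 2 hardware costs (unit prices from Appendix A).
SERVER, DIMM, SSD = 10_842, 671, 1_053  # Dell R730xd, 32GB DIMM, Micron 9200 ($)

def cluster_cost(servers: int, dimms: int, ssds: int) -> int:
    """Initial hardware cost: servers + added memory + storage."""
    return servers * SERVER + dimms * DIMM + ssds * SSD

cassandra = cluster_cost(25, 292, 50)  # 3-instances-per-server sizing
aerospike = cluster_cost(3, 4, 6)      # 3-node cluster from the benchmark

savings = 1 - aerospike / cassandra
print(cassandra, aerospike, f"{savings:.0%}")  # 519632 41528 92%
```

The 92% figure falls straight out of the unit prices and quantities; no performance assumptions enter at this stage.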
But that’s only part of the story. TCO is more than hardware costs -- a lot more, particularly for a scaled-up
production environment. Table 3 details hardware, power, cooling, and support costs over a three-year
period for a sample production scenario. Using the capacity and performance data from the benchmark, we
increased the data size from 2 billion keys of unique data to 20 billion keys of unique data for Year 1, which
translates to 18 TB of unique data and 36 TB of replicated data. Further, we modeled a high-growth business
scenario in which the number of unique keys grows to 30 billion in Year 2, forcing the cluster size to grow by
50%. Year 3 models further growth, with the number of unique keys reaching 39.9 billion, driving the cluster
size to grow by 33% over the previous year. Once again, the results are striking. Aerospike reduced costs by
90%, delivering nearly $9.27 million in savings over the Apache Cassandra cluster for the three-year period.
107 https://robin.io/blog/lessons-learned-cassandra/
                        Cassandra                                        Aerospike
                        Year 1     Year 2     Year 3     Total           Year 1    Year 2    Year 3    Total
# of servers            244        365        486                       30        45        60
Single server cost
over 3 years ($)        7,000                                           5,211
Server cost ($)         1,708,000  2,555,000  3,402,000  7,665,000      156,330   234,495   312,660   703,485
Power cost ($)          170,567    255,152    339,737    765,456        20,971    31,457    41,942    94,370
Cooling cost ($)        236,899    354,378    471,856    1,063,133      29,127    43,690    58,252    131,069
Space cost ($)          6,000      6,000      6,000      18,000         770       770       770       2,310
Support cost ($)        174,268    260,689    347,109    782,066        21,426    32,139    42,853    96,418
Total costs ($)         2,295,734  3,431,219  4,566,702  10,293,655    228,624   342,551   456,477   1,027,652
Table 3: Comparative 3-year TCO scenario
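As a sanity check, the Table 3 totals and the 90% savings figure can be reproduced directly from the yearly line items (a verification sketch, not part of the original study):

```python
# Reproduce the Table 3 totals from the yearly line items ($).
cassandra = {
    "server":  [1_708_000, 2_555_000, 3_402_000],
    "power":   [170_567, 255_152, 339_737],
    "cooling": [236_899, 354_378, 471_856],
    "space":   [6_000, 6_000, 6_000],
    "support": [174_268, 260_689, 347_109],
}
aerospike = {
    "server":  [156_330, 234_495, 312_660],
    "power":   [20_971, 31_457, 41_942],
    "cooling": [29_127, 43_690, 58_252],
    "space":   [770, 770, 770],
    "support": [21_426, 32_139, 42_853],
}

total_c = sum(sum(v) for v in cassandra.values())  # 10,293,655
total_a = sum(sum(v) for v in aerospike.values())  #  1,027,652
print(total_c - total_a, f"{1 - total_a / total_c:.0%}")  # 9266003 90%
```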
How does Aerospike achieve so much on such a comparatively small footprint? As Aerospike is a native C
implementation, there are no inefficiencies from a Java runtime. The primary key index is stored as a
parentless red-black tree108, enabling ultra-fast key lookups in DRAM. The data is then retrieved from the
proprietary log-structured file system that supports parallelization across all the devices on the chassis. A
typical Aerospike node will have 8-12 SSDs and sometimes as many as 16. By reading and writing in parallel
to all devices and bypassing the block and page caches, Aerospike can fully utilize all the IOPS and disk slots
available before running out of CPU.
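The fan-out idea can be sketched generically. This is a conceptual illustration only: the "devices" below are placeholder callables standing in for raw SSD reads, not Aerospike's actual I/O path:

```python
from concurrent.futures import ThreadPoolExecutor

def read_from_device(device_id: int, offset: int) -> bytes:
    """Placeholder for a direct device read (no OS page cache in between)."""
    return f"device{device_id}@{offset}".encode()

def parallel_read(requests):
    """Issue one read per device concurrently; requests = [(device_id, offset), ...]."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        return list(pool.map(lambda r: read_from_device(*r), requests))

results = parallel_read([(0, 4096), (1, 8192), (2, 0)])
print(results[0])  # b'device0@4096'
```

The point of the sketch is only the shape of the pattern: with every device serviced in parallel and no shared cache layer serializing access, aggregate throughput scales with the number of devices.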
Aerospike’s implementation is not faster simply because it uses C. It uses performance-tuned libraries, such
as re-implementations of msgpack, and specific tested versions of jemalloc. The code is a highly
optimized, reference-counted, multithreaded implementation, requiring the kind of specialized systems-
programming skills in which Aerospike specializes.
Perhaps you’re still unconvinced about Aerospike’s cost advantages even after reviewing our comparative
scenario. Aerospike has converted many Cassandra clusters to Aerospike, delivering tangible savings to
users as a result. Read Debunking the Free Open Source Myth to learn how one Cassandra customer
reduced its costs, server count, and latencies with Aerospike.
108 https://en.wikipedia.org/wiki/Red–black_tree
Ultra Fast, Predictable Performance at Scale
Firms unfamiliar with Aerospike frequently ask how it can achieve ultra-fast and predictable performance at
scale. Quite simply, it was designed with that objective in mind. For example, Aerospike uses master-based
replication, so operations are forwarded to the primary node for the record. This is equivalent to having a
specific coordinator node for each portion of the data. The primary key is generated with a cryptographic
hash algorithm, RIPEMD-160 (also used by Bitcoin), which has had zero detected hash collisions in any known
use. The first twelve bits of this generated hash identify the partition where the record resides.
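The digest-to-partition mapping can be sketched roughly as follows. This is an illustrative sketch only: SHA-256 stands in for RIPEMD-160 (which is not reliably exposed by Python's hashlib), and the exact bit-extraction layout is an assumption for illustration, not Aerospike's actual scheme:

```python
import hashlib

NUM_PARTITIONS = 4096  # 2**12 -- twelve digest bits select a partition

def partition_for(key: bytes) -> int:
    """Map a key to one of 4096 partitions using the first 12 bits of its digest.

    SHA-256 is a stand-in here; Aerospike itself hashes keys with RIPEMD-160.
    """
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:2], "big") >> 4  # top 12 bits -> 0..4095

# Every key lands deterministically in one of the 4096 partitions.
assert 0 <= partition_for(b"user:42") < NUM_PARTITIONS
```

Because the digest bits are effectively uniform, keys spread evenly across the partitions, which is what makes a fixed, cluster-wide partition map practical.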
Aerospike supports all the popular languages (C/C++, C#, Java, Python, Go, Node.js, PHP, Ruby, Perl, and
Erlang) with a vendor-supported language client. These high-performance native clients provide translation of
native data types to and from Aerospike, as well as interoperability between languages, greatly improving
developer productivity.
Unlike Cassandra’s proxy model, in which a coordinator node reroutes client requests, Aerospike’s
smart client layer maintains a dynamic partition map that identifies the master node for each partition. This
enables the client layer to route the read or write request directly to the correct node without any additional
network hops. Furthermore, Aerospike writes synchronously to all copies of the data so there is no need to do
any form of quorum read across the cluster to get a consistent version of the data. As AdForm’s CTO noted:
Even more important is the super fast key-value store and extraordinary predictability we get with Aerospike,
providing the responsiveness our clients require to compete in the crowded Internet and mobile markets.109
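The single-hop routing described above can be contrasted with a coordinator/proxy model in a small conceptual sketch (the node names and three-partition map are hypothetical):

```python
# Conceptual contrast: smart client (one hop) vs. proxy/coordinator model (two hops).
partition_map = {0: "node-a", 1: "node-b", 2: "node-c"}  # partition -> master node

def smart_client_route(partition: int) -> list[str]:
    """Client holds the partition map, so the request goes straight to the master."""
    return [partition_map[partition]]

def coordinator_route(partition: int, coordinator: str = "node-a") -> list[str]:
    """Proxy model: the request lands on a coordinator, which forwards it to the owner."""
    owner = partition_map[partition]
    return [coordinator] if owner == coordinator else [coordinator, owner]

print(smart_client_route(2))  # ['node-c'] -- one network hop
print(coordinator_route(2))   # ['node-a', 'node-c'] -- an extra hop
```

Every hop eliminated at this layer comes straight off the tail latency, which is why the smart client matters for 99th-percentile SLAs.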
Yet Aerospike is about more than maintaining predictable performance for a short burst of time or shining in
5-minute benchmarks. Aerospike enables predictable performance over days and months -- the type of
performance that allows your business and applications to grow seamlessly as you broaden your presence in
local and international markets. Kayak’s SVP of Technology highlighted the importance of this:
As we continue to rapidly expand into international markets, we needed a solution that was reliable and could
scale to serve offers across our network. Aerospike enabled us to achieve multi-key gets in less than 3
milliseconds, deploy with ease and scale with very low jitter. 110
Finally, consider the conclusions presented by the chief architect of Hewlett Packard Enterprise’s AI-Driven
Big Data Solutions team, Theresa Melvin, in 2019 when she discussed “AI at Hyperscale - How to Go Faster
with a Smaller Footprint.”111 Faced with the need to plan for emerging “exascale” (extreme scale)
requirements -- e.g., ingesting more than 85PB (petabytes) of data per day, storing 2.6EB (exabytes) of raw
data monthly for 5 - 10 years -- Melvin evaluated various NoSQL databases. Her team ultimately
benchmarked several “pmem-aware” (persistent memory aware) offerings: Aerospike, Redis, RocksDB, and
Cassandra. Aerospike successfully completed the load test. Cassandra didn’t. Melvin observed that after the
4-hour mark was reached, the job was automatically killed with only about half the target records loaded.
After evaluating the full suite of test results, she offered this “take away”:
Aerospike’s PMEM-aware performance for a single node
▪ 300% increase over RocksDB
▪ Between 270% and 1667% over Cassandra’s
▪ 270% if Cassandra was able to sustain its 92K ops/sec
▪ 1667% per Cassandra’s current 15K sustained r/u performance
▪ Aerospike’s Cluster Footprint
▪ 2-nodes required for Strong Consistency
▪ 33% footprint reduction over nearly any other K/V store 112
109 https://www.aerospike.com/customers/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/
110 https://www.aerospike.com/blog/how-to-travel-smart-replacing-cache/
111 https://www.aerospike.com/resources/videos/summit19/ty-hpe/
Predictability is another critical aspect of Aerospike’s performance. The comparative benchmark results that
we discussed earlier yielded compelling information about the latency variances for Aerospike vs. Cassandra.
Comparing the two systems based on their ranges of response times and throughput -- rather than a simple
average, as we did earlier -- reveals how Aerospike offers far greater performance predictability than
Cassandra.
Let’s start with the variances for reads. Figs. 1-2 depict the latency variances and the throughput variances,
respectively. Note how few spikes -- variances -- in read latencies were observed with Aerospike (shown in
red) compared to Cassandra (shown in blue). Not only did Aerospike service reads far more quickly, it did so
with far greater predictability, as Fig. 1 shows. The same situation held true for throughput. As Fig. 2 shows,
Aerospike not only serviced a much higher volume of read requests during the same time period as
Cassandra, it did so with more consistent performance.
Fig. 1: Read latency variances observed for Cassandra and Aerospike
112 https://www.aerospike.com/resources/videos/summit19/company/hpe/
Fig. 2: Read throughput variances observed for Cassandra and Aerospike
What about writes? The situation is quite similar. Fig. 3 illustrates how Aerospike processed writes far more
quickly and with far more consistent performance than Cassandra. Furthermore, Aerospike’s throughput was
much more predictable -- and much higher -- than Cassandra’s, as shown in Fig. 4.
Fig. 3: Write latency variances observed for Cassandra and Aerospike
Fig. 4: Write throughput variances observed for Cassandra and Aerospike
Simply put, Aerospike delivers highly consistent response times and throughput to its users. It’s very
predictable, which makes it easy for firms to meet their application SLAs even as their business demands
grow.
Proven Durability and Availability
Aerospike uses a shared-nothing architecture. All nodes serve as peers; none has a special role. Aerospike
is built with a unique master-based cluster algorithm: a single node is the owner, or primary, for each
partition of the data. If a replication factor is defined, then N nodes hold a copy of each record for
reliability; if you lose a node, another copy remains. And unlike other systems, Aerospike writes
synchronously across all copies of the data.
In Aerospike’s default configuration, when a node fails or is removed from the cluster, any node that has a
secondary copy of the partition can be instantly promoted to be the master of that partition without the typical
delays imposed by a consensus algorithm. Aerospike uses the Paxos algorithm to ensure consensus across
the cluster, but since each node is equal in its role and equal with the current state of the data, any of the
secondary nodes can be promoted. This has allowed an organization like AppNexus to minimize downtime,
as expressed by its CTO, Geir Magnusson:
2.5 million impressions a second at peak, although we can go much higher, and we see north of 90 Billion
impressions per day and this is a 24×7 business with 100% uptime with Aerospike.113
Beyond “fair-weather” performance and availability, you also need a system that can deal with human errors
(e.g., “phat fingering”) and natural disasters (e.g., the weather). Aerospike has the ability to ensure availability
across regions using Cross Datacenter Replication (XDR), an essential feature for many global firms. As
Henry Snow, VP of infrastructure for the Nielsen Marketing Cloud, explained:
113 https://www.aerospike.com/customers/why-appnexus-uses-aerospike/
Cross Datacenter Replication is vital to our organization. We need to have redundant data centers. We need
our user objects to be available in multiple facilities. . . . The ability to replicate data across regions is
something that Aerospike provides that very (few) other NoSQL databases do with ease.114
It’s also important to note that Aerospike supports strong, immediate data consistency. This support
precludes data loss. In particular, Aerospike guarantees that all writes to a single record will be applied in a
specific sequential order and writes will not be re-ordered or skipped. Furthermore, Aerospike ensures that
writes acknowledged as having been committed have been applied and exist in the transaction timeline even
if a network or disk failure occurs.
For reads, Aerospike supports both full linearizability, which provides a single linear view among all clients that
can observe data, as well as a more practical session consistency, which guarantees an individual process
sees the sequential set of updates. These policies can be chosen on a read-by-read basis, allowing the
few transactions that require a higher guarantee to pay the extra synchronization price.
Indeed, Aerospike’s strong consistency, 24x7 availability, and ultra-fast performance were among the
reasons why a major European bank adopted it for managing instant payments. The firm needed a system of
record capable of processing 43 million transactions per day with very low latencies. Of course, preserving
data consistency was critical, even as the application scaled to support up to 1000 European banks. The
ability to replicate data across data centers was also important to the firm’s high availability objectives.
Aerospike proved to be the solution that satisfied all their aggressive requirements.
For more information about Aerospike’s strong consistency support, see the white paper on Exploring data
consistency in Aerospike Enterprise Edition115 or read about the Jepsen report.116
Self-Managing Features
The true value of complex technology is its simplicity of use, especially in challenging production
environments. You don’t want to spend time and energy when the cluster expands, when you refresh the
hardware, or when you migrate to a new data center. You want to use your database infrastructure like a
utility, adding capacity as and when you need it without extensive planning for maintenance windows. That’s
one of the reasons many firms prefer Aerospike. Indeed, at a 2019 conference, IronSource characterized
its daily maintenance with Aerospike as “minimal to none.” As one IronSource staff member put
it,
I barely touch the cluster; it simply works. 117
Aerospike achieves this operational simplicity with self-forming and self-healing clusters that are both rack-
aware and data center-aware. No downtime is required to add or remove nodes from a cluster; Aerospike
automatically re-distributes partitions of data to ensure the new compute resources are efficiently used.
Contrast that with this advice from a Cassandra expert to fellow Cassandra users:
When making major changes to the cluster (expanding, migrating, decommissioning), GO SLOW. 118
Furthermore, Aerospike’s rack awareness ensures that data is correctly separated to avoid compounding
failures. Using a proprietary algorithm, nodes can be restarted -- for example, to perform a software
upgrade -- with memory contents preserved, allowing the process to restart in just seconds without
needing to warm the memory buffers and caches, as the data was never evicted from DRAM.
114 https://www.aerospike.com/resources/videos/nielsen-hurdling-operational-issues/
115 https://www.aerospike.com/lp/exploring-data-consistency-aerospike-enterprise-edition/
116 https://jepsen.io/analyses/aerospike-3-99-0-3
117 https://www.aerospike.com/resources/videos/summit19/company/ironsource/
118 https://www.slideshare.net/DataStax/lessons-learned-from-running-1800-clusters
As one technology leader at Cybereason said,
So what surprises us about Aerospike is how easy it is to work with and how easy it is to operate. Using
Aerospike we are able to create a big data solution, persistent and highly available. . . . Aerospike is so easy
to work with. Aerospike does everything automatically.119
Complex technology does not have to be complex to use and operate. Period.
Staffing
Hiring and retaining staff is always a critical task in any organization. You need to ensure that the services and
applications you build and deploy meet the time-to-market needs of your organization. Yet people costs
should never be excluded from this calculation. The long-term care and maintenance of your infrastructure is
a real cost that you need to drive down if only to spend your company's energy on unique business functions
rather than database maintenance. From an operational perspective, Aerospike radically simplifies running
and maintaining a distributed database. Ken Bakunas, a data architect at Wayfair, put it simply:
Other groups [in our firm] hate us since nothing goes wrong, no downtime. 120
Indeed, Aerospike’s relatively small server footprint, combined with its self-managing and self-healing
features, means that your staffing needs are likely to be much lower -- and much easier to fulfill -- than you
budgeted for with Cassandra.
Contrast this with the typical advice from the Cassandra community, such as this blog from Threat Stack:
Don’t under staff Cassandra. This is hard as a start up, but recognize going in that it could require 1 to 2 FTEs
as you ramp up, maybe more depending on how quickly you scale up. 121
Summary
Although firms often view Cassandra as the obvious choice for their initial NoSQL projects, many find
themselves facing high operational costs and SLA violations as their data volumes and workloads grow.
Unpredictable performance, sprawling server footprints, operational and tuning challenges, and IT budget
overruns are common trouble spots. Indeed, here are five signs that your firm may have outgrown
Cassandra:
1. You’re struggling with ever-larger server footprints and worried about TCO.
2. You’re finding it harder and harder to meet SLAs.
3. You’re facing more and more cascading failures.
4. Your operations team keeps growing — and so do costs.
5. You’re having trouble acquiring and retaining Cassandra expertise.
It doesn’t have to be that way. Aerospike offers a compelling alternative. As many former Cassandra users
have already discovered, Aerospike provides ultra-fast, predictable performance for mixed operational
workloads at scale. And it does so at a fraction of the operational cost of Cassandra and other alternatives.
119 https://www.aerospike.com/resources/videos/cybereason-implementing-high-performance-scalable-
petabyte-cyber-solution/
120 https://www.aerospike.com/resources/videos/summit19/company/wayfair/
121 https://www.threatstack.com/blog/scaling-cassandra-lessons-learned
Indeed, firms that have migrated from Cassandra to Aerospike routinely enjoy substantial TCO savings thanks
to Aerospike’s highly efficient distributed database technology and its patented hybrid memory architecture.
For example, one company that migrated from Cassandra to Aerospike shrunk its server count from 450 to
60 nodes and is projecting $6.9 million in TCO savings over three years. And they’re not alone.
So why wait? If you’re struggling to meet your business needs with Cassandra, why not explore what
Aerospike can do for you? Firms in financial services, telecommunications, technology, retail, and other
industries have already realized concrete business benefits by deploying Aerospike for their mission-critical
applications. Contact Aerospike to arrange a technical briefing or discuss potential pilot projects. You might
be surprised just how much Aerospike can do for you.
Appendix A: Comparative benchmark details
In case you’re curious about the YCSB results and TCO described earlier, this appendix details the
configuration and assumptions used for these efforts.
Configuration
To update Aerospike’s 2016 comparative benchmark study122, we used the following configuration:
DBMS editions: Aerospike Community Edition 4.6.0.4, Apache Cassandra 3.11.4
Operating system: CentOS 7.6, Linux kernel 3.10-693
YCSB: Version 0.12. We briefly tested 0.14 but observed known problems with update operations and the
max_executiontime parameter, so we chose the more stable version 0.12.
Java: Version 1.8.0-201
Server Hardware:
3 x Dell R730xd Rack mounted server.
Numa Nodes 2
Model name: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
RAM 4 x 32GB DIMM DDR4 Synchronous Registered (Buffered) 2133 MHz
OS Drive Dell 465G HDD
Data Storage 2 x 1.6 TB Micron 9200 Max
Network Intel Ethernet Controller X710 for 10GbE SFP+
Client Hardware:
1 x Dell R730xd Rack mounted server.
Numa Nodes 2
Model name: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
RAM 4 x 32GB DIMM DDR4 Synchronous Registered (Buffered) 2133 MHz
OS Drive Dell 465G HDD
Network Intel Ethernet Controller X710 for 10GbE SFP+
As of this writing, the cost per server without memory is $10,842; the cost per 32 GiB DIMM is $671; and the
cost for a 1.6 TB 9200 Max SSD is $1,053. These figures are reflected in Table 2, presented earlier in the
paper.
Cassandra TCO and sizing assumptions
As described earlier, we needed to scale up Cassandra’s environment to meet the throughput and latencies
of Aerospike that served as the SLA for our comparative TCO scenario. Consequently, we made the following
assumptions for Cassandra:
1. Cassandra scales linearly.
2. Caching drives performance for Cassandra. (We sized for efficient page caching to maximize mixed
read/write performance and reduce the impact of compactions.)
3. The page cache hit ratio is defined by the available memory (64GiB for page cache).
4. The number of unique keys is 2 billion (2 x 109).
5. With a replication factor of 2, the data set contains a total of 4 billion keys.
122 https://www.aerospike.com/blog/comparing-nosql-databases-aerospike-vs-cassandra/
6. The total data size for the key set is 4 billion keys x 1,331 bytes per record = 5.324 terabytes.
7. Three nodes are used to create a scaled version of the cluster.
8. The server profile is fixed.
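A quick sketch of the sizing arithmetic implied by assumptions 4-6 (the 1,331-byte record size is inferred from the stated totals):

```python
# Data sizing from the assumptions above.
UNIQUE_KEYS = 2_000_000_000     # assumption 4
REPLICATION_FACTOR = 2          # assumption 5
RECORD_BYTES = 1_331            # implied by assumption 6

total_keys = UNIQUE_KEYS * REPLICATION_FACTOR   # 4 billion keys in the data set
total_tb = total_keys * RECORD_BYTES / 10**12   # decimal terabytes
print(total_keys, total_tb)  # 4000000000 5.324
```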
To calculate the TCO, we based the infrastructure-related costs on industry sources detailing best practices
as they existed at the time of our study (4Q 2019). Here are the additional assumptions that factored into our
TCO calculations:
1. The server cost is based on the benchmark results and sizing. The Cassandra cluster was sized with
3 Cassandra instances per server. The full depreciation for servers acquired in years 2 and 3 occurs in
years 4 and 5, which are not shown in the table above.
2. Power cost is based on 15.2¢ per kWh.
3. Space cost is based on $200 per sqft of construction cost depreciated over 39 years. The estimate
for necessary square footage is based on 30 sq ft per rack. Each rack contains 13 servers.
4. Support cost is based on an average salary of a system administrator of $85,706. Each administrator
maintains 120 servers.
5. Cooling power is estimated as 50% of a data center’s total power, and IT power accounts for 36% of
the power used in a data center. Treating the server power as IT power, we estimated the total power
cost and derived the cooling cost from it. These estimates are in line with other cooling cost models.
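Assumption 5 can be expressed as a small calculation; the 0.36 and 0.50 ratios come straight from the text, and the year-1 Cassandra figure reproduces Table 3 to within rounding:

```python
# Cooling cost derived from the server power cost, per assumption 5:
# IT power is 36% of total data-center power, cooling is 50% of total power,
# so cooling cost ~= power cost * (0.50 / 0.36).
IT_SHARE, COOLING_SHARE = 0.36, 0.50

def cooling_cost(server_power_cost: float) -> float:
    total_power_cost = server_power_cost / IT_SHARE
    return total_power_cost * COOLING_SHARE

# Check against Table 3, year 1 (Cassandra power cost $170,567):
print(round(cooling_cost(170_567)))  # 236899 -- matches the table's cooling cost
```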
About Aerospike
Aerospike is the global leader in next-generation, real-time NoSQL data solutions for any scale. With
Aerospike, enterprises overcome seemingly impossible data bottlenecks to compete and win with a fraction of the
infrastructure complexity and cost of legacy NoSQL databases. Aerospike’s patented Hybrid Memory
Architecture™ delivers an unbreakable competitive advantage by unlocking the full potential of modern
hardware, delivering previously unimaginable value from vast amounts of data at the edge, to the core and in
the cloud. Aerospike empowers customers to instantly fight fraud; dramatically increase shopping cart size;
deploy global digital payment networks; and deliver instant, one-to-one personalization for millions of
customers. Aerospike customers include Airtel, Baidu, Banca d’Italia, Nielsen, PayPal, Snap, Verizon Media
and Wayfair. The company is headquartered in Mountain View, Calif., with additional locations in London;
Bengaluru, India; and Or Yehuda, Israel.
© 2019 Aerospike, Inc. All rights reserved. Aerospike and the Aerospike logo are trademarks or registered
trademarks of Aerospike. All other names and trademarks are for identification purposes and are the property
of their respective owners.
2525 E. Charleston Road, Suite 201
Mountain View, CA 94043
Tel: 408 462 2376
www.aerospike.com