Novel Multi-region Clusters: Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD #CassandraSummit

Novel Multi-region Clusters - Instaclustr...Cassandra Multi-Region Clusters CS2014.key Created Date: 20151013042310Z


Page 1:

Novel Multi-region Clusters

Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD

#CassandraSummit

Page 2:

Adam Zegelin

Co-founder & VP of Engineering

www.instaclustr.com

[email protected]

@adamzegelin

Page 3:

Instaclustr

• Instaclustr provides Cassandra-as-a-service in the cloud (Currently only on AWS — Google Cloud in private beta)

• We currently manage 50+ Cassandra nodes for various customers

• We often get requests to do cool things, and we try to make them happen!

Page 4:

Multi-DC @ Instaclustr

• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud

    • Works out-of-the-box today

    • Requires a public IP address per node

• Private network clusters ⇄ cloud clusters

    • Easy if your private network allocates per-node public IP addresses

    • VPNs

    • Something else?

Page 5:

• Overview of multi-region/data centre clusters

• What is supported out-of-the-box

• Alternative solutions

• Supporting technology overview (NAT/PAT and DNS-SD)

• Implementation

Page 6:

Single Node

• What you get from running apt-get install cassandra and /usr/bin/cassandra

• Fragile (no redundancy)

• Dev/test/sandbox only

[Diagram: a single C* node]

Page 7:

Multi-node, Single Data Centre

• Two or more servers running Cassandra within one DC

• Replication of data (redundancy)

• Increased capacity (storage + throughput)

• Baseline for production clusters

[Diagram: three C* nodes]

Page 8:

Multi-node, Multi-DC

• Cassandra running in two or more data centres

• Global deployments

• Data near your customers (reduced latency)

• Supported out-of-the-box

[Diagram: three groups of three C* nodes]

Page 9:

Snitches

• Understand data centres and racks

• An implementation may automatically determine a node's DC and rack (Ec2MultiRegionSnitch uses the AWS internal metadata service, GossipingPropertyFileSnitch loads a .properties file)

• Node DC and rack are advertised via Gossip

• Determine node proximity (estimated link latency)

• Cluster may use a combination of Snitch implementations
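As a rough illustration of the properties-file approach, the sketch below parses the `dc` and `rack` keys that the properties-file snitch reads from `cassandra-rackdc.properties`. The class name `RackDcSketch` and the sample values are assumptions made for this example; it is not Cassandra's own loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Cassandra: reads a node's DC and rack
// from text in the cassandra-rackdc.properties format (dc=..., rack=...).
final class RackDcSketch {
    final String dc;
    final String rack;

    private RackDcSketch(String dc, String rack) {
        this.dc = dc;
        this.rack = rack;
    }

    static RackDcSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String dc = props.getProperty("dc");
        String rack = props.getProperty("rack");
        if (dc == null || rack == null) {
            throw new IllegalArgumentException("both dc and rack must be set");
        }
        return new RackDcSketch(dc.trim(), rack.trim());
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        RackDcSketch node = parse("dc=US_EAST_1\nrack=us-east-1a\n");
        System.out.println(node.dc + " / " + node.rack);
    }
}
```

Every node would carry its own file, and the values it declares are what the node then advertises to its peers via Gossip.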

Page 10:

Data Centres

• Collection of Racks

• Complete replications

• Geographically separate

• Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)

Page 11:

Racks

• Collection of nodes

• May fail as a single unit

• Modelled on the traditional DC rack/cage (n servers running off a single UPS)

Page 12:

☁

• Amazon Web Services (use Ec2MultiRegionSnitch)

    • Data Centre ≡ AWS Region (e.g. US_EAST_1, AP_SOUTHEAST_2)

    • Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b)

• Google Cloud Platform (no out-of-the-box auto-configuring snitch; use GossipingPropertyFileSnitch, or roll your own!)

    • Data Centre ≡ GCP Region (e.g. US, Europe)

    • Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)

Page 13:

Data Centre Aware

• Cassandra is data centre aware

    • Only fetches data from a remote DC if absolutely required (remote data is more “expensive”)

• Clients can be made data centre aware

    • If your app knows its DC, the client will talk to the closest DC

Page 14:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

// Pin the client to its local data centre; other DCs are only used as a fallback.
Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();

Page 15:

Multi DC Support

• Per-node public (internet-facing) IP address

• Optionally, per-node private IP address

• Per-node public address is used for inter-data centre connectivity

• Per-node private address is used for intra-data centre connectivity
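The addressing rule on this slide can be sketched as a small selection function. `PeerAddressSketch` and `addressFor` are hypothetical names invented for this example and do not correspond to Cassandra internals; port 7000 is used only because it is Cassandra's default inter-node port, and the IP addresses are documentation placeholders.

```java
import java.net.InetSocketAddress;

// Hypothetical model of a peer node; not a Cassandra class.
final class PeerAddressSketch {
    final String dc;
    final InetSocketAddress publicAddress;
    final InetSocketAddress privateAddress; // may be null: the private address is optional

    PeerAddressSketch(String dc, InetSocketAddress publicAddress, InetSocketAddress privateAddress) {
        this.dc = dc;
        this.publicAddress = publicAddress;
        this.privateAddress = privateAddress;
    }

    // Same DC and a private address is known: stay on the private network.
    // Otherwise the peer is reached over the internet via its public address.
    static InetSocketAddress addressFor(String localDc, PeerAddressSketch peer) {
        boolean sameDc = localDc.equals(peer.dc);
        return (sameDc && peer.privateAddress != null) ? peer.privateAddress : peer.publicAddress;
    }

    public static void main(String[] args) {
        PeerAddressSketch peer = new PeerAddressSketch(
                "US_EAST_1",
                InetSocketAddress.createUnresolved("203.0.113.10", 7000),
                InetSocketAddress.createUnresolved("10.0.0.5", 7000));
        System.out.println("from same DC:  " + addressFor("US_EAST_1", peer));
        System.out.println("from other DC: " + addressFor("AP_SOUTHEAST_2", peer));
    }
}
```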

Page 16:

Multi DC Support

• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional

    • Easy to set up per-node public and private addresses

• Private network clusters ⇄ cloud clusters

    • Private networks: 𝑛 public addresses shared by 𝑥 private addresses, not 1 ↔ 1 (often 𝑥 > 𝑛)

    • Done via Network Address Translation

Page 17:

IPv4 Address Space Exhaustion

Source: http://www.potaroo.net/tools/ipv4/

Page 18:

Multi-DC Support

• IPv4

    • Address exhaustion

    • Over time, addresses will become more expensive to purchase

    • Per-node public addresses are wasteful (be a good internet citizen)

Page 19:

Alternatives

• IPv6

    • Java supports it ∴ Cassandra probably supports it (untested by us)

    • Global IPv6 adoption is ~4% (according to Google: google.com/intl/en/ipv6/statistics.html)

    • IPv6/IPv4 hybrid (Teredo, 6over4, et al.)

    • AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)

Page 20:

Alternatives

• VPNs

    • tinc, OpenVPN, etc.

    • All private address space, no dual addressing

    • Requires multiple links: one between every pair of DCs, plus one per client

    • Address spaces can overlap between multiple VPNs

    • Connectivity to multiple clusters is an issue (for multi-cluster apps, centralised monitoring, etc.)

Page 21:

Data Centres    Links
3               3
5               10
7               21
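The figures in this table are consistent with a full mesh, where every pair of data centres needs its own link, i.e. n(n − 1)/2 links for n data centres. A minimal sketch of that arithmetic follows; the class and method names are made up for this example.

```java
// Hypothetical helper: number of point-to-point links needed to
// connect every pair of data centres directly (a full mesh).
final class FullMeshLinks {
    static int links(int dataCentres) {
        if (dataCentres < 0) {
            throw new IllegalArgumentException("dataCentres must be non-negative");
        }
        return dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            // Prints 3, 10 and 21 links respectively, matching the table.
            System.out.println(n + " data centres -> " + links(n) + " links");
        }
    }
}
```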

Page 22:

Alternatives

• Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT))

• Deployed on most private networks

• Connectivity between private network clusters ⇄ Cloud clusters

• Supports client connectivity to multiple clusters

Page 23:

NAT Basics

• Re-maps IP address spaces (e.g. Public 96.31.81.80 ↔ Private 192.168.*.*)

    • 𝑛 public addresses shared by 𝑥 private addresses, not 1 ↔ 1 (often 𝑛 = 1, 𝑥 > 𝑛)

• Port Address Translation

    • Private port ↔ Public port

• Outbound connections only, unless port forwarding or NAT traversal is used

• Per-DC gateway device performs NAT and port forwarding

Page 24:

NAT with Inbound Connections

• Static port forwarding (configured on the gateway)

• Automatic port forwarding — UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra)

• NAT Traversal — STUN, ICE, etc.

Page 25:

NAT + C∗

Situation: 𝑛 Cassandra nodes, 1 public address per data centre

• Port forward different public ports for each node

• Advertise assigned ports

• Modify Cassandra and client applications to connect to advertised ports
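To make the situation concrete, here is a minimal sketch of a per-gateway forwarding table: one public hostname, and a distinct public port for each private node address. `PortForwardTableSketch`, the gateway hostname and the port numbers are all assumptions for illustration; a real gateway would hold this state in its NAT configuration.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one data centre's NAT gateway: a single public
// host, with a different forwarded public port per private node address.
final class PortForwardTableSketch {
    private final String gatewayHost;
    private final Map<String, Integer> publicPortByPrivateIp = new LinkedHashMap<>();

    PortForwardTableSketch(String gatewayHost) {
        this.gatewayHost = gatewayHost;
    }

    // Registers a forward; two nodes must never share a public port.
    void forward(String privateIp, int publicPort) {
        if (publicPortByPrivateIp.containsValue(publicPort)) {
            throw new IllegalStateException("public port already forwarded: " + publicPort);
        }
        publicPortByPrivateIp.put(privateIp, publicPort);
    }

    // What a remote node or client must connect to in order to reach privateIp.
    InetSocketAddress publicEndpoint(String privateIp) {
        Integer port = publicPortByPrivateIp.get(privateIp);
        if (port == null) {
            throw new IllegalArgumentException("no forward configured for " + privateIp);
        }
        return InetSocketAddress.createUnresolved(gatewayHost, port);
    }

    public static void main(String[] args) {
        PortForwardTableSketch gateway = new PortForwardTableSketch("gateway.example.com");
        gateway.forward("192.168.1.2", 1234);
        gateway.forward("192.168.1.4", 1236);
        System.out.println(gateway.publicEndpoint("192.168.1.4"));
    }
}
```

The remaining problem, which the following slides address, is how remote nodes and clients learn the contents of this table.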

Page 26:

Advertising Port Mappings

• Extend Cassandra Gossip

    • Include port numbers in node address announcements

    • Allow seed node addresses to include port numbers

    • Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC)

    • How to bootstrap? SIP?

• Cassandra must be aware of the allocated ports in order to advertise them

    • Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding)

• Too many modifications to internals

Page 27:

Advertising Port Mappings

• DNS-SD: dns-sd.org (aka Bonjour/Zeroconf)

    • Reads work with existing DNS implementations (it’s just a DNS query)

        • Even inside restrictive networks, DNS usually works

        • Combination of DNS TXT, SRV and PTR records

    • Updates

        • via DNS Update & TSIG (supported by BIND)

        • via API, e.g. for AWS Route 53

Page 28:

Advertising Port Mappings

• DNS-SD cont’d.

    • SRV records contain hostname and port (i.e. hostname of the NAT gateway and public C* port)

    • TXT records contain key=value pairs (useful for additional connection & config details)

    • Modify C* connection code to look up foreign node ports from DNS

    • Modify client driver connection code to look up ports from DNS

    • Can be queried & updated out-of-band (updated by the NAT device or a central management server that knows which ports were mapped)
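The PTR/SRV/TXT combination described above can be sketched as the three records a single node would need, printed here in a zone-file-like form. The class name, the `cluster.example.` browse domain and the gateway hostname are placeholders invented for this example; the `_cassandra._tcp` service type, the port and the TXT keys follow the values shown elsewhere in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: renders the DNS-SD records for one node as text.
final class DnsSdRecordsSketch {
    static List<String> records(String browseDomain, String instance,
                                String gatewayHost, int publicPort,
                                Map<String, String> txt) {
        String service = "_cassandra._tcp." + browseDomain;
        String fullName = instance + "." + service;
        List<String> out = new ArrayList<>();
        // PTR: lets a browser enumerate the instances of the service.
        out.add(service + " PTR " + fullName);
        // SRV: priority, weight, port, target (the NAT gateway and forwarded port).
        out.add(fullName + " SRV 0 0 " + publicPort + " " + gatewayHost);
        // TXT: extra key=value details, emitted in sorted key order.
        StringBuilder txtLine = new StringBuilder(fullName + " TXT");
        for (Map.Entry<String, String> e : new TreeMap<>(txt).entrySet()) {
            txtLine.append(" \"").append(e.getKey()).append('=').append(e.getValue()).append('"');
        }
        out.add(txtLine.toString());
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> txt = new TreeMap<>();
        txt.put("version", "2.0.7");
        txt.put("cqlport", "1237");
        for (String line : records("cluster.example.", "192-168-1-4",
                                   "gateway.cluster.example.", 1236, txt)) {
            System.out.println(line);
        }
    }
}
```

Because these are ordinary DNS records, whichever component knows the mapping (the gateway or a management server) can publish them without Cassandra being involved.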

Page 29:

Advertised Details

• Each cluster is its own browse domain

• Each NAT gateway device has an A record in the browse domain

• Each DNS-SD service is named after the node’s private IP address

    • Requires unique private IP addresses across data centres

• SRV port is the C* Thrift port

    • Additional ports are advertised via TXT
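The naming rule can be sketched as a one-line mapping from a private IPv4 address to a DNS-SD instance name, with dots replaced by hyphens (the form that appears in the `dns-sd` output in these slides). `InstanceNameSketch` is a hypothetical name, and the sketch assumes IPv4 only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: derives DNS-SD names from a node's private IPv4 address.
final class InstanceNameSketch {
    // 192.168.1.4 -> "192-168-1-4"
    static String instanceName(InetAddress privateAddress) {
        return privateAddress.getHostAddress().replace('.', '-');
    }

    // Full service name inside a cluster's browse domain (domain ends with a dot).
    static String serviceName(InetAddress privateAddress, String browseDomain) {
        return instanceName(privateAddress) + "._cassandra._tcp." + browseDomain;
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress builds the address from raw bytes; no DNS lookup happens.
        InetAddress node = InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 1, 4});
        System.out.println(serviceName(node, "cluster.example."));
    }
}
```

This is also why private addresses must be unique across data centres: two nodes with the same private address would collide on the same service name.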

Page 30:

Configuration

• Cassandra is configured to use only private addresses

• On cluster creation

    • Establish a new DNS-SD browse domain

    • Create A records for each gateway device

• NAT gateway device is notified when a new C* node is started

    • Allocates random public ports for C* and configures port forwarding

    • Updates DNS-SD

        • New SRV and TXT records
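The "allocates random public ports" step could look roughly like the allocator below. This is a sketch under assumptions: the class name and the 1024–65535 range are choices made for this example, and a real gateway would also have to configure the forward and publish the SRV and TXT records after allocating.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical allocator for a NAT gateway: hands out random public ports
// and never gives the same port to two nodes at once.
final class RandomPortAllocatorSketch {
    private static final int MIN_PORT = 1024;  // assumption: skip well-known ports
    private static final int MAX_PORT = 65535;

    private final Set<Integer> inUse = new HashSet<>();
    private final Random random;

    RandomPortAllocatorSketch(Random random) {
        this.random = random;
    }

    int allocate() {
        if (inUse.size() > MAX_PORT - MIN_PORT) {
            throw new IllegalStateException("no free public ports");
        }
        while (true) {
            int candidate = MIN_PORT + random.nextInt(MAX_PORT - MIN_PORT + 1);
            if (inUse.add(candidate)) { // add() returns false if already taken
                return candidate;
            }
        }
    }

    // Called when a node is decommissioned and its forward is removed.
    void release(int port) {
        inUse.remove(port);
    }

    public static void main(String[] args) {
        RandomPortAllocatorSketch allocator = new RandomPortAllocatorSketch(new Random());
        System.out.println("storage port forward: " + allocator.allocate());
        System.out.println("CQL port forward:     " + allocator.allocate());
    }
}
```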

Page 31:

$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R  Flags  if  Domain                                                              Service Type      Instance Name
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-4
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-2
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-3
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-2
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-4
Add  2      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-3

$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
 version=2.0.7 cqlport=1237

$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195

Output of dns-sd (can also use avahi-browse, dig, or any other DNS query tool)

Page 32:

Java Driver Modifications

public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}

• Address translation is usually a no-op (the default is IdentityTranslater).

• Modify translate() to perform a DNS-SD lookup.

• The address parameter is a node's private IP address.

• Locate the service whose name matches the private IP address to determine the public IP/port.
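One way the translate() modification might look is sketched below, using the JNDI DNS provider that ships with the JDK to fetch the SRV record. This is an illustration under assumptions, not the presenter's implementation: the interface is redeclared locally so the example compiles without the driver on the classpath, `DnsSdTranslaterSketch` and its browse-domain parameter are invented names, and the sketch returns the SRV port as-is, whereas a driver speaking CQL would more likely need the `cqlport` value from the TXT record.

```java
import java.net.InetSocketAddress;
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.InitialDirContext;

// Local stand-in mirroring the driver interface shown on the slide.
interface AddressTranslater {
    InetSocketAddress translate(InetSocketAddress address);
}

// Hypothetical translater: private node address -> gateway host and public port.
final class DnsSdTranslaterSketch implements AddressTranslater {
    private final String browseDomain; // e.g. "<cluster-id>.clusters.example." (placeholder)

    DnsSdTranslaterSketch(String browseDomain) {
        this.browseDomain = browseDomain;
    }

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        if (address.getAddress() == null) {
            return address; // unresolved input: nothing to base a lookup on
        }
        // Service instances are named after the private IP, e.g. 192-168-1-4.
        String instance = address.getAddress().getHostAddress().replace('.', '-');
        try {
            return parseSrv(lookupSrv(instance + "._cassandra._tcp." + browseDomain));
        } catch (NamingException e) {
            return address; // lookup failed: fall back to the untranslated address
        }
    }

    // Fetches the first SRV record for the name via the JDK's JNDI DNS provider.
    static String lookupSrv(String name) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        InitialDirContext ctx = new InitialDirContext(env);
        try {
            Attribute srv = ctx.getAttributes(name, new String[] {"SRV"}).get("SRV");
            if (srv == null) {
                throw new NamingException("no SRV record for " + name);
            }
            return (String) srv.get();
        } finally {
            ctx.close();
        }
    }

    // SRV record data has the form "priority weight port target".
    static InetSocketAddress parseSrv(String rdata) {
        String[] fields = rdata.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("unexpected SRV data: " + rdata);
        }
        String target = fields[3].endsWith(".")
                ? fields[3].substring(0, fields[3].length() - 1)
                : fields[3];
        return InetSocketAddress.createUnresolved(target, Integer.parseInt(fields[2]));
    }
}
```

Falling back to the untranslated address keeps same-network clients working when the DNS lookup fails; whether that is the right behaviour depends on the deployment.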

Page 33:

Modifying Cassandra

public class OutboundTcpConnectionPool {
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}

• OutboundTcpConnectionPool is responsible for managing Socket connections.

• Modify newSocket() to perform a DNS-SD lookup.

• The endpoint parameter is a node's private IP address.

• Locate the service whose name matches the private IP address to determine the public IP/port.
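The shape of the newSocket() change can be sketched independently of Cassandra's internals: before connecting, the private endpoint is passed through a lookup that returns the public gateway address and port. Everything here is hypothetical, including the class name, the `Function` hook that stands in for the DNS-SD lookup, and the 2-second connect timeout; Cassandra's real method also deals with concerns such as encryption that are omitted.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Function;

// Hypothetical stand-in for the modified connection code; not Cassandra source.
final class NatAwareSocketsSketch {
    private static final int CONNECT_TIMEOUT_MILLIS = 2000; // assumption for the sketch

    // publicEndpointLookup maps a node's private address to the gateway
    // host and forwarded port (for example via a DNS-SD SRV query).
    static Socket newSocket(InetAddress endpoint,
                            Function<InetAddress, InetSocketAddress> publicEndpointLookup)
            throws IOException {
        InetSocketAddress target = publicEndpointLookup.apply(endpoint);
        if (target.isUnresolved()) {
            // The lookup may hand back a bare hostname; resolve it now.
            target = new InetSocketAddress(target.getHostString(), target.getPort());
        }
        Socket socket = new Socket();
        try {
            socket.connect(target, CONNECT_TIMEOUT_MILLIS);
        } catch (IOException e) {
            socket.close(); // do not leak the socket if the connect fails
            throw e;
        }
        return socket;
    }
}
```

A caller would supply the lookup once, for instance a function backed by the cluster's DNS-SD browse domain, and leave the rest of the connection handling unchanged.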

Page 34:

[Diagram: two groups of three C* nodes, two NAT gateways, a DNS (+ DNS-SD) server (Route 53, self-hosted, etc.), and a client application]

Page 35:

Thanks! Questions?

[email protected]