Adding a new technology to your development process can be challenging, and the distributed nature of Apache Cassandra can make it daunting. However, recent improvements in drivers, utilities and tooling have simplified the process, making it easier than ever to develop software with Apache Cassandra. In this presentation we will cover essential knowledge for all developers wanting to efficiently create reliable Apache Cassandra based solutions. Topics will include:
- Language and Driver selection
- Optimizing Driver configuration
- Productive Developer environments using ccm, Vagrant and DataStax DevCenter
- Creating appropriate test data
- Unit testing
- Automated integration testing
New and existing users will leave this presentation with the necessary knowledge to make their next Apache Cassandra project a success.
CASSANDRA-SF 2014
SUCCESSFUL SOFTWARE DEVELOPMENT WITH CASSANDRA
Nate McCall
@zznate #CassandraSummit
Co-Founder & Sr. Technical Consultant
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra based solutions.
Based in New Zealand & USA.
OVERVIEW
Overview:
What makes a software development project successful?
Overview: Successful Software Development
- it ships
- maintainable
- good test coverage
- check out and build
Overview:
Impedance mismatch: distributed systems development on a laptop.
GETTING STARTED: FOLLOW THE PATH OF LEAST RESISTANCE
Getting Started:
JVM-Based if at all Possible.
Getting Started:
Python Otherwise.
https://github.com/datastax/python-driver
Getting Started:
C#?
https://github.com/datastax/csharp-driver
Getting Started:
Ruby?
https://github.com/datastax/ruby-driver
Getting Started:
ORM? maybe - only if it’s very simple
more later…
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html
DATA MODELING
Data Modeling:
… a topic unto itself. But quickly:
Data Modeling - Quickly
• It’s Hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema
DEVELOPER PRODUCTIVITY
Productivity:
use CQL
Productivity - Using CQL:
• tools support
• easy tracing (and trace discovery)
• documentation*
*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile
Productivity:
Use the Java Driver
Productivity - Java Driver:
• Reference implementation
• Well written, extensive coverage
• open source
https://github.com/datastax/java-driver/
Productivity - Java Driver:
Existing Spring Users: Spring Data Integration
http://projects.spring.io/spring-data-cassandra/
Productivity - Java Driver:
Guice Users: “GuicyFig:”
Archaius + Guice
https://stash.safehaus.org/projects/GFIG/repos/main/browse
Productivity - Java Driver:
Configuration is Similar to Other DB Drivers (with caveats**)
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html
Productivity - Java Driver - Configuration:
Major Difference: it’s a Cluster!
Productivity - Java Driver - Configuration:
Two groups of configurations
• policies
• connections
Productivity - Java Driver - Configuration:
Three Policy Types:
• load balancing
• reconnection
• retry
Productivity - Java Driver - Configuration:
Connection Options:
• protocol*
• pooling
• socket
*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec
Productivity - Java Driver:
Embrace Asynchronicity (but use RxJava)
https://github.com/ReactiveX/RxJava
Productivity - Java Driver:
A note about User Defined Types (UDTs)
Productivity - Java Driver - Using UDTs:
Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path
* https://issues.apache.org/jira/browse/CASSANDRA-7423
Productivity:
Tools: DataStax DevCenter
http://www.datastax.com/what-we-offer/products-services/devcenter
Productivity:
Metrics API for your own code
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/
Productivity - Instrumentation via Metrics API:
Run Riemann locally
http://riemann.io/
Productivity:
Trace Frequently
Productivity - Tracing:
Trace per query via cqlsh
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…
…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
!!?!
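A jump in elapsed time is easier to spot when it is computed than when it is read by eye. The following is a minimal sketch, not part of the original slides. It assumes each trace event is available as one pipe-delimited line (activity, timestamp, source, elapsed microseconds, as in the cqlsh output above) and reports the event that follows the largest gap on one node. The names `TraceGaps` and `slowestStep` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TraceGaps {

    /**
     * Returns the activity that follows the largest jump in elapsed time
     * among events reported by the given source node, or null when fewer
     * than two events from that node are present.
     */
    static String slowestStep(List<String> rows, String source) {
        long previousElapsed = -1; // elapsed value of the previous event on this node
        long worstGap = -1;
        String slowest = null;
        for (String row : rows) {
            String[] cols = row.split("\\|");
            // Skip anything that is not a four-column row with a numeric last column.
            if (cols.length != 4 || !cols[3].trim().matches("\\d+")) {
                continue;
            }
            // Elapsed time is measured per node, so only compare events from one source.
            if (!cols[2].trim().equals(source)) {
                continue;
            }
            long elapsed = Long.parseLong(cols[3].trim());
            if (previousElapsed >= 0 && elapsed - previousElapsed > worstGap) {
                worstGap = elapsed - previousElapsed;
                slowest = cols[0].trim();
            }
            previousElapsed = elapsed;
        }
        return slowest;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605",
            "Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428",
            "Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423");
        System.out.println(slowestStep(rows, "192.168.1.204"));
    }
}
```

Run against the rows from 192.168.1.204 in the trace above, this points at the "Read 1 live and 2667 tombstoned cells" event, which follows a gap of roughly nine seconds.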
Productivity - Tracing:
Enable traces in the driver
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html
Productivity - Tracing:
`nodetool settraceprobability`
Productivity - Tracing:
…then make sure you try it again with a node down!
Productivity - Tracing:
Final note on tracing: do it sparingly
Productivity:
Logging Verbosity can be changed dynamically**
** since 0.4rc1
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html
Productivity:
nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms
Productivity - nodetool - cfstats:
cfstats: per-table statistics about size and performance (single most useful command)
Productivity - nodetool - cfhistograms:
cfhistograms: column count and partition size vs. latency distribution
Productivity - nodetool - proxyhistograms:
proxyhistograms: performance of inter-cluster requests
Productivity:
Running Cassandra during development
Productivity - Running Cassandra:
Local Cassandra
• easy to set up
• you control it
• but then you control it!
Productivity - Running Cassandra:
CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes
https://github.com/pcmanus/ccm
Productivity - Running Cassandra:
Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!
http://www.vagrantup.com/
server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}
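The `initial_token` expression in the Vagrantfile above spaces the nodes evenly around the token ring. As a rough illustration, and assuming the default Murmur3Partitioner (whose tokens run from -2^63 to 2^63-1), the same arithmetic can be reproduced in Java. The class name `InitialTokens` is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class InitialTokens {

    /** Mirrors (2**64 / server_count * i) - 2**63 from the Vagrantfile. */
    static List<Long> evenlySpaced(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        BigInteger ringSize = BigInteger.ONE.shiftLeft(64); // 2^64 possible tokens
        BigInteger offset = BigInteger.ONE.shiftLeft(63);   // shift into the signed range
        BigInteger step = ringSize.divide(BigInteger.valueOf(nodeCount));
        List<Long> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).subtract(offset).longValueExact());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three nodes, as in the Vagrantfile.
        System.out.println(evenlySpaced(3));
    }
}
```

BigInteger is used because 2^64 does not fit in a Java `long`; the resulting tokens always do.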
ENCAPSULATE ENVIRONMENTS
Environments:
Configuration Management is Essential
Environments:
Laptop to Production with NO Manual Modifications!
TESTING
Testing:
Use a Naming Scheme
• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
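One detail of this scheme is that the suffixes overlap: a class ending in `PITest` also ends in `ITest`. The sketch below, with the hypothetical names `TestNames` and `classify`, shows one way tooling could sort class names into the three groups. It is an illustration of the naming idea, not something taken from the slides.

```java
final class TestNames {

    enum Kind { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNKNOWN }

    /** Maps a test class name to its group based on the suffix convention. */
    static Kind classify(String className) {
        if (className.endsWith("UnitTest")) {
            return Kind.UNIT;
        }
        // "PITest" also ends with "ITest", so the longer suffix is checked first.
        if (className.endsWith("PITest")) {
            return Kind.PARALLEL_INTEGRATION;
        }
        if (className.endsWith("ITest")) {
            return Kind.INTEGRATION;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical class names, for illustration only.
        for (String name : new String[] {"DocumentDaoUnitTest", "DocumentDaoITest", "DocumentDaoPITest"}) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```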
Testing:
Tip: wildcards on the CLI are not a naming scheme.
Testing:
Group tests into logical units (“suites”)
Testing - Suites:
Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles
<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>
Testing - Suites:
Using annotations for suites in code
Testing:
Use Mocks where possible
Testing:
Unit Integration Testing
Testing:
Verify Assumptions: test failure scenarios explicitly
Testing - Integration:
Runtime Integrations:
• local
• in-process
• forked-process
Testing - Integration - Runtime:
EmbeddedCassandra
Testing - Integration - Runtime:
ProcessBuilder to fork Cassandra(s)
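A forked-process runtime can be sketched with nothing but the JDK. The example below assumes an unpacked Cassandra distribution whose `bin/cassandra` launcher accepts `-f` to stay in the foreground; the class name `ForkedCassandra` and both paths are hypothetical, and waiting for the node to accept connections is deliberately left out.

```java
import java.io.File;
import java.util.Arrays;

final class ForkedCassandra {

    /** Builds (but does not start) a process definition for "bin/cassandra -f". */
    static ProcessBuilder foreground(File cassandraHome, File logFile) {
        String launcher = new File(new File(cassandraHome, "bin"), "cassandra").getPath();
        ProcessBuilder builder = new ProcessBuilder(Arrays.asList(launcher, "-f"));
        builder.directory(cassandraHome);
        builder.redirectErrorStream(true); // merge stderr into stdout
        builder.redirectOutput(logFile);   // keep node output out of the test console
        return builder;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ForkedCassandra <cassandra-home> <log-file>");
            return;
        }
        Process cassandra = foreground(new File(args[0]), new File(args[1])).start();
        // Stop the forked node when this JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(cassandra::destroy));
        cassandra.waitFor();
    }
}
```

Keeping the process definition separate from `start()` makes it possible to check the command line in a unit test without launching anything.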
Testing - Integration - Runtime:
CCMBridge: delegate to CCM
https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java
Testing - Integration - Runtime:
Vagrant: delegate to vagrant cli
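Delegating to the vagrant CLI can follow the same pattern. This is a minimal sketch under stated assumptions: `vagrant` is on the PATH, the working directory holds the Vagrantfile, and machines are named `node1`, `node2`, and so on, as in the Vagrantfile shown earlier. The class `VagrantCli` is a hypothetical helper.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class VagrantCli {

    /** Builds the command line, for example: vagrant halt node2 */
    static List<String> command(String action, String node) {
        return Arrays.asList("vagrant", action, node);
    }

    /** Runs the command next to the Vagrantfile and returns its exit code. */
    static int run(File vagrantDir, String action, String node)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(action, node))
                .directory(vagrantDir)
                .inheritIO() // show vagrant's own output in the build log
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", command("halt", "node2")));
    }
}
```

Calling `run(dir, "halt", "node2")` before a test and `run(dir, "up", "node2")` afterwards gives a simple way to exercise the node-down case.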
Testing - Integration:
Best Practice: Jenkins should be able to manage your cluster
Testing - Integration - Best Practices:
Vagrant vs. CCMBridge?
• choice of style, really
• developer integration with CM
• what else is in the architecture?
Testing:
Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point
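Catching regressions implies comparing a run against a stored baseline. One possible shape for that check, offered only as a sketch: compute a nearest-rank percentile over recorded latencies and flag the run when it exceeds the baseline by more than a chosen tolerance. The names `LatencyCheck`, `percentile` and `regressed`, and the tolerance itself, are assumptions rather than anything from the slides.

```java
import java.util.Arrays;

final class LatencyCheck {

    /** Nearest-rank percentile (percent in 1..100) over latencies in microseconds. */
    static long percentile(long[] latenciesMicros, int percent) {
        if (latenciesMicros.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and a percent in 1..100");
        }
        long[] sorted = latenciesMicros.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(percent / 100 * n), avoiding floating point.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** True when the current value exceeds the baseline by more than the tolerance. */
    static boolean regressed(long baselineMicros, long currentMicros, double tolerance) {
        return currentMicros > baselineMicros * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        long[] samples = {900, 1100, 1000, 5000, 950}; // made-up latencies
        long p99 = percentile(samples, 99);
        System.out.println("p99=" + p99 + " regressed=" + regressed(4000, p99, 0.10));
    }
}
```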
Testing - Load Testing:
Stress.java (lots of changes recently)
Testing - Load Testing:
CassJMeter
https://github.com/Netflix/CassJMeter
Testing - Load Testing:
Workload recording and playback coming soon
https://issues.apache.org/jira/browse/CASSANDRA-6572
Testing:
Primary testing goal: Don’t let cluster behavior surprise you.
Summary:
• Go slowly with bite sized chunks
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures
Thanks.
Nate McCall @zznate
Co-Founder & Sr. Technical Consultant www.thelastpickle.com
#CassandraSummit