Cassandra Java APIs Old and New – A Comparison Shahryar Sedghi Toronto Cassandra User Group Sep. 18, 2013

An introductory session at Toronto Cassandra User Group, September 2013

Who am I?

• www.linkedin.com/pub/shahryar-sedghi/1/439/420
• [email protected]
• @parseix
• Founder at www.parseix.com
• Did some work on IBM hierarchical databases (IMS DB / DOS DL1) in the late 70s and early 80s
• Worked extensively on IBM's first (the world's first) relational database (SQL/DS) in the early 80s
• Have worked with Oracle and DB2 for years (not as a DBA)
• Started working with Cassandra in late 2011 (release 1.0.5)

Disclaimer
• The code samples used here, except the Astyanax one (which was taken directly from the website), each worked once in a certain release of Cassandra. Only the (modified) JDBC driver and the new Java Driver have been tested with Cassandra 1.2

Agenda
• What does a Java API for Cassandra need?
• A basic introduction to the Cassandra data model
• Thrift
• Thrift-based APIs
• Binary protocol
• The new DataStax Java API

A Java Database API
• Typically used in Java application servers
  – Thread safe
  – Connection pooling
• When used with Cassandra
  – Tolerates database machine/network failures
  – Load balancing
  – Reconnects to a failed machine when it is back
• Together they should provide a highly available environment for web apps without an expensive HA investment
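
As a rough illustration of the load balancing and failover bullets above, the sketch below rotates through a list of hosts and skips any host that has been marked down. It is a minimal, driver-independent sketch: `RoundRobinHosts`, `markDown`, `markUp` and `nextHost` are hypothetical names, not part of any API covered in this talk, and a real client would also pool connections and probe failed hosts in the background.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: thread-safe round-robin host selection that skips down hosts. */
final class RoundRobinHosts {
    private final List<String> hosts;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        this.hosts = new ArrayList<>(hosts);
    }

    /** Called when a request to the host fails (machine or network failure). */
    void markDown(String host) {
        down.add(host);
    }

    /** Called when a reconnection attempt to the host succeeds. */
    void markUp(String host) {
        down.remove(host);
    }

    /** Returns the next live host in rotation, or throws if every host is down. */
    String nextHost() {
        for (int i = 0; i < hosts.size(); i++) {
            int index = Math.floorMod(next.getAndIncrement(), hosts.size());
            String candidate = hosts.get(index);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live hosts");
    }

    public static void main(String[] args) {
        RoundRobinHosts pool = new RoundRobinHosts(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        pool.markDown("10.0.0.2"); // simulate a failed machine
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.nextHost());
        }
    }
}
```

The counter and the down-set are both safe to share between threads, which is the property an application server needs from such an object.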

Cassandra Data Model at a Glance

Row key   Columns (name value)
B         B1 Value11, B2 Value12, B3 Value13, B4 Value14
A         A1 Value21, A2 Value22, A3 Value23
D         D1 Value51, D2 Value52, D3 Value53, D4 Value54, D5 Value55
K         (columns not shown)
C         C1 Value61, C2 Value62
Super Column (Deprecated): D51 Value551, D52 Value552, D53 Value553

• A, B, C, D and K are row keys. By default (and as a best practice) rows are not sorted by key; they are ordered by the hash of the key
• All columns of one row reside on one node
• A1, B1, ... are column names; one row can hold 2 billion distinct column names
• Columns are sorted by column name (ascending or descending)
• Value11, Value12, ... are column values. A value can be null, and its type can differ for each column in each row, e.g. A1 can be an Integer and D1 can be a String
• If all the 1s, all the 2s, all the 3s, ... (e.g. A1, B1, C1) carry the same data type, Cassandra can be used like a relational DB with CQL 2, with better scalability and less functionality, but this is not the best use of Cassandra
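
The two orderings described above (rows by hash of the key, columns by name) can be mimicked with plain Java collections. This is only a sketch under stated assumptions: `WideRowSketch` is a hypothetical class, MD5 is used as a stand-in for whatever hash the configured partitioner applies, and hash collisions between different keys are ignored.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

/** Hypothetical in-memory model of the row and column ordering described on the slide. */
final class WideRowSketch {
    /** One row: columns stay sorted by column name; values may differ in type. */
    static final class Row {
        final String key;
        final TreeMap<String, Object> columns = new TreeMap<>();

        Row(String key) {
            this.key = key;
        }
    }

    // Rows are ordered by the hash (token) of the key, not by the key itself.
    private final TreeMap<BigInteger, Row> rowsByToken = new TreeMap<>();

    /** MD5 of the key as a non-negative number; an assumed stand-in for the partitioner hash. */
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void put(String rowKey, String columnName, Object value) {
        rowsByToken.computeIfAbsent(token(rowKey), t -> new Row(rowKey))
                   .columns.put(columnName, value);
    }

    Iterable<Row> rowsInTokenOrder() {
        return rowsByToken.values();
    }

    public static void main(String[] args) {
        WideRowSketch table = new WideRowSketch();
        table.put("A", "A2", "text");
        table.put("A", "A1", 42);      // same row, different value type
        table.put("D", "D1", "hello");
        for (Row row : table.rowsInTokenOrder()) {
            System.out.println(row.key + " -> " + row.columns);
        }
    }
}
```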

Data Model - Composite Columns
• We would like to model the following data structure: {departmentId Integer, employeeId Integer, firstName String, lastName String}
• Physical rows, one per departmentId, with column names of the form employeeId:fieldName

Row key   Columns
122       11:firstName, 11:lastName, 12:firstName, 12:lastName, 13:firstName, 13:lastName   (departmentId 122, employeeId 11, 12 and 13)
225       17:firstName, 17:lastName, 19:firstName, 19:lastName   (departmentId 225, employeeId 17 and 19)

• Physical Row: all the columns stored under one departmentId
• Logical Row: the columns of one employeeId within a physical row (e.g. 11:firstName and 11:lastName)
• CQL3

    create table department (
        departmentid int,
        employeeid int,
        firstname text,
        lastname text,
        PRIMARY KEY (departmentid, employeeid)
    );

• departmentId is called the partition key
• employeeId is called the clustering key
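
The mapping between logical and physical rows can be sketched with plain Java collections. The code below is only an illustration of the layout described above, not of Cassandra's storage engine: `CompositeColumnsSketch` and its nested `Name` class are hypothetical, and the composite name is modelled as the pair (employeeId, field) compared component by component.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model: one physical row per partition key, holding sorted composite columns. */
final class CompositeColumnsSketch {
    /** Composite column name: clustering key first, then the CQL column name. */
    static final class Name implements Comparable<Name> {
        final int employeeId;
        final String field;

        Name(int employeeId, String field) {
            this.employeeId = employeeId;
            this.field = field;
        }

        @Override
        public int compareTo(Name other) {
            int byId = Integer.compare(employeeId, other.employeeId);
            return byId != 0 ? byId : field.compareTo(other.field);
        }

        @Override
        public String toString() {
            return employeeId + ":" + field;
        }
    }

    // partition key (departmentId) -> physical row
    private final Map<Integer, TreeMap<Name, String>> physicalRows = new TreeMap<>();

    /** Stores one logical row as two composite columns of the physical row. */
    void insert(int departmentId, int employeeId, String firstName, String lastName) {
        TreeMap<Name, String> row = physicalRows.computeIfAbsent(departmentId, k -> new TreeMap<>());
        row.put(new Name(employeeId, "firstName"), firstName);
        row.put(new Name(employeeId, "lastName"), lastName);
    }

    private TreeMap<Name, String> physicalRow(int departmentId) {
        TreeMap<Name, String> row = physicalRows.get(departmentId);
        return row != null ? row : new TreeMap<Name, String>();
    }

    /** Column names of one physical row, in sorted order. */
    List<String> physicalColumnNames(int departmentId) {
        List<String> names = new ArrayList<>();
        for (Name name : physicalRow(departmentId).keySet()) {
            names.add(name.toString());
        }
        return names;
    }

    /** Regroups one physical row into logical rows keyed by employeeId. */
    Map<Integer, Map<String, String>> logicalRows(int departmentId) {
        Map<Integer, Map<String, String>> rows = new LinkedHashMap<>();
        for (Map.Entry<Name, String> e : physicalRow(departmentId).entrySet()) {
            rows.computeIfAbsent(e.getKey().employeeId, k -> new LinkedHashMap<>())
                .put(e.getKey().field, e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        CompositeColumnsSketch table = new CompositeColumnsSketch();
        table.insert(122, 12, "Jane", "Roe");
        table.insert(122, 11, "John", "Doe");
        System.out.println(table.physicalColumnNames(122));
        System.out.println(table.logicalRows(122));
    }
}
```

Inserting employee 12 before employee 11 still yields the column order 11:firstName, 11:lastName, 12:firstName, 12:lastName, because the clustering key drives the sort.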

Thrift
• An Apache project
• YaRPC (Remote Procedure Call)
• Has an IDL (Interface Definition Language) like other RPCs
• Language neutral
• Easier to use than many others
• A good fit for early releases of Cassandra, to support all sorts of clients
  – Apparently not every client works as well as Java and Python
• Is RPC a good fit for database interaction? Yes and no
• Cassandra Thrift listens on port 9160 by default

Thrift Importance for Cassandra
• All clients, except the new DataStax drivers for Java and .NET, use Thrift underneath
  – Including Hector, JDBC and Astyanax
• Supports
  – Ring discovery
  – Native access to Cassandra
  – CQL 2
  – CQL 3
• JDBC and Astyanax may move to the native driver in the future

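Thrift Example: Ring Discovery

    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.*;
    import org.apache.thrift.transport.*;

    TTransport transport = new TFramedTransport(new TSocket("192.168.1.14", 9160));
    TProtocol protocol = new TBinaryProtocol(transport);
    Cassandra.Client client = new Cassandra.Client(protocol);
    transport.open();
    // One TokenRange per range of the ring, for keyspace "mydb"
    List<TokenRange> trList = client.describe_ring("mydb");
    TokenRange tr = trList.get(0);
    // Print the nodes that hold the first token range
    for (String endpoint : tr.getEndpoints()) {
        System.out.println(endpoint);
    }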

Thrift Example: Get All Row Keys

    ColumnParent columnParent = new ColumnParent("xyz");
    SlicePredicate predicate = new SlicePredicate();
    // Here you can specify a slice
    predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
            ByteBuffer.wrap(new byte[0]), false, 1));
    KeyRange keyRange = new KeyRange(); // Get all keys, or set a range
    // Empty start and end keys ask for the whole range; the server rejects a KeyRange without bounds
    keyRange.setStart_key(new byte[0]);
    keyRange.setEnd_key(new byte[0]);
    List<KeySlice> keySlices = client.get_range_slices(columnParent,
            predicate, keyRange, ConsistencyLevel.ONE); // or null in this case
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (KeySlice ks : keySlices) {
        int key = ByteBuffer.wrap(ks.getKey()).getInt(); // row keys are 4-byte integers here
        list.add(key);
        System.out.println(key);
    }
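
Thrift hands row keys back as raw bytes, which is why the loop above wraps each key in a `ByteBuffer` before reading an int. The round trip can be sketched in isolation as follows; `IntKeyCodec` is a hypothetical helper, and it assumes the keys were written as 4-byte big-endian integers, which is `ByteBuffer`'s default byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: converts int row keys to and from their 4-byte form. */
final class IntKeyCodec {
    /** Encodes the key as 4 big-endian bytes. */
    static byte[] encode(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    /** Decodes a 4-byte key; rejects anything that is not exactly 4 bytes long. */
    static int decode(byte[] rawKey) {
        if (rawKey.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + rawKey.length);
        }
        return ByteBuffer.wrap(rawKey).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = encode(122);
        System.out.println(Arrays.toString(raw) + " -> " + decode(raw));
    }
}
```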

Hector
• The most commonly used Java API for Cassandra
• Uses Thrift underneath
• Among its features:
  – Connection pooling
  – Ring discovery and automatic failover
  – Automatic retry of downed hosts
  – Automatic discovery of additional hosts in the cluster
  – Suspension of hosts for a short period of time after several timeouts
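
The last feature, suspending a host after several timeouts, can be pictured with a small counter-based tracker. This is a sketch of the idea only and not Hector's implementation: `HostSuspension`, the threshold and the suspension period are all hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker: suspends a host for a fixed period after N timeouts. */
final class HostSuspension {
    private final int maxTimeouts;
    private final long suspendMillis;
    private final Map<String, Integer> timeouts = new HashMap<>();
    private final Map<String, Long> suspendedUntil = new HashMap<>();

    HostSuspension(int maxTimeouts, long suspendMillis) {
        this.maxTimeouts = maxTimeouts;
        this.suspendMillis = suspendMillis;
    }

    /** Records one timeout; suspends the host once the threshold is reached. */
    synchronized void recordTimeout(String host, long nowMillis) {
        int count = timeouts.merge(host, 1, Integer::sum);
        if (count >= maxTimeouts) {
            suspendedUntil.put(host, nowMillis + suspendMillis);
            timeouts.remove(host); // start counting afresh after the suspension
        }
    }

    /** A successful call clears the timeout count for the host. */
    synchronized void recordSuccess(String host) {
        timeouts.remove(host);
    }

    /** True if the host may be used at the given time. */
    synchronized boolean isAvailable(String host, long nowMillis) {
        Long until = suspendedUntil.get(host);
        if (until == null) {
            return true;
        }
        if (nowMillis >= until) {
            suspendedUntil.remove(host); // suspension has expired
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        HostSuspension tracker = new HostSuspension(3, 1000L);
        for (int i = 0; i < 3; i++) {
            tracker.recordTimeout("127.0.0.1", 0L);
        }
        System.out.println(tracker.isAvailable("127.0.0.1", 500L));
        System.out.println(tracker.isAvailable("127.0.0.1", 1000L));
    }
}
```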

Hector Example: Read All RowKeys

    Cluster myCluster = HFactory.getOrCreateCluster("MyCluster", "127.0.0.1:9160");
    ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
    ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
    Keyspace myKeyspace = HFactory.createKeyspace("MYDB", myCluster, ccl);
    // Key, column name and value types: Integer, Composite, String
    RangeSlicesQuery<Integer, Composite, String> rangeSlicesQuery =
            HFactory.createRangeSlicesQuery(myKeyspace, IntegerSerializer.get(),
                    CompositeSerializer.get(), StringSerializer.get());
    // CF is the column family name, defined elsewhere
    QueryResult<OrderedRows<Integer, Composite, String>> result =
            rangeSlicesQuery.setColumnFamily(CF).setKeys(0, -1).setReturnKeysOnly().execute();
    OrderedRows<Integer, Composite, String> orderedRows = result.get();
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (Row<Integer, Composite, String> row : orderedRows) {
        list.add(row.getKey());
    }

Astyanax
• Developed by Netflix
• Supports all Hector functions and is much easier to use
• Much better connection pool and failover than Hector
• More than an API for Cassandra
  – Provides some database functionality at the API level, called Recipes
    • Parallel all-rows query
    • Message queue
    • Chunked object store
    • Many more
• Utilities
  – JSON writer, CSV importer
• At Cassandra Summit 2013, Netflix announced a plan to move to the binary protocol

Astyanax Example: Pagination

    ColumnList<String> columns;
    int pageSize = 10;
    try {
        RowQuery<String, String> query = keyspace
                .prepareQuery(CF_STANDARD1)
                .getKey("A")
                .setIsPaginating()
                .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
        // Each execute() returns the next page; an empty page ends the loop
        while (!(columns = query.execute().getResult()).isEmpty()) {
            for (Column<String> c : columns) {
                // do something like c.getStringValue()
            }
        }
    } catch (ConnectionException e) {
        // handle or log the connection failure
    }
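
What a paginating query has to do over a sorted row can be sketched without Astyanax: remember the last column name returned and resume just after it. The class below is a hypothetical stand-in that uses a `TreeMap` as the row; it does not reflect how Astyanax implements pagination internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical pager over one row whose columns are sorted by name. */
final class ColumnPager {
    private final NavigableMap<String, String> row;
    private final int pageSize;
    private String lastSeen; // name of the last column returned; null before the first page

    ColumnPager(NavigableMap<String, String> row, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.row = row;
        this.pageSize = pageSize;
    }

    /** Returns the next page of column names; an empty list means the row is exhausted. */
    List<String> nextPage() {
        NavigableMap<String, String> remaining =
                lastSeen == null ? row : row.tailMap(lastSeen, false);
        List<String> page = new ArrayList<>();
        for (String name : remaining.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(name);
        }
        if (!page.isEmpty()) {
            lastSeen = page.get(page.size() - 1);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            row.put("c" + i, "value" + i);
        }
        ColumnPager pager = new ColumnPager(row, 2);
        List<String> page;
        while (!(page = pager.nextPage()).isEmpty()) {
            System.out.println(page);
        }
    }
}
```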

JDBC (Java Database Connectivity)
• The standard Java database API
• Only supports CQL to access Cassandra
• The current Cassandra JDBC driver is a shallow implementation of JDBC on top of Thrift
• The URL looks like:
  – jdbc:cassandra://192.168.1.5:9160?version=3.0.0
• All Java application servers support connection pooling for JDBC
• No database failover or Cassandra cluster support
• Helps to convert relational database apps to Cassandra
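
The URL shown above carries a single host, a port and a `version` parameter, which is consistent with the lack of cluster support. As an illustration of its structure only, the sketch below splits such a URL with `java.net.URI`. `CassandraJdbcUrl` is a hypothetical class and says nothing about how the actual driver parses its URL; falling back to port 9160 when none is given is an assumption based on the Thrift default mentioned earlier.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for URLs shaped like jdbc:cassandra://host:port?key=value. */
final class CassandraJdbcUrl {
    final String host;
    final int port;
    final Map<String, String> params;

    private CassandraJdbcUrl(String host, int port, Map<String, String> params) {
        this.host = host;
        this.port = port;
        this.params = params;
    }

    static CassandraJdbcUrl parse(String url) {
        if (!url.startsWith("jdbc:cassandra:")) {
            throw new IllegalArgumentException("not a Cassandra JDBC URL: " + url);
        }
        // Drop the "jdbc:" prefix so the rest is an ordinary hierarchical URI.
        URI uri = URI.create(url.substring("jdbc:".length()));
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("missing host: " + url);
        }
        int port = uri.getPort() == -1 ? 9160 : uri.getPort(); // assumed default
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    params.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new CassandraJdbcUrl(uri.getHost(), port, params);
    }

    public static void main(String[] args) {
        CassandraJdbcUrl parsed = parse("jdbc:cassandra://192.168.1.5:9160?version=3.0.0");
        System.out.println(parsed.host + " " + parsed.port + " " + parsed.params);
    }
}
```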

JDBC Example: Insert
• This code can run in a Servlet or an "EJB"!!! with some minor modification
• Nothing in this code points to Cassandra or Thrift classes
• insertQuery for CQL is not always as simple as this

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    // Look up the pooled DataSource configured in the application server
    Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
    DataSource datasource = (DataSource) envCtx.lookup("jdbc/cassandra");
    Connection cqlCon = datasource.getConnection();
    String insertQuery = "INSERT INTO department (departmentid, employeeid, firstname, lastname) VALUES (?, ?, ?, ?)";
    PreparedStatement statement = cqlCon.prepareStatement(insertQuery);
    statement.setInt(1, 122);
    statement.setInt(2, 11);
    statement.setString(3, "John");
    statement.setString(4, "Doe");
    statement.executeUpdate(); // run the insert
    statement.close();
    cqlCon.close();

Cassandra Binary Protocol
• Inherently asynchronous
  – Can be used synchronously as well
• Frame and stream based
  – Many requests, each with a different stream id, can be sent asynchronously
  – A set of frames coming from the server belongs to the same stream
• Certain events are pushed from the server
  – Topology change
  – Status change
  – Schema change
• Because of its asynchronous nature, it can easily be integrated with newer technologies like WebSockets and Servlet 3.0 and 3.1
• Listens on port 9042
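
The stream id is what lets several requests share one connection: each request is tagged with an id, and a response is matched to its request by that id regardless of arrival order. The sketch below shows only this bookkeeping. `StreamMultiplexer` and `InFlight` are hypothetical names, frames are reduced to strings, no socket is involved, and the limited id range of the real protocol is not modelled.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical bookkeeping for requests multiplexed over one connection by stream id. */
final class StreamMultiplexer {
    /** A request that has been sent and is waiting for its response frame. */
    static final class InFlight {
        final int streamId;
        final CompletableFuture<String> response = new CompletableFuture<>();

        InFlight(int streamId) {
            this.streamId = streamId;
        }
    }

    private final Map<Integer, InFlight> pending = new ConcurrentHashMap<>();
    private int nextStreamId = 0;

    /** Tags the request with a fresh stream id and registers it as pending. */
    synchronized InFlight send(String request) {
        InFlight inFlight = new InFlight(nextStreamId++);
        pending.put(inFlight.streamId, inFlight);
        // A real client would now write a frame carrying the stream id and the request.
        return inFlight;
    }

    /** Called when a response frame arrives; completes the matching request, if any. */
    void onResponseFrame(int streamId, String body) {
        InFlight inFlight = pending.remove(streamId);
        if (inFlight != null) {
            inFlight.response.complete(body);
        }
    }

    public static void main(String[] args) {
        StreamMultiplexer connection = new StreamMultiplexer();
        InFlight first = connection.send("select ... where departmentid = 122");
        InFlight second = connection.send("select ... where departmentid = 225");
        // Responses may come back in any order.
        connection.onResponseFrame(second.streamId, "rows for 225");
        connection.onResponseFrame(first.streamId, "rows for 122");
        System.out.println(first.response.join());
        System.out.println(second.response.join());
    }
}
```

Callers that want synchronous behaviour simply block on the future, which matches the remark that the protocol can be used synchronously as well.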

DataStax Java Driver
• Implements the client side of the binary protocol
• Similar to JDBC but easier in certain areas
  – Specific to Cassandra, not portable
• Supports CQL, with plans to support OO and DB APIs
• Supports
  – Query Builder (who wants this?)
  – Node discovery
  – Connection pooling
  – Reconnection policies
  – Load balancing policies
  – Retry policies
• Cursor support was announced during Cassandra Summit 2013

DataStax Java Driver: Cluster and Session

    import com.datastax.driver.core.*;
    import com.datastax.driver.core.policies.*;

    Cluster cluster = Cluster.builder()
            .addContactPoints("192.168.1.14", "192.168.1.15")
            .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
            .withReconnectionPolicy(new ConstantReconnectionPolicy(1000L)) // retry every second
            .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))   // prefer data center DC1
            .withCredentials("myuser", "mypassword")
            .build();
    Session session = cluster.connect("mykeyspace");
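
To give a feel for what a downgrading retry policy decides, the sketch below picks a lower consistency level that the replicas known to be alive can still satisfy. It is loosely modelled on the idea behind the policy named above and is not the driver's code: `DowngradingRetrySketch`, its `Level` enum and the exact thresholds are assumptions made for illustration.

```java
import java.util.Optional;

/** Hypothetical sketch of choosing a lower consistency level for a retry. */
final class DowngradingRetrySketch {
    enum Level { ONE, TWO, THREE, QUORUM, ALL }

    /**
     * Returns the level to retry with, given how many replicas are known to be alive,
     * or empty if no replica is alive and the failure should be reported to the caller.
     */
    static Optional<Level> downgrade(int aliveReplicas) {
        if (aliveReplicas >= 3) {
            return Optional.of(Level.THREE);
        }
        if (aliveReplicas == 2) {
            return Optional.of(Level.TWO);
        }
        if (aliveReplicas == 1) {
            return Optional.of(Level.ONE);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A QUORUM request failed because only one replica answered.
        Optional<Level> retryLevel = downgrade(1);
        System.out.println(retryLevel.isPresent() ? "retry at " + retryLevel.get() : "give up");
    }
}
```

The trade-off is the usual one: the request succeeds more often, but the caller may read or write at a weaker consistency level than it asked for.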

DataStax Java Driver Example: Select

    String selectQuery = "select * from department where departmentid = ?";
    PreparedStatement statement = session.prepare(selectQuery);
    statement.setConsistencyLevel(ConsistencyLevel.ONE);
    BoundStatement query = statement.bind(122);
    // you can do async here and get a Future instead
    ResultSet result = session.execute(query);
    for (Row row : result) {
        System.out.println(row.getInt("employeeid"));
        System.out.println(row.getString("firstname"));
        System.out.println(row.getString("lastname"));
    }

References
• Thrift: http://wiki.apache.org/cassandra/ThriftExamples
• Hector: http://hector-client.github.io/hector/build/html/index.html
• Astyanax: https://github.com/Netflix/astyanax/wiki
• JDBC: http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
• DataStax Java Driver: http://www.datastax.com/documentation/developer/java-driver/1.0/webhelp/index.html
  – YouTube presentation, Cassandra Summit 2013: http://www.youtube.com/watch?v=fZfLQABJxuc
  – Slideshare, Cassandra Summit 2013: http://www.slideshare.net/planetcassandra/cassandra-summit-data-stax-java-driver
  – Mailing list: [email protected], enroll at http://cassandra.apache.org/

Thanks

And especially Victor Anjos

Questions?