Introduction to Spring Data Dr. Mark Pollack

Introduction to Spring Data Dr. Mark Pollack. The current data landscape Project Goals Project Tour Agenda 2


Agenda

• The current data landscape
• Project Goals
• Project Tour

Enterprise Data Trends

• Unstructured Data
  – No predefined data model
  – Often doesn’t fit well in RDBMS
• Pre-Aggregated Data
  – Computed during data collection
  – Counters
  – Running Averages

The Value of Data

• Value from data exceeds hardware & software costs
• Value in connecting data sets
  – Grouping e-commerce users by user agent

Example user agent string:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418.9 (KHTML, like Gecko) Safari/419.3

The Data Revolution

• Extremely difficult/impossible to scale writes in RDBMS
  – Vertical scaling is limited/expensive
  – Horizontal scaling is limited or requires $$
• Shift from ACID to BASE
  – Basically Available, Soft state, Eventually consistent
• NoSQL datastores emerge as “point solutions”
  – Amazon/Google papers
  – Facebook, LinkedIn …

NoSQL

“Not Only SQL”

NOSQL \no-seek-wool\ n. Describes the ongoing trend where developers increasingly opt for non-relational databases to help solve their problems, in an effort to use the right tool for the right job.

Query Mechanisms: key lookup, map-reduce, query-by-example, query language, traversals

Big Data

• “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
• A subjective and moving target.
• Big data in many sectors today ranges from tens of TB to multiple PB.

Reality Check

Project Goals

Spring Data - Background and Motivation

• Data access landscape has changed considerably
• RDBMS are still important and predominant
  – but no longer considered a “one size fits all” solution
• But they have limitations
  – Hard to scale
• New data access technologies are solving problems RDBMS can’t
  – Higher performance and scalability, different data models
  – Often limited transactional model and relaxed consistency
• Polyglot persistence is becoming more prevalent
  – Combine RDBMS + other DBs in a solution

Spring and Data Access

• Spring has always provided excellent data access support
  – Transaction Management
  – Portable data access exception hierarchy
  – JDBC – JdbcTemplate
  – ORM – Hibernate, JPA, JDO, iBATIS support
  – Cache support (Spring 3.1)
• Spring Data project started in 2010
• Goal is to “refresh” Spring’s Data Access support
  – In light of new data access landscape

Spring Data Mission Statement

“Provides a familiar and consistent Spring-based programming model for Big Data, NoSQL, and relational stores while retaining store-specific features and capabilities.”


Spring Data – Supported Technologies

• Relational
  – JPA
  – JDBC Extensions
• NoSQL
  – Redis
  – HBase
  – MongoDB
  – Neo4j
  – Lucene
  – GemFire
• Big Data
  – Hadoop
    • HDFS and M/R
    • Hive
    • Pig
    • Cascading
  – Splunk
• Access
  – Repositories
  – QueryDSL
  – REST

Spring Data – Have it your way

• Database-specific features are accessed through the familiar Spring Template pattern
  – RedisTemplate
  – HbaseTemplate
  – MongoTemplate
  – Neo4jTemplate
  – GemfireTemplate
• Shared programming models and data access mechanisms
  – Repository Model
    • Common CRUD across data stores
  – Integration with QueryDSL
    • Typesafe query language
  – REST Exporter
    • Expose repository over HTTP in a RESTful manner

Project Tour

JDBC and JPA

Spring Data JDBC Extensions – Oracle Support

• Fast Connection Failover
• Simplified configuration for Advanced Queuing JMS support and DataSource
• Single local transaction for messaging and database access
• Easy access to native XML, Struct, Array data types
• API for customizing the connection environment

QueryDSL

“Enables the construction of type-safe SQL-like queries for multiple backends including JPA, JDO, MongoDB, Lucene, SQL and plain collections in Java”

http://www.querydsl.com/ - Open Source, Apache 2.0

Problems using Strings for a query language

• Using strings is error-prone
• Must remember query syntax, domain classes, properties and relationships
• Verbose parameter binding by name or position
• Each back-end has its own query language and API
• Note: .NET has LINQ

QueryDSL Features

• Code completion in IDE
• Almost no syntactically invalid queries allowed
• Domain types and properties can be referenced safely (no Strings)
• Helper classes generated via Java annotation processor
• Much less verbose than JPA2 Criteria API

QCustomer customer = QCustomer.customer;
JPQLQuery query = new JPAQuery(entityManager);
Customer bob = query.from(customer)
    .where(customer.firstName.eq("Bob"))
    .uniqueResult(customer);
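The core idea behind the generated Q-classes is that a property becomes a typed object instead of a string. The following is a minimal plain-Java sketch of that idea, not Querydsl's API: `Path`, `QCustomerSketch` and this simplified `Customer` are hypothetical stand-ins for a generated `QCustomer`, and predicates are evaluated against an in-memory list instead of a database.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical domain class used only for this sketch.
final class Customer {
    private final String firstName;
    private final String lastName;

    Customer(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// A typed reference to one property: the compiler checks the owning type and the value type.
final class Path<T, V> {
    private final String name;
    private final Function<T, V> getter;

    Path(String name, Function<T, V> getter) {
        this.name = name;
        this.getter = getter;
    }

    // Returns a predicate rather than a query string, so eq(42) on a String path does not compile.
    Predicate<T> eq(V value) {
        return entity -> Objects.equals(getter.apply(entity), value);
    }

    String getName() { return name; }
}

// Plays the role of a generated QCustomer meta-model class (hand-written here).
final class QCustomerSketch {
    static final Path<Customer, String> firstName = new Path<>("firstName", Customer::getFirstName);
    static final Path<Customer, String> lastName = new Path<>("lastName", Customer::getLastName);

    // Evaluates a predicate against an in-memory list, standing in for a real query backend.
    static List<Customer> where(List<Customer> source, Predicate<Customer> condition) {
        return source.stream().filter(condition).collect(Collectors.toList());
    }
}
```

Conditions compose with `Predicate.and`/`or`, e.g. `QCustomerSketch.firstName.eq("Bob").and(QCustomerSketch.lastName.eq("Smith"))`; a misspelled property is a compile error rather than a runtime query failure.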

Using QueryDSL for JDBC

• Incorporate code-generation into your build process
  – To create a query meta-model of domain classes or Tables (JDBC)
• For SQL:

QAddress qAddress = QAddress.address;
SQLTemplates dialect = new HSQLDBTemplates();
SQLQuery query = new SQLQueryImpl(connection, dialect)
    .from(qAddress)
    .where(qAddress.city.eq("London"));   // Querydsl Predicate
List<Address> results = query.list(
    new QBean<Address>(Address.class, qAddress.street, qAddress.city, qAddress.country));

Spring JDBC Extension – QueryDslJdbcTemplate

• Wrapper around JdbcTemplate that supports
  – Using Querydsl SQLQuery classes to execute queries
  – Integrates with Spring’s transaction management
  – Automatically detects DB type and sets SQLTemplates dialect
  – Spring RowMapper and ResultSetExtractors for mapping to POJOs
  – Executing inserts, updates and deletes with Querydsl’s SQLInsertClause, SQLUpdateClause, and SQLDeleteClause

Spring JDBC Extension – QueryDslJdbcTemplate

// Query with join
// qdslTemplate is a QueryDslJdbcTemplate; qAddress is the generated QAddress path
QCustomer qCustomer = QCustomer.customer;
SQLQuery findByIdQuery = qdslTemplate.newSqlQuery()
    .from(qCustomer)
    .leftJoin(qCustomer._addressCustomerRef, qAddress)
    .where(qCustomer.id.eq(id));

JPA and Repositories

Repositories

“Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.”
http://martinfowler.com/eaaCatalog/repository.html
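To make the “collection-like interface” concrete, here is a minimal in-memory sketch of the `CustomerRepository` contract used on the following slides (`findOne`, `save`, `findByEmailAddress`). It is an illustration, not a Spring Data class: the email address is simplified to a `String`, and the `AtomicLong` id sequence is an assumption standing in for a database-generated key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified domain object: the email address is a plain String here.
final class Customer {
    Long id;
    final String emailAddress;

    Customer(String emailAddress) {
        this.emailAddress = emailAddress;
    }
}

// Same three operations as the CustomerRepository interface on the slides.
interface CustomerRepository {
    Customer findOne(Long id);
    Customer save(Customer customer);
    Customer findByEmailAddress(String emailAddress);
}

// Behaves like a collection of customers; callers never see how storage works.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Long, Customer> store = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    @Override
    public Customer findOne(Long id) {
        return store.get(id);
    }

    @Override
    public Customer save(Customer customer) {
        // Mirrors the persist-or-merge decision: a null id means "new entity".
        if (customer.id == null) {
            customer.id = sequence.incrementAndGet();
        }
        store.put(customer.id, customer);
        return customer;
    }

    @Override
    public Customer findByEmailAddress(String emailAddress) {
        return store.values().stream()
                .filter(c -> c.emailAddress.equals(emailAddress))
                .findFirst()
                .orElse(null);
    }
}
```

Because services depend only on the interface, this in-memory version and a JPA-backed one are interchangeable, which is also what makes a generated implementation possible.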

Spring Data Repositories

• We remove the busy work of developing a repository

For Example…

public interface CustomerRepository {

    Customer findOne(Long id);

    Customer save(Customer customer);

    Customer findByEmailAddress(EmailAddress emailAddress);
}

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(unique = true)
    private EmailAddress emailAddress;

    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    @JoinColumn(name = "customer_id")
    private Set<Address> addresses = new HashSet<Address>();

    // constructor, properties, equals, hashcode omitted for brevity
}

Traditional JPA Implementation

@Repository
public class JpaCustomerRepository implements CustomerRepository {

    @PersistenceContext
    private EntityManager em;

    @Override
    public Customer findOne(Long id) {
        return em.find(Customer.class, id);
    }

    public Customer save(Customer customer) {
        if (customer.getId() == null) {
            em.persist(customer);
            return customer;
        } else {
            return em.merge(customer);
        }
    }

    ...

Traditional JPA Implementation (continued)

    @Override
    public Customer findByEmailAddress(EmailAddress emailAddress) {
        TypedQuery<Customer> query = em.createQuery(
            "select c from Customer c where c.emailAddress = :email", Customer.class);
        query.setParameter("email", emailAddress);
        return query.getSingleResult();
    }
}

Spring Data Repositories

• A simple recipe
  1. Map your POJO using JPA
  2. Extend a repository (marker) interface or use an annotation
  3. Add finder methods
  4. Configure Spring to scan for repository interfaces and create implementations
• Inject implementations into your services and use as normal…

Spring Data Repository Example

public interface CustomerRepository extends Repository<Customer, Long> { // Marker Interface

    Customer findOne(Long id);

    Customer save(Customer customer);

    Customer findByEmailAddress(EmailAddress emailAddress);
}

or

@RepositoryDefinition(domainClass = Customer.class, idClass = Long.class)
public interface CustomerRepository { . . . }

Spring Data Repository Example

• Bootstrap with JavaConfig

@Configuration
@EnableJpaRepositories
@Import(InfrastructureConfig.class)
public class ApplicationConfig {
}

• Or XML

<jpa:repositories base-package="com.oreilly.springdata.jpa" />

• And Spring will create an implementation of the interface
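How can an interface get an implementation nobody wrote? As a rough illustration of the mechanism (not Spring Data's actual code), the sketch below uses the JDK's `java.lang.reflect.Proxy` to implement a hypothetical `PersonRepository`: `save` appends to a list, and any `findBy<Property>` method is answered by reading the matching public field via reflection. The `Person` class, the field-based lookup and the `RepositoryFactory` name are assumptions made to keep it short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity with public fields so reflection stays simple.
final class Person {
    public final String firstName;
    public final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Only an interface: no implementation is written by hand.
interface PersonRepository {
    Person save(Person person);
    List<Person> findByLastName(String lastName);
}

final class RepositoryFactory {

    // Creates a proxy whose behavior is derived from each invoked method's name.
    static <R> R create(Class<R> repositoryInterface) {
        List<Object> storage = new ArrayList<>();
        Object proxy = Proxy.newProxyInstance(
                repositoryInterface.getClassLoader(),
                new Class<?>[] {repositoryInterface},
                (self, method, args) -> {
                    String name = method.getName();
                    if (name.equals("save")) {
                        storage.add(args[0]);
                        return args[0];
                    }
                    if (name.startsWith("findBy")) {
                        // "findByLastName" -> property "lastName"
                        String property = Character.toLowerCase(name.charAt(6)) + name.substring(7);
                        return findBy(storage, property, args[0]);
                    }
                    throw new UnsupportedOperationException(name);
                });
        return repositoryInterface.cast(proxy);
    }

    // Linear scan comparing the named field of each stored entity with the argument.
    private static List<Object> findBy(List<Object> storage, String property, Object value)
            throws ReflectiveOperationException {
        List<Object> matches = new ArrayList<>();
        for (Object entity : storage) {
            Field field = entity.getClass().getField(property);
            if (value.equals(field.get(entity))) {
                matches.add(entity);
            }
        }
        return matches;
    }
}
```

Usage: `PersonRepository repo = RepositoryFactory.create(PersonRepository.class);` then `repo.findByLastName("Carlin")`. The real infrastructure parses method names once at startup and delegates to the store (for JPA, to generated JPQL), but the interface-plus-proxy shape is the same idea.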

Spring Data JPA - Usage

• Wire into your transactional service layer as normal

Query Method Keywords

• How does findByEmailAddress work…

Spring Data Repositories - CRUD

public interface CrudRepository<T, ID extends Serializable> extends Repository<T, ID> {

    T save(T entity);

    Iterable<T> save(Iterable<? extends T> entities);

    T findOne(ID id);

    boolean exists(ID id);

    Iterable<T> findAll();

    long count();

    void delete(ID id);

    void delete(T entity);

    void delete(Iterable<? extends T> entities);

    void deleteAll();
}

Paging, Sorting, and custom finders

public interface PagingAndSortingRepository<T, ID extends Serializable> extends CrudRepository<T, ID> {

    Iterable<T> findAll(Sort sort);

    Page<T> findAll(Pageable pageable);
}
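A `Pageable` request boils down to a page index, a page size and an optional sort. Spring Data page indexes are zero-based, so page `n` of size `s` starts at row offset `n * s`. The hypothetical `PageMath` helper below sketches the arithmetic behind a paged query and the resulting page metadata; it is plain Java, not part of the Spring Data API.

```java
final class PageMath {

    // Row offset of the first element on a zero-based page.
    static long offset(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) pageNumber * pageSize;
    }

    // Number of pages needed to hold totalElements, rounding up (0 elements -> 0 pages).
    static int totalPages(long totalElements, int pageSize) {
        return (int) ((totalElements + pageSize - 1) / pageSize);
    }

    // True if at least one element exists after the given page.
    static boolean hasNext(int pageNumber, int pageSize, long totalElements) {
        return offset(pageNumber + 1, pageSize) < totalElements;
    }
}
```

With 45 rows and a page size of 20 there are 3 pages, and page 2 (the third) starts at offset 40. In Spring Data itself the request is expressed as a `PageRequest` passed to `findAll(Pageable)`, and the returned `Page` carries the totals.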

Spring Data JPA – Customize Query Methods

• Query methods use method naming conventions
  – Can override with the @Query annotation
  – Or the method name references a JPA named query

Spring Data JPA – Other features

• Specifications using JPA Criteria API
• LockMode, override Transactional metadata, QueryHints
• Auditing, CDI Integration
• QueryDSL support

Querydsl and JPA

• Easier and less verbose than JPA2 Criteria API
  – “equals property value” vs. “property equals value”
  – Operations via a builder object

CriteriaBuilder builder = entityManagerFactory.getCriteriaBuilder();
CriteriaQuery<Person> query = builder.createQuery(Person.class);
Root<Person> men = query.from(Person.class);
Root<Person> women = query.from(Person.class);
Predicate menRestriction = builder.and(
    builder.equal(men.get(Person_.gender), Gender.MALE),
    builder.equal(men.get(Person_.relationshipStatus), RelationshipStatus.SINGLE));
Predicate womenRestriction = builder.and(
    builder.equal(women.get(Person_.gender), Gender.FEMALE),
    builder.equal(women.get(Person_.relationshipStatus), RelationshipStatus.SINGLE));
query.where(builder.and(menRestriction, womenRestriction));

Querydsl and JPA

versus…

JPAQuery query = new JPAQuery(entityManager);
QPerson men = new QPerson("men");
QPerson women = new QPerson("women");
query.from(men, women).where(
    men.gender.eq(Gender.MALE),
    men.relationshipStatus.eq(RelationshipStatus.SINGLE),
    women.gender.eq(Gender.FEMALE),
    women.relationshipStatus.eq(RelationshipStatus.SINGLE));   // Querydsl Predicates

QueryDSL - Repositories

public interface ProductRepository extends Repository<Product, Long>,
        QueryDslPredicateExecutor<Product> { … }

// product is assumed to be the generated QProduct.product path
Product iPad = productRepository.findOne(product.name.eq("iPad"));
Predicate tablets = product.description.contains("tablet");
Iterable<Product> result = productRepository.findAll(tablets);

public interface QueryDslPredicateExecutor<T> {

    long count(com.mysema.query.types.Predicate predicate);

    T findOne(Predicate predicate);

    List<T> findAll(Predicate predicate);

    List<T> findAll(Predicate predicate, OrderSpecifier<?>... orders);

    Page<T> findAll(Predicate predicate, Pageable pageable);
}

Tooling Support

Code Tour - JPA

NoSQL Data Models

Key/Value

• Familiar, much like a hash table
• Redis, Riak, Voldemort, …
• Amazon Dynamo inspired

Column Family

• Extended key/value model
  – values can also be key/value pairs
• HBase, Cassandra
• Google Bigtable inspired
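One way to picture the column-family model is as nested sorted maps: row key → column family → column qualifier → value. The class below is a hypothetical in-memory sketch of that shape (it ignores the per-cell timestamps and versions that HBase also keeps). Column families are fixed when the table is created, while qualifiers appear on first write, which matches the HBase schema rules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class ColumnFamilyTable {
    // row key -> family -> qualifier -> value, all kept in sorted order
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> rows = new TreeMap<>();
    private final Set<String> families;

    // Column families are declared up front, when the table is created.
    ColumnFamilyTable(Set<String> families) {
        this.families = new HashSet<>(families);
    }

    // Qualifiers are not declared; any qualifier can be written into a known family.
    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) {
            return null;
        }
        return byFamily.get(family).get(qualifier);
    }

    // Qualifiers present in one family of one row, in sorted order (as a scan lists them).
    List<String> qualifiers(String row, String family) {
        NavigableMap<String, NavigableMap<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) {
            return List.of();
        }
        return new ArrayList<>(byFamily.get(family).keySet());
    }
}
```

Writing `cfInfo:qUser` and then `cfInfo:qPassword` into the same row returns the qualifiers as `[qPassword, qUser]`, the same sorted order an HBase `scan` prints.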

Document

• Collections that contain semi-structured data: XML/JSON
• CouchDB, MongoDB

{ id: '4b2b9f67a1f631733d917a7b',
  author: 'joe',
  tags: ['example', 'db'],
  comments: [ { author: 'jim', comment: 'OK' },
              { author: 'ida', comment: 'Bad' } ] }
{ id: '4b2b9f67a1f631733d917a7c', author: 'ida', ... }
{ id: '4b2b9f67a1f631733d917a7d', author: 'jim', ... }

Graph

• Nodes and Edges, each of which may have properties
• Neo4j, Sones, InfiniteGraph

Redis

• Advanced key-value store
• Values can be
  – Strings (like in a plain key-value store).
  – Lists of strings, with O(1) pop and push operations.
  – Sets of strings, with O(1) element add, remove, and existence test.
  – Sorted sets that are like Sets but with a score used to keep elements in order.
  – Hashes that are composed of string fields set to string values.

Redis

• Operations
  – Unique to each data type – appending to list/set, retrieving a slice of a list…
  – Many operations performed in O(1) time – 100k ops/sec on entry-level hardware
  – Intersection, union, difference of sets
  – Redis is single-threaded, so individual commands are atomic
• Optional persistence
• Master-slave replication
• HA support coming soon

Spring Data Redis

• Provides a ‘de facto’ API on top of multiple drivers
• RedisTemplate
  – Connection resource management
  – Descriptive method names, grouped into data type categories
    • ListOps, ZSetOps, HashOps, …
  – No need to deal with byte arrays
    • Support for Java JDK, String, JSON, XML, and Custom serialization
  – Translation to Spring’s DataAccessException hierarchy
• Redis backed Set, List, Map, capped Collections, Atomic Counters
• Redis Messaging
• Spring 3.1 @Cacheable support

RedisTemplate

• List Operations

@Autowired
RedisTemplate<String, Person> redisTemplate;

Person p = new Person("George", "Carlin");
redisTemplate.opsForList().leftPush("hosts", p);

Redis Support Classes

• JDK collections (java.util & java.util.concurrent)
  – List/Set/(Blocking)Queues/(Blocking)Deque
• Atomic Counters
  – AtomicLong & AtomicInteger backed by Redis

Set<Post> t = new DefaultRedisSet<Post>("timeline", connection);
t.add(new Post("john", "Hello World"));

RedisSet<String> fJ = new DefaultRedisSet<String>("john:following", template);
RedisSet<String> fB = new DefaultRedisSet<String>("bob:following", template);

// accounts that both john and bob follow
Set<String> s3 = fJ.intersect(fB);

Code Tour - Redis

HBase

• Column-oriented database
  – Row points to “columns” which are actually key-value pairs
  – Columns can be grouped together into “column families”
    • Optimized storage and I/O
• Data stored in HDFS, modeled after Google BigTable
• Need to define a schema for column families up front
  – Key-value pairs inside a column-family are not defined up front

Using HBase

$ ./bin/hbase shell
> create 'users', { NAME => 'cfInfo'}, { NAME => 'cfStatus' }
> put 'users', 'row-1', 'cfInfo:qUser', 'user1'
> put 'users', 'row-1', 'cfInfo:qEmail', '[email protected]'
> put 'users', 'row-1', 'cfInfo:qPassword', 'user1pwd'
> put 'users', 'row-1', 'cfStatus:qEmailValidated', 'true'
> scan 'users'
ROW     COLUMN+CELL
row-1   column=cfInfo:qEmail, timestamp=1346326115599, value=[email protected]
row-1   column=cfInfo:qPassword, timestamp=1346326128125, value=user1pwd
row-1   column=cfInfo:qUser, timestamp=1346326078830, value=user1
row-1   column=cfStatus:

Configuration configuration = new Configuration(); // Hadoop configuration object
HTable table = new HTable(configuration, "users");
Put p = new Put(Bytes.toBytes("user1"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qUser"), Bytes.toBytes("user1"));
table.put(p);

HBase API

• HTable class is not thread safe
• Throws HBase-specific exceptions

Configuration configuration = new Configuration(); // Hadoop configuration
HTable table = new HTable(configuration, "users");
Put p = new Put(Bytes.toBytes("user1"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qUser"), Bytes.toBytes("user1"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qEmail"), Bytes.toBytes("[email protected]"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qPassword"), Bytes.toBytes("user1pwd"));
table.put(p);

Spring Hadoop - HBase

• Configuration support
• HbaseTemplate
  – Resource Management
  – Translation to Spring’s DataAccessException hierarchy
  – Lightweight Object Mapping similar to JdbcTemplate
    • RowMapper, ResultsExtractor
  – Access to underlying resource
    • TableCallback

HbaseTemplate - Configuration

<configuration id="hadoopConfiguration">
    fs.default.name=hdfs://localhost:9000
</configuration>

<hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration" />

<beans:bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
    <beans:property name="configuration" ref="hbaseConfiguration" />
</beans:bean>

HbaseTemplate - Save

public User save(final String userName, final String email, final String password) {
    return hbaseTemplate.execute(tableName, new TableCallback<User>() {
        public User doInTable(HTable table) throws Throwable {
            User user = new User(userName, email, password);
            Put p = new Put(Bytes.toBytes(user.getName()));
            p.add(CF_INFO, qUser, Bytes.toBytes(user.getName()));
            p.add(CF_INFO, qEmail, Bytes.toBytes(user.getEmail()));
            p.add(CF_INFO, qPassword, Bytes.toBytes(user.getPassword()));
            table.put(p);
            return user;
        }
    });
}

HbaseTemplate – POJO Mapping

private byte[] qUser = Bytes.toBytes("user");
private byte[] qEmail = Bytes.toBytes("email");
private byte[] qPassword = Bytes.toBytes("password");

public List<User> findAll() {
    return hbaseTemplate.find(tableName, "cfInfo", new RowMapper<User>() {
        @Override
        public User mapRow(Result result, int rowNum) throws Exception {
            return new User(Bytes.toString(result.getValue(CF_INFO, qUser)),
                            Bytes.toString(result.getValue(CF_INFO, qEmail)),
                            Bytes.toString(result.getValue(CF_INFO, qPassword)));
        }
    });
}

Code Tour - HBase

MongoDB

• Document Database
  – JSON-style documents
  – Schema-less
• Documents organized in collections
• Full or partial document updates
• Index support – secondary and compound
• Rich query language for dynamic queries
• GridFS for efficiently storing large files
• Geo-spatial features
• Map/Reduce for aggregation queries
  – New Aggregation Framework in 2.2
• Replication and Auto Sharding

Spring Data - MongoDB

• MongoTemplate
  – Fluent Query, Criteria, Update APIs
  – Translation to Spring’s DataAccessException hierarchy
• GridFsTemplate
• Repositories
• QueryDSL
• Cross-store persistence
• JMX
• Log4J Logging Adapter

MongoOperations Interface

MongoTemplate - Usage

MongoTemplate - MapReduce

• Sample documents

{ "_id" : ObjectId("4e5ff893c0277826074ec533"), "x" : [ "a", "b" ] }
{ "_id" : ObjectId("4e5ff893c0277826074ec534"), "x" : [ "b", "c" ] }
{ "_id" : ObjectId("4e5ff893c0277826074ec535"), "x" : [ "c", "d" ] }

• Map function – count the occurrence of each letter in the array

function () {
    for (var i = 0; i < this.x.length; i++) {
        emit(this.x[i], 1);
    }
}

MongoTemplate - MapReduce

• Reduce function – sum up the occurrence of each letter across all docs

function (key, values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++)
        sum += values[i];
    return sum;
}

• Execute MapReduce

MapReduceResults<ValueObject> results = mongoOperations.mapReduce("collection",
        "classpath:map.js", "classpath:reduce.js", ValueObject.class);
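To see what the map and reduce functions compute, here is a plain-Java sketch that applies the same logic to the three sample arrays in memory: map emits `(letter, 1)` for each element, values are grouped by key, and reduce sums each group. It illustrates the semantics only, not how MongoDB runs the job; MongoDB may call reduce several times on partial results, so a reduce function must be re-reducible, as this sum is. `LetterCountMapReduce` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LetterCountMapReduce {

    // Map step: emit (letter, 1) for every element of a document's "x" array.
    static List<Map.Entry<String, Integer>> map(List<String> x) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String letter : x) {
            emitted.add(Map.entry(letter, 1));
        }
        return emitted;
    }

    // Reduce step: sum all values emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map over all documents, groups by key (sorted for stable output), then reduces each group.
    static Map<String, Integer> run(List<List<String>> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> doc : documents) {
            for (Map.Entry<String, Integer> e : map(doc)) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> results = new TreeMap<>();
        grouped.forEach((key, values) -> results.put(key, reduce(key, values)));
        return results;
    }
}
```

Calling `run` on the sample arrays `[a, b]`, `[b, c]` and `[c, d]` returns `{a=1, b=2, c=2, d=1}`, which is the result the `mapReduce` call maps into `ValueObject` instances.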

Mapping Annotations

• @Document
  – Marks an entity to be mapped to a document (optional)
  – Allows definition of the collection the entity shall be persisted to
  – Collection name defaults to the simple class name, starting with a lower-case letter
• @Id
  – Demarcates id properties
  – Properties named id or _id are auto-detected

Mapping Annotations

• @Indexed / @CompoundIndex
  – Creates indexes for one or more properties
• @Field
  – Allows customizing the key to be used inside the document
  – Define field order
• @DBRef
  – Creates references to entities in a separate collection
  – Opposite of embedding entities inside the document (default)

Mongo Repositories

• Same as before with JPA
• Added functionality that is MongoDB-specific
  – Geolocation, @Query

Code Tour - Mongo

Neo4j

• Graph Database – focus on connected data
  – The social graph…
• Schema-free Property Graph
• ACID Transactions
• Indexing
• Scalable ~ 34 billion nodes and relationships, ~1M traversals/sec
• REST API or embeddable on JVM
• High-Availability
• Declarative Query Language - Cypher

Spring Data Neo4j

• Use annotations to define graph entities
• Entity state backed by graph database
• JSR-303 bean validation
• Query and Traversal API support
• Cross-store persistence
  – Part of object lives in RDBMS, other in Neo4j
• Exception translation
• Declarative Transaction Management
• Repositories
• QueryDSL
• Spring XML namespace
• Neo4j-Server support

Classic Neo4j Domain class

Spring Data Neo4j Domain Class

@NodeEntity
public class Tag {
    @GraphId private Long id;
    @Indexed(unique = true) private String name;
}

• @NodeEntity
  – Represents a node in the graph
  – Fields saved as properties on node
  – Instantiated using Java ‘new’ keyword, like any POJO
  – Also returned by lookup mechanisms
  – Type information stored in the graph

Spring Data Neo4j Domain Class

@NodeEntity
public class Customer {

    @GraphId
    private Long id;

    private String firstName, lastName;

    @Indexed(unique = true)
    private String emailAddress;

    @RelatedTo(type = "ADDRESS")
    private Set<Address> addresses = new HashSet<Address>();
}

Neo4jTemplate

• Resource Management
• Convenience Methods
• Declarative Transaction Management
• Exception Translation to DataAccessException hierarchy
• Works also via REST with Neo4j-Server
• Multiple Query Languages
  – Cypher, Gremlin
• Fluent Query Result Handling

Page 84: Introduction to Spring Data Dr. Mark Pollack. The current data landscape Project Goals Project Tour Agenda 2

• <neo4j:config> implicitly creates a Neo4jTemplate instance in the application context

Neo4jTemplate - Usage

84

// save() persists each entity to the graph and returns the saved instance
Customer dave = neo4jTemplate.save(new Customer("Dave", "Matthews", "[email protected]"));

Product iPad = neo4jTemplate.save(new Product("iPad", "Apple tablet device").withPrice(499));

Product mbp = neo4jTemplate.save(new Product("MacBook Pro", "Apple notebook").withPrice(1299));

neo4jTemplate.save(new Order(dave).withItem(iPad, 2).withItem(mbp, 1));

<!-- Graph database accessed through the Neo4j server's REST API -->
<bean id="graphDatabaseService" class="org.springframework.data.neo4j.rest.SpringRestGraphDatabase">
    <constructor-arg value="http://localhost:7474/db/data" />
</bean>

<neo4j:config graphDatabaseService="graphDatabaseService" />
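The usage example relies on fluent `withPrice` and `withItem` methods whose definitions the slide does not show. A plain-Java sketch of how such builder-style methods might look, assuming prices in whole currency units; these `Customer`, `Product`, `LineItem` and `Order` classes are hypothetical and omit the Spring Data Neo4j mapping annotations.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Customer {
    final String firstName;
    final String lastName;
    final String emailAddress;

    Customer(String firstName, String lastName, String emailAddress) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.emailAddress = emailAddress;
    }
}

class Product {
    final String name;
    final String description;
    private BigDecimal price = BigDecimal.ZERO;

    Product(String name, String description) {
        this.name = name;
        this.description = description;
    }

    // Sets the price and returns this, so the call can be chained onto the constructor.
    Product withPrice(long price) {
        this.price = BigDecimal.valueOf(price);
        return this;
    }

    BigDecimal getPrice() {
        return price;
    }
}

class LineItem {
    final Product product;
    final int quantity;

    LineItem(Product product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

class Order {
    final Customer customer;
    private final List<LineItem> items = new ArrayList<>();

    Order(Customer customer) {
        this.customer = customer;
    }

    // Adds a line item and returns this, allowing order.withItem(a, 2).withItem(b, 1).
    Order withItem(Product product, int quantity) {
        items.add(new LineItem(product, quantity));
        return this;
    }

    List<LineItem> getItems() {
        return Collections.unmodifiableList(items);
    }
}
```

Returning `this` from each mutator is what lets a whole order be built in the single expression passed to `save()`.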


• Export CrudRepository methods via REST semantics
– PUT, POST = save()
– GET = find*()
– DELETE = delete*()
• Support JSON as the first-class data format
• JSONP and JSONP+E support
• Implemented as a Spring MVC application

Spring Data REST

85
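The verb mapping above can be pictured as a small dispatcher in front of a repository. A simplified sketch, assuming an in-memory store and a caller-supplied id; `InMemoryRepository` and `RepositoryExporter` are hypothetical and leave out everything Spring Data REST adds on top (URI parsing, JSON conversion, links, status codes).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for a CrudRepository keyed by id.
class InMemoryRepository<T> {
    private final Map<Long, T> store = new HashMap<>();

    T save(long id, T entity) {
        store.put(id, entity);
        return entity;
    }

    Optional<T> findOne(long id) {
        return Optional.ofNullable(store.get(id));
    }

    void delete(long id) {
        store.remove(id);
    }
}

// Routes an HTTP method onto the repository: PUT/POST -> save, GET -> find, DELETE -> delete.
class RepositoryExporter<T> {
    private final InMemoryRepository<T> repository;

    RepositoryExporter(InMemoryRepository<T> repository) {
        this.repository = repository;
    }

    // body must be non-null for PUT and POST; it is ignored for GET and DELETE.
    Optional<T> handle(String method, long id, T body) {
        switch (method) {
            case "PUT":
            case "POST":
                return Optional.of(repository.save(id, body));
            case "GET":
                return repository.findOne(id);
            case "DELETE":
                repository.delete(id);
                return Optional.empty();
            default:
                throw new IllegalArgumentException("Unsupported method: " + method);
        }
    }
}
```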


• Discoverability
– “GET /” returns a list of the resources available from this level
• Resources are related to one another by “links”
– Links have a specific meaning in different contexts
– HTML and the Atom syndication format use <link rel="" href=""/>
• Use Spring HATEOAS as the basis for creating representations
– https://github.com/SpringSource/spring-hateoas

Spring Data REST

86


Spring Data REST - Example

87

curl -v http://localhost:8080/spring-data-rest-webmvc/

{
  "links" : [ {
    "rel" : "person",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person"
  } ]
}

curl -v http://localhost:8080/spring-data-rest-webmvc/person

{
  "content" : [ ],
  "links" : [ {
    "rel" : "person.search",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/search"
  } ]
}


Spring Data REST - Example

88

curl -v http://localhost:8080/spring-data-rest-webmvc/person/search

{
  "links" : [ {
    "rel" : "person.findByName",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/search/findByName"
  } ]
}

curl -v "http://localhost:8080/spring-data-rest-webmvc/person/search/findByName?name=John+Doe"

[ {
  "rel" : "person.Person",
  "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1"
} ]
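The search resource exposes the repository's query methods by name. A minimal sketch of two pieces of that exchange: deriving the property name from a simple `findBy<Property>` method name, and decoding the `+` in `John+Doe` back to a space. `QueryMethodNames` is hypothetical; Spring Data's own query-method parser also handles `And`/`Or`, keywords such as `Like` or `Between`, and nested properties.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper; handles only the simplest "findBy<Property>" form.
final class QueryMethodNames {
    private static final String PREFIX = "findBy";

    private QueryMethodNames() {
    }

    // "findByName" -> "name", "findByEmailAddress" -> "emailAddress".
    static Optional<String> propertyOf(String methodName) {
        if (!methodName.startsWith(PREFIX) || methodName.length() == PREFIX.length()) {
            return Optional.empty();
        }
        String property = methodName.substring(PREFIX.length());
        return Optional.of(Character.toLowerCase(property.charAt(0)) + property.substring(1));
    }

    // Decodes a form-encoded query parameter value, e.g. "John+Doe" -> "John Doe".
    static String decodeParameter(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }
}
```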


Spring Data REST - Example

89

curl -v http://localhost:8080/spring-data-rest-webmvc/person/1

{
  "name" : "John Doe",
  "links" : [ {
    "rel" : "profiles",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1/profiles"
  }, {
    "rel" : "addresses",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1/addresses"
  }, {
    "rel" : "self",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1"
  } ],
  "version" : 1
}
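Each representation carries its relations as `rel`/`href` pairs. A small sketch that builds the three links of the person resource in the example; the `Link` record and `PersonLinks` helper are hypothetical (Spring HATEOAS ships its own, richer `Link` type), and the JSON is assembled by hand without escaping, which only works for simple values like these.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical link type: a relation name plus a target URI.
record Link(String rel, String href) {
    // Hand-built JSON without escaping; acceptable only for plain ASCII values.
    String toJson() {
        return "{ \"rel\" : \"" + rel + "\", \"href\" : \"" + href + "\" }";
    }
}

final class PersonLinks {
    private PersonLinks() {
    }

    // Builds the links for one person, in the same order as the example response.
    static List<Link> forPerson(String baseUri, long id) {
        String self = baseUri + "/person/" + id;
        return List.of(
                new Link("profiles", self + "/profiles"),
                new Link("addresses", self + "/addresses"),
                new Link("self", self));
    }

    // Renders the links as a JSON array.
    static String toJson(List<Link> links) {
        return links.stream().map(Link::toJson).collect(Collectors.joining(", ", "[ ", " ]"));
    }
}
```

A client only needs the base URI; every further URI is discovered by following a `rel`, which is the point of the discoverability bullets.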


• Hadoop has a poor out-of-the-box programming model

• Applications are generally a collection of scripts calling command line apps

• Spring simplifies developing Hadoop applications

• By providing a familiar and consistent programming and configuration model

• Across a wide range of use cases
– HDFS usage
– Data Analysis (MR/Pig/Hive/Cascading)
   • PigTemplate
   • HiveTemplate
– Workflow (Spring Batch)
– Event Streams (Spring Integration)

• Allowing you to start small and grow

Spring for Hadoop - Goals

90


Relationship with other Spring Projects

91


Books

92

Free Spring Data JPA Chapter – http://bit.ly/sd-book-chapter

O’Reilly Spring Data Book - http://bit.ly/sd-book


• Spring Data
– http://www.springsource.org/spring-data
– http://www.springsource.org/spring-hadoop

• Querydsl
– http://www.querydsl.com

• Example Code
– https://github.com/SpringSource/spring-data-book
– https://github.com/SpringSource/spring-data-kickstart
– Many more listed on individual project pages

Resources

93


Thank You!