Seminar: Big Data, NoSQL graph database for Java developers
Presenter: Evgeny Hanikblum
Data is getting bigger: “Every 2 days we create as much information as we did up to 2003”
– Eric Schmidt, Google
Big Data Technologies
NoSQL Overview
NoSQL -> Not Only SQL
Key Value Stores
• Most are based on Dynamo: Amazon’s Highly Available Key-Value Store
• Data Model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Projects:
Key Value Stores
• Pros:
  – Simple data model
  – Scalable
• Cons:
  – Create your own “foreign keys”
  – Poor for complex data
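The “create your own foreign keys” point can be illustrated with a rough sketch. It assumes a plain in-memory HashMap as a stand-in for a key-value store; the class name, the key naming scheme and the comma-separated reference format are all hypothetical choices made for this illustration, not part of any real product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a key-value store: one global key -> value map.
class KeyValueSketch {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        return store.get(key);
    }

    // The store knows nothing about references, so the application resolves its
    // own "foreign keys": here, a comma-separated list of keys kept as a value.
    List<String> resolve(String listKey) {
        List<String> values = new ArrayList<>();
        String refs = store.get(listKey);
        if (refs == null || refs.isEmpty()) {
            return values;
        }
        for (String key : refs.split(",")) {
            values.add(store.get(key)); // one extra lookup per reference
        }
        return values;
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch();
        kv.put("user:1:name", "Alice");
        kv.put("user:2:name", "Bob");
        kv.put("user:1:friends", "user:2:name");
        System.out.println(kv.resolve("user:1:friends"));
    }
}
```

Every hop across a reference is a separate lookup written by hand, which is why this model gets awkward as the data becomes more connected.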
Column Databases
• Most are based on BigTable: Google’s Distributed Storage System for Structured Data
• Data Model:
  – A big table, with column families
  – Map Reduce for querying/processing
• Projects:
Column Databases
• Pros:
  – Supports Semi-Structured Data
  – Naturally Indexed (columns)
  – Scalable
• Cons:
  – Poor for interconnected data
Document Databases
• Data Model:
  – A collection of documents
  – A document is a key value collection
  – Index-centric, lots of map-reduce
• Projects :
Document Databases
• Pros:
  – Simple, powerful data model
  – Scalable
• Cons:
  – Poor for interconnected data
  – Query model limited to keys and indexes
  – Map reduce for larger queries
Graph Databases
• Data Model:
  – Nodes and Relationships
• Projects:
Graph Databases
• Pros:
  – Powerful data model, as general as RDBMS
  – Connected data locally indexed
  – Easy to query
• Cons:
  – Sharding (lots of people working on this)
    • Scales UP reasonably well
  – Requires rewiring your brain
Why do you need a GraphDB?
GraphDB Overview
Because data has expanded into relationships
GraphDB Overview
Because data has become interconnected
When should I use it?
Use a graph DB if you have to deal with something like this:
or this …
or this …
Data is more connected:
• Text (content)
• HyperText (added pointers)
• RSS (joined those pointers)
• Blogs (added pingbacks)
• Tagging (grouped related data)
• RDF (described connected data)
• GGG (content + pointers + relationships + descriptions)
GraphDB Overview
Data is less structured:
• If you tried to collect all the data of every movie ever made, how would you model it?
• Actors, Characters, Locations, Dates, Costs, Ratings, Showings, Ticket Sales, etc.
GraphDB Overview
What is a Graph?
• An abstract representation of a set of objects where some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc, Relationship)
Different Kinds of Graphs
• Undirected Graph
• Directed Graph
• Pseudo Graph
• Multi Graph
• Hyper Graph
More Kinds of Graphs
• Weighted Graph
• Labeled Graph
• Property Graph
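A property graph combines the previous kinds: edges are directed and labeled, and both vertices and edges carry key-value properties (a weight is then just one more property). The following is a minimal in-memory sketch of that idea; the class names `Vertex` and `Edge`, the `connect` helper and the `LINKS_TO` label are assumptions made for illustration and are not taken from any graph database API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory property graph; all class and method names are hypothetical.
class PropertyGraphSketch {

    // A vertex with arbitrary key-value properties and its outgoing edges.
    static class Vertex {
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();
    }

    // A directed, labeled edge that can carry its own properties (e.g. a weight).
    static class Edge {
        final Vertex from;
        final Vertex to;
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Vertex from, Vertex to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Creates an edge and registers it on the start vertex.
    static Edge connect(Vertex from, Vertex to, String label) {
        Edge edge = new Edge(from, to, label);
        from.outgoing.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        Vertex a = new Vertex();
        a.properties.put("name", "A");
        Vertex b = new Vertex();
        b.properties.put("name", "B");
        Edge edge = connect(a, b, "LINKS_TO");
        edge.properties.put("weight", 2.5);
        System.out.println(a.outgoing.get(0).to.properties.get("name"));
    }
}
```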
What is a Graph DB?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (or hop) remains the same
• Plus an Index for lookups
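The points above can be sketched in a few lines of plain Java. This is only an illustration of the principle, not how any particular database stores data: the index (a hypothetical `indexByName` map) is used once to find the starting node, and every hop after that follows direct references held by the node itself, so its cost depends on that node's neighbours rather than on the total size of the graph.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: index for the entry point, direct references for every hop.
class AdjacencySketch {

    static class Node {
        final String name;
        final Set<Node> neighbours = new LinkedHashSet<>(); // each node knows its adjacent nodes

        Node(String name) {
            this.name = name;
        }
    }

    // "Plus an index for lookups": used only to find where a traversal starts.
    private final Map<String, Node> indexByName = new HashMap<>();

    Node node(String name) {
        return indexByName.computeIfAbsent(name, Node::new);
    }

    void connect(String from, String to) {
        node(from).neighbours.add(node(to));
    }

    // Names reachable in exactly two hops from the start node (excluding the start itself).
    Set<String> twoHops(String start) {
        Set<String> result = new LinkedHashSet<>();
        Node origin = indexByName.get(start); // single index lookup
        if (origin == null) {
            return result;
        }
        for (Node first : origin.neighbours) {      // hop 1: local references only
            for (Node second : first.neighbours) {  // hop 2: local references only
                if (second != origin) {
                    result.add(second.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AdjacencySketch graph = new AdjacencySketch();
        graph.connect("a", "b");
        graph.connect("b", "c");
        System.out.println(graph.twoHops("a"));
    }
}
```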
Compared to Relational Databases
• Relational databases: optimized for aggregation
• Graph databases: optimized for connections
What is Neo4j?
• A Java-based graph database
• Property Graph
• Full ACID (atomicity, consistency, isolation, durability)
• High Availability (with Enterprise Edition)
• 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties
• Embedded Server
• REST API
• Both nodes and relationships can have metadata.
• Integrated pattern-matching-based query language (“Cypher”).
• Also the “Gremlin” graph traversal language can be used.
• Indexing of nodes and relationships. (Lucene)
• Nice self-contained web admin.
• Advanced path-finding with multiple algorithms.
• Optimized for reads.
• Has transactions (in the Java API)
• Scriptable in Groovy
• Online backup, advanced monitoring and High Availability are AGPL/commercial licensed
What is Neo4j?
Neo4j is good for:
• Highly connected data (social networks)
• Recommendations (e-commerce)
• Path Finding (how do I know you?)
• A* (Least Cost path)
• Data First Schema (bottom-up, but you still need to design)
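The “how do I know you?” question is a shortest-path search. As a rough sketch of the idea, the following uses a plain breadth-first search over an in-memory adjacency map; it is not Neo4j's path-finding API, it ignores edge costs (so it is not A*), and the names in `main` are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Breadth-first search for the shortest chain of acquaintances (unweighted).
class PathFinderSketch {

    static List<String> shortestPath(Map<String, List<String>> graph, String from, String to) {
        Map<String, String> cameFrom = new HashMap<>(); // visited node -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the predecessor chain back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String step = to; step != null; step = cameFrom.get(step)) {
                    path.addFirst(step);
                }
                return path;
            }
            for (String next : graph.getOrDefault(current, Collections.<String>emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList(); // no connection found
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("me", Arrays.asList("anna", "ben"));
        knows.put("anna", Arrays.asList("you"));
        knows.put("ben", Arrays.asList("carl"));
        knows.put("carl", Arrays.asList("you"));
        System.out.println(shortestPath(knows, "me", "you"));
    }
}
```

For least-cost paths with weighted relationships, the queue would be replaced by a priority queue ordered by accumulated cost (Dijkstra), plus a heuristic for A*.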
how do I know you?
how can I get there ?
If you’ve ever
• Joined more than 7 tables together
• Modeled a graph in a table
• Written a recursive CTE
• Tried to write some crazy stored procedure with multiple recursive self and inner joins
You should use Neo4j
Rewiring your brain
[Diagram: property graph model. Language (name, code, word_count) -[IS_SPOKEN_IN {as_primary}]-> Country (name, code, flag_uri)]
[Diagram: relational model. Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri); LanguageCountry (language_code, country_code, primary)]
[Diagram: document model. name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”]
[Diagram: graph instance. Nodes name: “Canada”, name: “USA”, name: “France”, language: “English”, language: “French”, connected by four spoken_in relationships]
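To make the contrast concrete, here is a small sketch of how the relational variant answers “where is this language spoken?”: it filters rows of the LanguageCountry join table. The row class mirrors the three columns named in the relational model; the codes (“en”, “CA”, and so on) and the primary flags are illustrative values invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Relational-style lookup through a join table, modeled with plain Java objects.
class JoinTableSketch {

    // One row of the LanguageCountry join table.
    static class LanguageCountry {
        final String languageCode;
        final String countryCode;
        final boolean primary;

        LanguageCountry(String languageCode, String countryCode, boolean primary) {
            this.languageCode = languageCode;
            this.countryCode = countryCode;
            this.primary = primary;
        }
    }

    // Every question about the connection goes through the join table.
    static List<String> countriesSpeaking(List<LanguageCountry> rows, String languageCode) {
        List<String> result = new ArrayList<>();
        for (LanguageCountry row : rows) {
            if (row.languageCode.equals(languageCode)) {
                result.add(row.countryCode);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative rows only; codes and flags are made up for the example.
        List<LanguageCountry> rows = Arrays.asList(
                new LanguageCountry("en", "CA", true),
                new LanguageCountry("fr", "CA", true),
                new LanguageCountry("en", "US", true),
                new LanguageCountry("fr", "FR", true));
        System.out.println(countriesSpeaking(rows, "en"));
    }
}
```

In the graph variant the join table disappears: the relationship itself is stored on the nodes it connects, and its extra column becomes a property of the relationship.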
Rewiring your brain
[Diagram: before. A single Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code)]
[Diagram: after. Country (name, flag_uri) -[SPEAKS]-> Language (name, number_of_words, yes, no); Country -[USES_CURRENCY]-> Currency (code, name)]
Rewiring your brain
show me the code!
GraphDatabaseService graphDb = new EmbeddedGraphDatabase("var/neo4j");
Node david = graphDb.createNode();
Node andreas = graphDb.createNode();
david.setProperty("name", "David Montag");
andreas.setProperty("name", "Andreas Kollegger");
Relationship presentedWith = david.createRelationshipTo(andreas, PresentationTypes.PRESENTED_WITH);
presentedWith.setProperty("date", System.currentTimeMillis());
Neo4j data browser
Neoclipse
console.neo4j.org
Try it right now:
start n=node(*) match n-[r:LOVES]->m return n, type(r), m
Notice the two nodes in red; they are your result set.
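For Java developers, it may help to read the query as a filter over relationships: for every relationship of type LOVES, return its start node, its type and its end node. The sketch below expresses that reading in plain Java over an in-memory list. It is not how Cypher is executed, and the `Rel` class and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java reading of: match n-[r:LOVES]->m return n, type(r), m
class LovesMatchSketch {

    // A directed relationship between two named nodes.
    static class Rel {
        final String start;
        final String type;
        final String end;

        Rel(String start, String type, String end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Returns one "n, type(r), m" row per relationship of the requested type.
    static List<String> match(List<Rel> relationships, String type) {
        List<String> rows = new ArrayList<>();
        for (Rel r : relationships) {
            if (r.type.equals(type)) {
                rows.add(r.start + ", " + r.type + ", " + r.end);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data for the illustration.
        List<Rel> relationships = Arrays.asList(
                new Rel("a", "LOVES", "b"),
                new Rel("a", "KNOWS", "c"));
        for (String row : match(relationships, "LOVES")) {
            System.out.println(row);
        }
    }
}
```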
Spring-Data-Neo4J
• Focus on Spring Data Neo4j
• VMware is collaborating with Neo Technology, the company behind the Neo4j graph database.
• Improved programming model: Annotation-based programming model for applications with rich domain models
• Cross-store persistence: Extend existing JPA application with NoSQL persistence
Spring-Data-Neo4J
@NodeEntity
Spring-Data-Neo4J
@NodeEntity
public class Actor {
    private String name;
    private int age;
    private HairColor hairColor;
    private transient String nickname;
}
Spring-Data-Neo4J
@NodeEntity
public class Movie {
    @GraphId
    Long id;

    @Indexed(type = FULLTEXT, indexName = "search")
    String title;

    Person director;

    @RelatedTo(type = "ACTS_IN", direction = INCOMING)
    Set<Person> actors;

    @RelatedToVia(type = "RATED")
    Iterable<Rating> ratings;

    @Query("start movie=node({self}) match movie-->genre<--similar return similar")
    Iterable<Movie> similarMovies;
}
@RelationshipEntity
Spring-Data-Neo4J
@RelationshipEntity
public class Role {
    @StartNode
    private Actor actor;
    @EndNode
    private Movie movie;
    private String roleName;
}
Spring-Data-Neo4J
@RelationshipEntity
public class Role {
    @StartNode private Actor actor;
    @EndNode private Movie movie;
    private String roleName;
}
@NodeEntity
public class Actor {
    @RelatedToVia(type = "ACTS_IN")
    private Iterable<Role> roles;
}
How did they do that?
NoSql -> Graph DB -> Neo4J
Lecturer: Evgeny Hanikblum @ AlphaCSP : OracleWeek2012 : Israel
Email: [email protected]