(MySQL)-[:to]->(neo4j) A DBA Perspective

Dave Stern @davestern1

MySQL to Neo4j: A DBA Perspective - David Stern @ GraphConnect NY 2013

DESCRIPTION

This session walks administrators through best practices from installation and initial setup, through maintenance and performance tuning, all the way to production use. It is part of a series of Neo4j learning opportunities for administrators.

DevOps @ FiftyThree

MySQL user & admin since 1998

Multiple tiers of masters & slaves

Bare metal & AWS - EC2/RDS

MySQL & Percona

neo4j user & admin since 2012

neo4j 1.8, 1.9

AWS: Multiple 3-instance enterprise clusters

How do you use MySQL?

Single Instance

Master/Slave, Multi-master

MySQL Cluster

Have you tried neo4j yet?

Where does FiftyThree use neo4j?

Much more in development...

What is this talk about?

Comparison

Configuration

Use

Comparison

Logical Partitioning

http://www.mysql.com/products/workbench/

MySQL: Strictly enforced schema

neo4j: No logical databases

No tables

...no schema

...no joins

2.0: schema-optional

Physical Partitioning & Sharding

Improves write performance, which is usually bound by disk I/O

MySQL: innodb_file_per_table

Databases on separate partitions or devices

Shard horizontally (e.g. by time range)

Shard vertically (e.g. by table or function)

Logs can be on separate partitions for I/O gain

neo4j: No logical partitioning by DB or table

Highly connected data: no clear separation

Logs can be on separate partitions for I/O gain
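Sharding horizontally by time range means each row's timestamp decides which table (or server) it lands in. A range query must then visit every shard it overlaps. Below is a minimal Python sketch, assuming a hypothetical monthly naming scheme events_YYYY_MM and routing on UTC:

```python
from datetime import datetime, timezone
from typing import List

def shard_for(ts: datetime, prefix: str = "events") -> str:
    """Map a timestamp to its (hypothetical) monthly shard table name."""
    utc = ts.astimezone(timezone.utc)  # route on UTC so shard choice ignores server time zone
    return f"{prefix}_{utc.year:04d}_{utc.month:02d}"

def shards_for_range(start: datetime, end: datetime, prefix: str = "events") -> List[str]:
    """Every monthly shard a query over [start, end] has to touch, oldest first."""
    s, e = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    year, month = s.year, s.month
    names = []
    while (year, month) <= (e.year, e.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names

print(shard_for(datetime(2013, 11, 2, 5, 40, 10, tzinfo=timezone.utc)))
print(shards_for_range(datetime(2013, 11, 20, tzinfo=timezone.utc),
                       datetime(2014, 1, 5, tzinfo=timezone.utc)))
```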

SCALE UP!

Authentication & Authorization: MySQL

mysql> select Host, db, user, select_priv, insert_priv, update_priv, delete_priv from db;
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| Host      | db      | user      | select_priv | insert_priv | update_priv | delete_priv |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
| %         | test    |           | Y           | Y           | Y           | Y           |
| %         | test\_% |           | Y           | Y           | Y           | Y           |
| localhost | Orders  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | admin     | Y           | Y           | Y           | Y           |
| localhost | Events  | events    | Y           | Y           | Y           | N           |
| 10.%      | Events  | events    | Y           | N           | N           | N           |
+-----------+---------+-----------+-------------+-------------+-------------+-------------+
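In the Host and db columns, % and _ are LIKE-style wildcards. % matches any run of characters, _ matches exactly one, and a backslash makes the next character literal, which is why the second row is written test\_%. Below is a minimal Python sketch of that matching. It is simplified: it ignores case rules, and it ignores how the server sorts rows by specificity when several match.

```python
import re

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a MySQL LIKE-style pattern into an anchored regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash escapes the next character so it matches literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # any sequence of characters, including none
        elif ch == "_":
            out.append(".")   # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)

def like_match(pattern: str, value: str) -> bool:
    return like_to_regex(pattern).match(value) is not None

print(like_match("10.%", "10.1.2.3"))      # True: host pattern from the last row
print(like_match("test\\_%", "test_app"))  # True: literal underscore
print(like_match("test\\_%", "testXapp"))  # False
print(like_match("test_%", "testXapp"))    # True: unescaped _ matches any character
```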

Authentication & Authorization: neo4j

No permissions

No users

How do you secure the DB?

1. Protect the database in a private network or VPC
2. Firewall: router, AWS Security Groups, iptables
3. Proxy requests via a web server or load balancer

If you must allow access, use HTTPS & authenticate at the proxy.
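Authenticating at the proxy means the proxy, not neo4j, checks credentials before forwarding a request. Below is a minimal Python sketch of that check. It assumes HTTP Basic auth carried over HTTPS and a hypothetical in-memory USERS table. In practice the web server's or load balancer's own auth module would do this, and TLS termination is out of scope here.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so stored credentials are not plain text."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# Hypothetical credential store for the proxy: user -> (salt, derived hash).
_salt = os.urandom(16)
USERS: Dict[str, Tuple[bytes, bytes]] = {"app": (_salt, hash_password("s3cret", _salt))}

def is_authorized(auth_header: Optional[str]) -> bool:
    """Accept a request only if its HTTP Basic credentials match USERS."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep or user not in USERS:
        return False
    salt, expected = USERS[user]
    # Constant-time comparison so response timing does not reveal partial matches.
    return hmac.compare_digest(hash_password(password, salt), expected)

good = "Basic " + base64.b64encode(b"app:s3cret").decode("ascii")
print(is_authorized(good))              # True
print(is_authorized("Basic bm9wZQ=="))  # False: decodes to "nope", no colon
print(is_authorized(None))              # False
```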

Replication

http://www.mysqlperformanceblog.com/wp-content/uploads/2013/07/23.png

Replication

STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

Replication vs. HA

MySQL: Free

Slaves pull updates

Eventual consistency

One-way, asynchronous

neo4j: HA needs the Enterprise edition, which can cost money depending on use

Slaves can pull asynchronous updates

Eventual consistency; optimistic pushes to slaves are the default

Writes to any cluster member

JVM

Buffers & memory management =~ JVM settings

The database itself is extensible via Java

... if you're into that sort of thing

Built-in Tools: Data Browser

Built-in Tools: Data Browser, Backup Script

neo4j

$ /opt/neo4j/bin/neo4j-backup -from single://10.66.182.177:6362 \
> -to /media/neo4j-backup/production/2013-11-02T05:40:10Z
Performing full backup from 'single://10.66.182.177:6362'
............................................[44 Files copied]
Full consistency check
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
Done
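Backups like this are normally scheduled, with each run writing to a fresh directory named by UTC timestamp, as in the path above. Below is a minimal Python sketch of such a wrapper. It uses only the -from and -to flags shown in the example, and takes the host, port and base directory as parameters. run_backup assumes neo4j-backup is installed at /opt/neo4j/bin.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import List, Optional

NEO4J_BACKUP = "/opt/neo4j/bin/neo4j-backup"

def backup_argv(host: str, port: int, base_dir: str,
                now: Optional[datetime] = None) -> List[str]:
    """Build the neo4j-backup command line with a UTC-timestamped target directory."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    target = PurePosixPath(base_dir) / stamp
    return [NEO4J_BACKUP, "-from", f"single://{host}:{port}", "-to", str(target)]

def run_backup(host: str, port: int, base_dir: str) -> None:
    """Run one backup; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(backup_argv(host, port, base_dir), check=True)

argv = backup_argv("10.66.182.177", 6362, "/media/neo4j-backup/production",
                   now=datetime(2013, 11, 2, 5, 40, 10, tzinfo=timezone.utc))
print(" ".join(argv))
```

Passing an argument list rather than a shell string avoids quoting problems if a path ever contains spaces.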

Built-in Tools: Data Browser, Backup Script

MySQL

$ innobackupex --user=DBUSER --password=DBUSERPASS /path/to/BACKUP-DIR/

innobackupex: Backup created in directory '/path/to/BACKUP-DIR/2013-03-25_00-00-09'
innobackupex: MySQL binlog position: filename 'mysql-bin.000003', position 1946
111225 00:00:53  innobackupex: completed OK!

Built-in Tools: Data Browser, Backup Script, Visual Server Info

Configuration: MySQL

So many options...

mysql> SHOW VARIABLES;
+-----------------------------------------+---------------------------+
| Variable_name                           | Value                     |
+-----------------------------------------+---------------------------+
| auto_increment_increment                | 1                         |
| auto_increment_offset                   | 1                         |
| autocommit                              | ON                        |
| automatic_sp_privileges                 | ON                        |
| back_log                                | 50                        |
| basedir                                 | /home/mysql/bin/mysql-5.5 |
| big_tables                              | OFF                       |
| binlog_cache_size                       | 32768                     |
| binlog_direct_non_transactional_updates | OFF                       |
| binlog_format                           | STATEMENT                 |
| binlog_stmt_cache_size                  | 32768                     |
| bulk_insert_buffer_size                 | 8388608                   |
...
| max_allowed_packet                      | 1048576                   |
| max_binlog_cache_size                   | 18446744073709547520      |
| max_binlog_size                         | 1073741824                |
| max_binlog_stmt_cache_size              | 18446744073709547520      |
| max_connect_errors                      | 10                        |
| max_connections                         | 151                       |
| max_delayed_threads                     | 20                        |
| max_error_count                         | 64                        |
| max_heap_table_size                     | 16777216                  |
| max_insert_delayed_threads              | 20                        |
| max_join_size                           | 18446744073709551615      |
...

You can optimize dozens of settings like these...

MySQL Configuration: Buffers, Caching & I/O

innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8
innodb_additional_mem_pool_size = 256M

innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M

innodb_file_per_table
innodb_io_capacity = 500
innodb_read_io_threads = 64
innodb_write_io_threads = 64
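MySQL accepts K, M and G suffixes (powers of 1024) on size options, and a bare name such as innodb_file_per_table enables that option. Below is a small Python sketch that loads the settings above with configparser. It assumes they sit under a [mysqld] section, as in a typical my.cnf, and works out how the 12G buffer pool is divided across its 8 instances.

```python
import configparser

MY_CNF = """
[mysqld]
innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 8
innodb_additional_mem_pool_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M
innodb_file_per_table
innodb_io_capacity = 500
innodb_read_io_threads = 64
innodb_write_io_threads = 64
"""

SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(value: str) -> int:
    """Convert '12G', '256M' or '500' to an integer; suffixes are powers of 1024."""
    value = value.strip().upper()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)

# allow_no_value lets a bare option such as innodb_file_per_table parse (its value is None).
cnf = configparser.ConfigParser(allow_no_value=True)
cnf.read_string(MY_CNF)
mysqld = cnf["mysqld"]

pool = to_bytes(mysqld["innodb_buffer_pool_size"])
instances = int(mysqld["innodb_buffer_pool_instances"])
print(pool // instances // 1024 ** 2, "MiB per buffer pool instance")
print("innodb_file_per_table" in mysqld)
```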

and these...

MySQL Configuration: Network & Concurrency

table_cache = 2048
max_connections = 1000

max_allowed_packet = 16M

and these...

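MySQL Configuration: Replication

server-id = 2
master-host = db-master.mycompany.com
master-port = 3306
master-user = username
master-password = password
master-connect-retry = 60

(The master-* options work in MySQL 5.1 and earlier; 5.5 removed them in favor of the CHANGE MASTER TO statement.)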
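CHANGE MASTER TO sets the same replication connection parameters from a SQL session. Below is a small Python sketch that renders the my.cnf values above into that statement. The key mapping is an illustrative helper, not a MySQL tool, and server-id still belongs in my.cnf.

```python
from typing import Dict, Union

# my.cnf option name -> CHANGE MASTER TO clause name
MAPPING = {
    "master-host": "MASTER_HOST",
    "master-port": "MASTER_PORT",
    "master-user": "MASTER_USER",
    "master-password": "MASTER_PASSWORD",
    "master-connect-retry": "MASTER_CONNECT_RETRY",
}

def sql_literal(value: Union[str, int]) -> str:
    """Integers stay bare; strings are single-quoted with \\ and ' escaped."""
    if isinstance(value, int):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def change_master(settings: Dict[str, Union[str, int]]) -> str:
    """Render CHANGE MASTER TO from my.cnf-style master-* settings, ignoring other keys."""
    parts = [f"{MAPPING[k]} = {sql_literal(v)}" for k, v in settings.items() if k in MAPPING]
    return "CHANGE MASTER TO " + ", ".join(parts) + ";"

print(change_master({
    "server-id": 2,  # not a CHANGE MASTER TO option, so it is skipped
    "master-host": "db-master.mycompany.com",
    "master-port": 3306,
    "master-user": "username",
    "master-password": "password",
    "master-connect-retry": 60,
}))
```

After running the generated statement on the slave, START SLAVE begins replication.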

And these, depending on version & hardware...

MySQL Configuration: Other

sort_buffer_size = 2M
tmp_table_size = 32M

join_buffer_size = 128k

query_cache_type = 1
query_cache_size = 64M

open_files_limit = 8192

....

neo4j Configuration Tuning: Simple Questions

How many nodes do you expect?

How many relationships do you expect?

Average number of properties per node and relationship?

Optional: How do you expect to traverse the graph?

Long paths and/or large result sets?

Short paths and/or small result sets?

3 things to calculate:

File cache mapped memory & object caches

Heap Size

RAM for OS

neo4j Configuration

Store file                          Record size   Contents
neostore.nodestore.db               9 B           Nodes
neostore.relationshipstore.db       33 B          Relationships
neostore.propertystore.db           41 B          Properties for nodes and relationships
neostore.propertystore.db.strings   128 B         Values of string properties
neostore.propertystore.db.arrays    128 B         Values of array properties

Capacity Planning Estimates:

Node size (9 B; 14 B in 2.0) x expected nodes

Relationship size (33 B) x expected relationships

Property size (41 B) x expected properties

Strings & Arrays
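These estimates are simple multiplications, and they map directly onto the per-store mapped_memory settings in neo4j.properties. Below is a minimal Python sketch under these assumptions:

- One record per node, relationship and property, as in the estimate above.
- One 128 B block per long string or array value. Longer values actually span several blocks.
- The 20% headroom and the example counts are hypothetical.

```python
from typing import Dict

# Record sizes from the table above, in bytes (node records are 14 B in 2.0).
RECORD_BYTES: Dict[str, int] = {
    "neostore.nodestore.db": 9,
    "neostore.relationshipstore.db": 33,
    "neostore.propertystore.db": 41,
    "neostore.propertystore.db.strings": 128,
    "neostore.propertystore.db.arrays": 128,
}
MB = 1024 ** 2

def store_estimates(nodes: int, rels: int, props: int, long_strings: int = 0,
                    arrays: int = 0, node_record_bytes: int = 9) -> Dict[str, int]:
    """Estimated size in bytes of each store file: expected count x record size."""
    sizes = {**RECORD_BYTES, "neostore.nodestore.db": node_record_bytes}
    counts = {
        "neostore.nodestore.db": nodes,
        "neostore.relationshipstore.db": rels,
        "neostore.propertystore.db": props,
        "neostore.propertystore.db.strings": long_strings,  # assumes 1 block per value
        "neostore.propertystore.db.arrays": arrays,         # assumes 1 block per value
    }
    return {store: counts[store] * sizes[store] for store in RECORD_BYTES}

def mapped_memory_lines(estimates: Dict[str, int], headroom_pct: int = 20) -> str:
    """Render *.mapped_memory settings with growth headroom, rounded up to whole MB."""
    lines = []
    for store, size in estimates.items():
        padded = size * (100 + headroom_pct) // 100
        mb = max(1, -(-padded // MB))  # ceiling division; never emit 0M
        lines.append(f"{store}.mapped_memory={mb}M")
    return "\n".join(lines)

# Hypothetical graph: 10M nodes, 50M relationships, 100M properties.
estimates = store_estimates(10_000_000, 50_000_000, 100_000_000)
print(mapped_memory_lines(estimates))
```

For this hypothetical graph, the relationship and property stores (about 1,889 MB and 4,693 MB with headroom) dwarf the node store (103 MB).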

Configuration: Main config files

neo4j-wrapper.conf

neo4j.properties

neo4j-server.properties

Configuration: neo4j-wrapper.conf

Heap Size

GC method

Configuration: neo4j.properties

File Caches: Mapped memory

Object Caches

Indexes

HA

Backup

Configuration: neo4j-server.properties

HTTP/S

Admin client

REST

Database mode

Logging

Configuration

21.2. Server Configuration

25. Configuration & Performance

neo4j: Buffers, Caching & I/O (neo4j-wrapper.conf)

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1024

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1024

neo4j: Buffers, Caching & I/O (neo4j.properties)

Two types of caches: file buffer and object cache

File Buffer Cache:

# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M

Object Cache:

node_cache_size=256M
relationship_cache_size=256M
# optional
node_cache_array_fraction=5
relationship_cache_array_fraction=5

# The GC resistant cache described below is only available in the
# Neo4j Enterprise Edition.
# cache_type values: soft (default), weak, strong
cache_type=gcr
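Mapped memory, heap and RAM for the OS have to fit in physical RAM together. Below is a minimal Python sketch of that budget check using the default values above. It assumes:

- The mapped-memory file buffers sit outside the JVM heap, the usual case when memory-mapped buffers are in use.
- The object caches live inside the heap.
- The machine has a hypothetical 8 GB of RAM, with a hypothetical 1 GB reserved for the OS.

```python
from typing import Dict

def parse_mb(value: str) -> int:
    """'25M' -> 25, '2G' -> 2048. Assumption: this sketch treats a bare number as MB."""
    value = value.strip().upper()
    if value.endswith("G"):
        return int(value[:-1]) * 1024
    if value.endswith("M"):
        return int(value[:-1])
    return int(value)

def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value reader: skips blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, val = line.partition("=")
            props[key.strip()] = val.strip()
    return props

NEO4J_PROPERTIES = """
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=130M
node_cache_size=256M
relationship_cache_size=256M
"""

def memory_budget(props: Dict[str, str], heap_mb: int, ram_mb: int,
                  os_reserve_mb: int) -> Dict[str, int]:
    """Check heap + mapped memory + OS reserve against RAM (all values in MB)."""
    mapped = sum(parse_mb(v) for k, v in props.items() if k.endswith(".mapped_memory"))
    caches = sum(parse_mb(props.get(k, "0"))
                 for k in ("node_cache_size", "relationship_cache_size"))
    if caches >= heap_mb:
        raise ValueError("object caches live in the heap and must leave room for other objects")
    free = ram_mb - (heap_mb + mapped + os_reserve_mb)
    if free < 0:
        raise ValueError(f"over budget by {-free} MB")
    return {"mapped_mb": mapped, "heap_mb": heap_mb, "caches_mb": caches, "free_mb": free}

# Heap from wrapper.java.maxmemory above; 8 GB RAM and 1 GB OS reserve are hypothetical.
print(memory_budget(parse_properties(NEO4J_PROPERTIES),
                    heap_mb=1024, ram_mb=8192, os_reserve_mb=1024))
```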

neo4j: Concurrency (neo4j-server.properties)

# Maximum number of concurrent HTTP requests the server will service
org.neo4j.server.webserver.maxthreads=64

neo4j: HA (neo4j-server.properties)

org.neo4j.server.database.mode=HA

neo4j.properties

ha.server_id=1

ha.initial_hosts=server1:5001,server2:5001
#ha.discovery.url=http://example.com/list

# Host & port to bind the cluster management communication.
ha.cluster_server=server1:5001

# Hostname and port to bind the HA server.
ha.server=my-domain.com:6001

##### Optional cluster strategies #####
# Interval of pulling updates from master.
ha.pull_interval=10s

# The number of slaves the master will ask to replicate a committed
# transaction.
ha.tx_push_factor=1

# Push strategy of a transaction to a slave during commit: fixed or round_robin.
ha.tx_push_strategy=fixed
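Every cluster member shares the same ha.initial_hosts list but needs its own ha.server_id and bind addresses. That makes these files a natural candidate for generating rather than editing by hand. Below is a Python sketch that renders the per-member lines for a hypothetical three-instance cluster, with hostnames server1 to server3 and ports 5001 and 6001 as in the example above.

```python
from typing import List

def ha_properties(server_id: int, hosts: List[str],
                  cluster_port: int = 5001, ha_port: int = 6001) -> str:
    """Render the per-member HA lines of neo4j.properties.

    server_id is 1-based and picks this member's hostname out of `hosts`;
    ha.initial_hosts lists every member so any of them can join the cluster.
    """
    me = hosts[server_id - 1]
    initial = ",".join(f"{h}:{cluster_port}" for h in hosts)
    return "\n".join([
        f"ha.server_id={server_id}",
        f"ha.initial_hosts={initial}",
        f"ha.cluster_server={me}:{cluster_port}",
        f"ha.server={me}:{ha_port}",
    ])

hosts = ["server1", "server2", "server3"]  # hypothetical 3-instance cluster
for sid in range(1, len(hosts) + 1):
    print(f"# --- {hosts[sid - 1]} ---")
    print(ha_properties(sid, hosts))
```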

Use: File System

$PATH_TO_NEO4J = /opt/neo4j

/opt/neo4j/bin: neo4j, neo4j-backup

/opt/neo4j/conf: neo4j.properties, neo4j-server.properties, neo4j-wrapper.conf

/opt/neo4j/data

/opt/neo4j/data/graph.db: the actual graph data

/opt/neo4j/data/log: all logs

Use: File System

$PATH_TO_NEO4J = /opt/neo4j

/opt/neo4j/bin (/usr/bin/mysql): neo4j, neo4j-backup

/opt/neo4j/conf (/etc/mysql): neo4j.properties, neo4j-server.properties, neo4j-wrapper.conf

/opt/neo4j/data (/var/lib/mysql)

/opt/neo4j/data/graph.db (/var/lib/mysql/data): the actual graph data

/opt/neo4j/data/log (/var/log/mysql): all logs

Use: Indexes

The database itself is a natural index

Lucene for searches

neo4j 2.0: Nodes have labels (Person, Location, etc.) that group them into sets

CREATE INDEX ON :Person(name)

Look familiar?

CREATE INDEX id_index ON Person (id);

Use: Indexes

neo4j 2.0:

Properties can have unique constraints

CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE

Look familiar?

CREATE UNIQUE INDEX email_index ON Person (email);

Use: Indexes

Current 1.9.x:

Auto indexing (deprecated):

one for nodes, one for relationships

off by default
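Auto indexing is switched on per entity type in neo4j.properties, together with the list of property keys to index. The REST example later in this deck queries node_auto_index by name, which only returns results once that key is indexed. Below is a small Python sketch that renders those lines. The property names (node_auto_indexing, node_keys_indexable and their relationship counterparts) are the 1.9 settings; the helper itself is hypothetical.

```python
from typing import Iterable

def auto_index_settings(node_keys: Iterable[str], rel_keys: Iterable[str] = ()) -> str:
    """Render neo4j.properties lines enabling 1.9-style auto indexing for the given keys."""
    lines = []
    for kind, keys in (("node", list(node_keys)), ("relationship", list(rel_keys))):
        if keys:  # auto indexing is off by default; enable it only where keys are listed
            lines.append(f"{kind}_auto_indexing=true")
            lines.append(f"{kind}_keys_indexable={','.join(keys)}")
    return "\n".join(lines)

print(auto_index_settings(["name"]))
```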

Use: Querying

mysql> select * from graph_local limit 10;
+----+-------------------+---------+---------------+------------+
| id | graph_template_id | host_id | snmp_query_id | snmp_index |
+----+-------------------+---------+---------------+------------+
|  1 |                12 |       1 |             0 |            |
|  2 |                 9 |       1 |             0 |            |
|  3 |                10 |       1 |             0 |            |
|  4 |                 8 |       1 |             0 |            |
|  5 |                58 |       2 |             0 |            |
|  6 |                62 |       2 |             0 |            |
|  7 |                53 |       2 |             0 |            |
|  8 |                37 |       2 |             0 |            |
|  9 |                67 |       2 |             0 |            |
| 10 |                65 |       2 |             0 |            |
+----+-------------------+---------+---------------+------------+
10 rows in set (0.00 sec)

http://www.mysql.com/products/workbench/

Use: Querying via REST

Example request:

POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "query" : "start x = node:node_auto_index(name={startName}) match path = (x-[r]-friend) where friend.name = {name} return TYPE(r)",
  "params" : {
    "startName" : "I",
    "name" : "you"
  }
}

Example response:

200: OK
Content-Type: application/json; charset=UTF-8

{
  "columns" : [ "TYPE(r)" ],
  "data" : [ [ "know" ] ]
}
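The same request can be issued from application code with only the Python standard library. Below is a minimal sketch, assuming the legacy /db/data/cypher endpoint shown above on localhost. Passing values through params, rather than splicing them into the query string, keeps the query text constant and avoids injection.

```python
import json
import urllib.request
from typing import Any, Dict, List

CYPHER_URL = "http://localhost:7474/db/data/cypher"  # legacy endpoint from the example

def cypher_request(query: str, params: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST shown above; parameter values travel separately from the query."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        CYPHER_URL,
        data=body,
        method="POST",
        headers={"Accept": "application/json; charset=UTF-8",
                 "Content-Type": "application/json"},
    )

def rows(response_body: bytes) -> List[Dict[str, Any]]:
    """Zip the 'columns' and 'data' arrays of a response into one dict per row."""
    doc = json.loads(response_body)
    return [dict(zip(doc["columns"], row)) for row in doc["data"]]

def run_cypher(query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Send the query; needs a reachable server and raises HTTPError on non-2xx replies."""
    with urllib.request.urlopen(cypher_request(query, params), timeout=10) as resp:
        return rows(resp.read())

# Parsing the example response above, no live server needed:
print(rows(b'{ "columns" : [ "TYPE(r)" ], "data" : [ [ "know" ] ]}'))
```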

DBA Perspective

Use the best database for the job, or both

neo4j ships with great tools

neo4j is easier to configure: fewer options, less complex, still flexible for optimization

HA more robust and more opaque than basic replication

For better or worse, JVM handles a lot for you

Authorization - it's up to you

Scaling up is easier than changing your data model

We're [email protected]

Thank You!

Thanks to:

Aseem Kishore @aseemk

Chris Leishman @cleishm

Max De Marzi @maxdemarzi