More Cloud Technologies: Docker, Neo4j

Cloud Infrastructures, Slide Set 7: Docker, Neo4j


Docker

What is Docker?

• Docker Engine +

• Docker Hub

Docker Engine = Portable, lightweight runtime and packaging tool

Docker Hub = Cloud service for sharing applications and automating workflows

Why is Docker relevant?

Receives a lot of attention.

Docker is not a cloud, but it is an interesting building-block technology.

How does Docker work?

Docker Architecture

[Figure: a host (VM or physical machine) runs the Docker daemon and containers 1 to 3. The Docker client (docker run …) talks to the daemon.]

Docker Images

• Read-only (filesystem) templates

• OS + Software, e.g. Ubuntu + Nginx

• Used to create Docker containers

• Docker makes image handling easy

Union FS

• Stackable unification file system

• Can merge the contents of several directories (branches) while keeping their physical content separate

• Allows mixing read-only and writable branches

• Branches can be inserted/deleted anywhere in the tree

• Handles elimination of duplicates and partial-error conditions

Stack files and directories (branches) of several filesystems together to form a single coherent filesystem.
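The stacking idea can be illustrated with a minimal Python sketch. It is a toy model that uses dictionaries in place of directories and says nothing about how Union FS is implemented; the class name UnionMount and the sample paths are made up for illustration.

```python
class UnionMount:
    """Toy union of 'branches'; each branch maps a path to its content."""

    def __init__(self, *branches):
        # branches[0] is the topmost branch and wins on lookup.
        self.branches = list(branches)

    def read(self, path):
        # Return the content from the first (topmost) branch that has the path.
        for branch in self.branches:
            if path in branch:
                return branch[path]
        raise FileNotFoundError(path)

    def listdir(self):
        # Merge all branches; a path present in several branches is listed once.
        return sorted(set().union(*self.branches))


# Hypothetical branches: "app" is stacked on top of "base".
base = {"/etc/hostname": "base", "/bin/sh": "shell"}
app = {"/etc/hostname": "web01", "/srv/index.html": "<h1>hi</h1>"}
merged = UnionMount(app, base)

print(merged.read("/etc/hostname"))
print(merged.listdir())
```

The branches stay physically separate: reading through the union never modifies base or app, and the duplicate path /etc/hostname shows up only once in the merged listing.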

Docker Registries

• A Docker registry holds Docker images

• You upload / download images

• Can be public or private

• Docker Hub is a public Docker registry

• Access images of other Docker users (Community)

Docker Containers

• Contains what an app needs to run

• Created from a Docker image

• Can be run, started, stopped, moved, and deleted

• Isolated runtime environment (~= OS level virtualized VM)

• Docker containers use Union FS to add layers to "version" your container’s filesystem

• Copy-on-write (COW) approach

• Starting with a base image, a container can be developed step by step

• The "Dockerfile" contains these steps

• Base image + Dockerfile = final image

Docker Container =

OS (image) + user files + meta-data
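The layering and copy-on-write behaviour described above can be sketched in Python with collections.ChainMap, which looks keys up in a chain of mappings and writes only to the first one. This is an analogy under simplifying assumptions, not Docker's storage driver: the layer contents and the helper start_container are hypothetical, and files are plain dictionary entries.

```python
from collections import ChainMap

# Read-only image layers (hypothetical contents), lowest layer first.
ubuntu_layer = {"/bin/bash": "bash binary", "/etc/motd": "Welcome to Ubuntu"}
nginx_layer = {"/usr/sbin/nginx": "nginx binary"}


def start_container(*image_layers):
    # ChainMap searches its maps left to right, so the fresh writable layer
    # comes first, followed by the image layers from top to bottom.
    return ChainMap({}, *reversed(image_layers))


c1 = start_container(ubuntu_layer, nginx_layer)
c2 = start_container(ubuntu_layer, nginx_layer)

# The write lands in c1's writable layer only; the image stays untouched.
c1["/etc/motd"] = "Hello from container 1"

print(c1["/etc/motd"])
print(c2["/etc/motd"])
```

Both containers share the same image layers, yet only c1 sees its own change, which is the effect the copy-on-write approach is meant to achieve.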

Running a Docker Container

docker run -i -t ubuntu /bin/bash

• Docker Client command is run

• Talking to the Docker daemon

• Start a container from the "ubuntu" image

• Inside the container run the command "/bin/bash"

• Pulls the ubuntu image (if it is not already available locally)

• Creates a new container

• Allocates a filesystem and mounts a read-write layer

• Allocates a network / bridge interface

• Sets up an IP address

• Executes process (/bin/bash)

• Captures and provides app output
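For scripting, the same command line can be assembled programmatically. The following Python sketch only builds the argument list for the docker run call shown above; the helper name docker_run_argv is hypothetical, and actually starting the container would require a Docker installation with a running daemon.

```python
import shlex


def docker_run_argv(image, command, interactive=True, tty=True):
    """Build the argument vector for `docker run` (hypothetical helper)."""
    argv = ["docker", "run"]
    if interactive:
        argv.append("-i")  # keep STDIN open
    if tty:
        argv.append("-t")  # allocate a pseudo-terminal
    argv.append(image)
    argv.extend(command)
    return argv


argv = docker_run_argv("ubuntu", ["/bin/bash"])
print(shlex.join(argv))

# To really start the container (needs Docker on this machine):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list instead of a single string keeps each argument separate, so no shell quoting is needed when the list is handed to subprocess.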

Docker Container Isolation

• Namespaces

• Control Groups

• Union FS

• Container format

Namespaces

Namespaces are a Linux feature.

"A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. One use of namespaces is to implement containers."

- http://man7.org/linux/man-pages/man7/namespaces.7.html

Namespace   Isolates
IPC         System V IPC, POSIX message queues
Network     Network devices, stacks, ports, etc.
Mount       Mount points
PID         Process IDs
User        User and group IDs
UTS         Hostname and NIS domain name
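On a Linux host, the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns. The Python sketch below lists them for the current process; the function name current_namespaces is made up, and on systems without /proc it simply returns an empty dictionary.

```python
import os


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace type to its identifier for the calling process (Linux only)."""
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, or /proc is unavailable
    for name in names:
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable; skip it
    return result


namespaces = current_namespaces()
for name in sorted(namespaces):
    print(f"{name:10} {namespaces[name]}")
```

Two processes share a namespace when the identifiers of the corresponding links are equal, so comparing this output inside and outside a container shows which resources are isolated.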

Control Groups

Control Groups are a feature of the Linux kernel.

"Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour."

- https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
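Which control groups a process belongs to can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The Python sketch below parses that format; the function name and the sample content are assumptions for illustration, and real values differ per host and container.

```python
def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        hierarchy_id, controllers, path = line.split(":", 2)
        controller_list = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), controller_list, path))
    return entries


# Hypothetical sample content.
sample = "4:cpu,cpuacct:/docker/abc123\n0::/user.slice\n"
entries = parse_cgroup_file(sample)
for entry in entries:
    print(entry)

# On Linux, the real file can be parsed with:
#   with open("/proc/self/cgroup") as f:
#       print(parse_cgroup_file(f.read()))
```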

Union FS

Container Format

• Container format = namespaces + control groups + Union FS

• libcontainer = default container format

• https://github.com/docker/libcontainer

• Alternative container formats

• LXC

• BSD Jails (future)

• Solaris Zones (future)

What do I need Docker for?

• Build your own application platform / PaaS.

• Want to stay in control of what happens inside your containers (in contrast to Cloud Foundry where you delegate this).

Show me a Docker demo!

https://www.docker.com/tryit/

Neo4j

What is Neo4j?

Neo4j is a graph database.

• Stores data in a graph rather than tables.

• A graph contains nodes.

• Nodes have key-value properties.

• Nodes can be labeled to group them

• Relationships = links between nodes

• Relationships can be traversed in both directions (even though they are directed)

• Relationships can have properties, too

http://neo4j.com/docs/stable/what-is-a-graphdb.html

• Graphs can be queried (traversed)

• Indexes look up nodes or relationships

• Find nodes with specific properties faster than traversing the graph.

Why is Neo4j relevant?

• Because graphs are

• Think of Facebook’s social graph with its social search

How does Neo4j work?

Nodes

• Fundamental unit to form a graph

• Can have properties

• Often used to represent entities (although, depending on the domain model, relationships may represent entities as well)

• Can have 0..* labels

Relationships

• Fundamental unit to form a graph

• Link nodes

• Can have properties

• Relationships can be traversed in both directions (even though they are directed)

• Reflexive relationships (from a node to itself) are allowed

• Relationships are typed, similar to labels for nodes
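The node and relationship model described above can be sketched with two small Python classes. This is an illustrative in-memory model, not Neo4j's API; the class names, the method other and the sample data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

    def other(self, node):
        # A directed relationship can still be followed from either end.
        if node is self.start:
            return self.end
        if node is self.end:
            return self.start
        raise ValueError("node is not attached to this relationship")


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
likes = Relationship("LIKES", alice, bob, {"since": 2014})
loop = Relationship("LIKES", alice, alice)  # a relationship from a node to itself

print(likes.other(alice).properties["name"])
print(likes.other(bob).properties["name"])
```

The relationship stores a direction (start and end), yet other lets a traversal cross it from either side, which mirrors the bi-directional traversal mentioned in the list.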

Properties

• Properties = key-value pairs

• Key = string

• Value = primitive or array of primitives

• Types: boolean, byte, short, int, long, float, double, char, String
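The property rules can be expressed as a small validation function. The Python sketch below is a rough approximation: Python's bool, int, float and str stand in for the Java types listed above, the function name is made up, and the requirement that all elements of an array share one type is an assumption of this sketch.

```python
PRIMITIVES = (bool, int, float, str)


def is_valid_property(key, value):
    """Check the rule 'key = string, value = primitive or array of primitives'."""
    if not isinstance(key, str):
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        # Assumption of this sketch: all array elements share one primitive type.
        same_type = len({type(item) for item in value}) <= 1
        return same_type and all(isinstance(item, PRIMITIVES) for item in value)
    return False


print(is_valid_property("name", "Alice"))
print(is_valid_property("scores", [1, 2, 3]))
print(is_valid_property("nested", {"a": 1}))
```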

Labels

• Labels are used to group nodes

• Nodes with the same label belong to the same set, which can be used in queries

• A node may have 0..* labels

• Used when defining constraints/indices.

Paths

• A path is one or more nodes connected by relationships

• Typically retrieved as a query or traversal result

Traversals

• Traversing = visiting a graph’s nodes by following relationships according to specific rules

• Most likely only a sub-graph is traversed

• Cypher = declarative way to query a graph by traversal and other techniques

• Traversal Framework Java API

• http://neo4j.com/docs/stable/tutorial-traversal-java-api.html

• Explicit graph traversal
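An explicit traversal with rules can be sketched in a few lines of Python. This is not the Traversal Framework API; it is a plain breadth-first search over a hypothetical list of relationships, where the rules are "follow only outgoing relationships of one type" and "stop after a maximum depth".

```python
from collections import deque

# Hypothetical graph: (start, relationship type, end) triples.
RELATIONSHIPS = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "dave"),
    ("alice", "LIKES", "erin"),
]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal along outgoing rel_type relationships, up to max_depth hops."""
    visited = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # rule: do not expand beyond max_depth
        for src, kind, dst in RELATIONSHIPS:
            if src == node and kind == rel_type and dst not in visited:
                visited.append(dst)
                queue.append((dst, depth + 1))
    return visited


reached = traverse("alice", "KNOWS", max_depth=2)
print(reached)
```

With these rules only a sub-graph is visited: dave lies three hops away and erin is connected by a different relationship type, so neither is reached.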

Schema

• Neo4j is a schema-optional graph database

• Can be used without a schema

• Schemas can produce performance and modelling benefits

• Indices

• Performance increase: nodes can be looked up faster

• Are eventually available: indices are populated in the background

• Constraints

• Rules for what the data should look like

• Neo4j rejects changes that violate a constraint
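How an index speeds up lookups and how a constraint rejects a change can be sketched with a toy store in Python. The class PersonStore, the exception name and the unique email rule are hypothetical; unlike the background population described above, this sketch updates its index immediately.

```python
class ConstraintViolation(Exception):
    pass


class PersonStore:
    """Toy store for Person nodes with a unique `email` property."""

    def __init__(self):
        self.nodes = []
        self.email_index = {}  # email -> node, enables lookup without scanning

    def create(self, **properties):
        email = properties.get("email")
        if email in self.email_index:
            # The change is rejected; the store stays as it was.
            raise ConstraintViolation(f"email {email!r} already exists")
        self.nodes.append(properties)
        if email is not None:
            self.email_index[email] = properties
        return properties

    def find_by_email(self, email):
        return self.email_index.get(email)  # single dictionary lookup

    def scan_by_email(self, email):
        # Without an index every node has to be inspected.
        return next((n for n in self.nodes if n.get("email") == email), None)


store = PersonStore()
alice = store.create(name="Alice", email="alice@example.org")
print(store.find_by_email("alice@example.org"))
```

Both find_by_email and scan_by_email return the same node; the difference is that the scan grows with the number of nodes while the indexed lookup does not.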

Querying Data: The Cypher Query Language

• = Declarative, SQL-inspired language to describe graph patterns

• Describes what to select (e.g. sub-graphs), insert, update or delete without specifying how exactly this happens (graph theory ninja magic)

(a)-[:LIKES]-(b)

[Figure: node A connected to node B by a LIKES relationship]
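What a pattern such as (a)-[:LIKES]-(b) asks for can be made concrete with a small Python sketch. It is not a Cypher engine; it only shows, on hypothetical data, which bindings for a and b satisfy the pattern without an arrow and how that differs from the directed form (a)-[:LIKES]->(b).

```python
# Hypothetical data: directed (start, type, end) relationships.
RELATIONSHIPS = [
    ("A", "LIKES", "B"),
    ("B", "KNOWS", "C"),
]


def match_undirected(rel_type):
    """Return all (a, b) bindings for the pattern (a)-[:rel_type]-(b)."""
    bindings = []
    for start, kind, end in RELATIONSHIPS:
        if kind == rel_type:
            bindings.append((start, end))  # relationship followed forwards
            bindings.append((end, start))  # and backwards: the pattern has no arrow
    return bindings


def match_directed(rel_type):
    """Return all (a, b) bindings for the pattern (a)-[:rel_type]->(b)."""
    return [(start, end) for start, kind, end in RELATIONSHIPS if kind == rel_type]


print(match_undirected("LIKES"))
print(match_directed("LIKES"))
```

The pattern states only the shape of the sub-graph of interest; how the matching relationships are found is left to the database.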

What do I need Neo4j for?

When you need to apply graph theory to large data sets.

Show me a Neo4j demo!

http://neo4j.com/docs/stable/cypherdoc-linked-lists.html

Thank you.

@fischerjulian

Links & Sources

• neo4j.com

• docker.com

• http://unionfs.filesystems.org/

• mesos.apache.org

• spark.apache.org