http://joind.in/2495 PHPBenelux conference January 2011
CouchDB: relax
Sander van de Graaf (@svdgraaf)
Focus -> practical usage examples
http://joind.in/talk/view/2495
second talk ever, please provide feedback
CONTENTS
• Introduction
• PHP Usage
• Replication/Scalability
• Backend usage
• Couchapps
• Other stuff
NOSQL
IT’S A MOVEMENT
Movement, definitions vary
1998
Back in the day...
Lame movie 1
Another one
And then some more...
XML was introduced
Some game was published
McDonald’s Happy Meal
Carlo Strozzi
Released NOSQL open source DB
NOSQL == Not Only SQL
“[The NoSQL movement] departs from the relational model altogether, it should therefore have been called more appropriately ‘NoREL’, or something to that effect.”
- Carlo Strozzi
Ubuntu One, contacts sync
NUTSHELL
SPEED
Optimized for speed, not disk space (see cleanup)
APPEND ONLY
Append only storage, happy cup of coffee!
NO REPAIR NEEDED
COMPACTING
HTTP SERVER
caching, loadbalancing, without extra costs :D
CAP
EVENTUALLY CONSISTENT
CouchDB focuses on availability and partition tolerance; data becomes consistent after replication.
FULL REST API
REST
• GET
• PUT
• POST
• DELETE
• COPY
• SELECT
• UPDATE
• INSERT
• DELETE
• ...
JSON
{
  total_rows: 2,
  offset: 0,
  rows: [
    {
      id: '_design/foobar',
      key: '_design/foobar',
      value: { rev: '5-982b2fc36835715b2aae54609b5d5f1e' }
    },
    {
      id: 'f0e1fd96eb6e094f74dda8d949000a6a',
      key: 'f0e1fd96eb6e094f74dda8d949000a6a',
      value: { rev: '1-86bca407fce8234a63c90ff549b56b10' }
    }
  ]
}
Javascript == awesome! :D
REPLICATION
Key feature: CouchDB is relaxed about replication issues and version conflicts
Welcome to Futon, I prefer a UI
http-console rocks the socks out of telnet
Berkeley
PHP USAGE
PHP LIBRARIES
• PHPillow (LGPL)
• PHP Object Freezer (BSD)
• PHP On Couch (GPL 2 / 3)
• PHP CouchDB Extension (PHP license)
• Sag for CouchDB (Apache)
• Doctrine 2 CouchDB ODM
All are quite nice; Doctrine has some rough edges. I use PHP On Couch with a custom patch that makes it easier to use with the Zend autoloader.
<?php
// set up the connection to CouchDB
$client = new Couchdb_Client('http://ponies.couchone.com:5984', 'rainbows');

// fetch a document
$doc = $client->getDoc('awesome_pony');

// update the document
$doc->newproperty = array("type", "awesome");

try {
    $client->storeDoc($doc);
} catch (Exception $e) {
    echo "Document storage failed : " . $e->getMessage();
}
PHP On Couch with small ZF autoloader fix
REPLICATION
DEFINITION
“Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.”
Source: wikipedia
MySQL can do this
Master-master replication
US, NL, BE
Not only locally
P2P WEB
“World Domination”
CLUSTERING
“The fun stuff”
CouchDB doesn’t support partitioning (sharding) itself, but because CouchDB is HTTP based there are lots of possibilities
[Diagram: loadbalancer in front of CouchDB instances 1..n]
The basics are always the same, and easy: CouchDB instances 1..n behind a loadbalancer
CHALLENGES
• Large amounts of data
• Large views (with big/long map/reduce queries)
• LOTS of traffic
• Location based partitions
• For fun and profit
MAP/REDUCE
INPUT
IP Bytes
212.122.174.13 18271
212.122.174.13 191726
212.122.174.13 198
74.119.8.111 91272
74.119.8.111 8371
212.122.174.13 43
Map/Reduce example
MAPPER => REDUCER
IP Bytes
212.122.174.13 18271
212.122.174.13 191726
212.122.174.13 198
212.122.174.13 43
74.119.8.111 91272
74.119.8.111 8371
AFTER REDUCE
IP Bytes
212.122.174.13 210238
74.119.8.111 99643
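The example above can be sketched with CouchDB-style JavaScript map and reduce functions. This is only a minimal sketch: the field names `ip` and `bytes` and the small `runView` harness are assumptions for illustration. In CouchDB itself `emit` is a global and the view server does the grouping, not a local loop.

```javascript
// Hypothetical documents matching the INPUT slide; field names are assumed.
const docs = [
  { ip: "212.122.174.13", bytes: 18271 },
  { ip: "212.122.174.13", bytes: 191726 },
  { ip: "212.122.174.13", bytes: 198 },
  { ip: "74.119.8.111", bytes: 91272 },
  { ip: "74.119.8.111", bytes: 8371 },
  { ip: "212.122.174.13", bytes: 43 },
];

// Map: emit one (ip, bytes) pair per document.
// emit is passed in here so the sketch runs outside CouchDB.
function map(doc, emit) {
  emit(doc.ip, doc.bytes);
}

// Reduce: sum all values emitted for one key.
function reduce(keys, values) {
  return values.reduce((total, n) => total + n, 0);
}

// Local stand-in for the view server: group emitted rows by key, then reduce.
function runView(allDocs) {
  const groups = new Map();
  for (const doc of allDocs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce([key], values);
  }
  return result;
}

const totals = runView(docs);
console.log(totals);
```

The totals per IP should match the AFTER REDUCE slide.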
PARTITION INPUT
Partition IP Bytes
0 212.122.174.13 18271
0 212.122.174.13 191726
0 212.122.174.13 198
1 74.119.8.111 91272
1 74.119.8.111 8371
0 212.122.174.13 43
Map/Reduce example
MAPPER => REDUCER
Partition IP Bytes
0 212.122.174.13 18271
0 212.122.174.13 191726
0 212.122.174.13 198
0 212.122.174.13 43
1 74.119.8.111 91272
1 74.119.8.111 8371
If data is big enough, you could even need a re-re-re-reducer
AFTER REDUCE
IP Bytes
212.122.174.13 210238
74.119.8.111 99643
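With partitions, each node reduces only its own rows and a re-reducer combines the partial results. The sketch below illustrates that idea with the rows from the PARTITION INPUT slide. The helper names `reducePartition` and `rereduceAll` are hypothetical, and this is not how Lounge or Pillow are actually implemented.

```javascript
// Rows from the PARTITION INPUT slide: [partition, ip, bytes].
const rows = [
  [0, "212.122.174.13", 18271],
  [0, "212.122.174.13", 191726],
  [0, "212.122.174.13", 198],
  [1, "74.119.8.111", 91272],
  [1, "74.119.8.111", 8371],
  [0, "212.122.174.13", 43],
];

// Same shape as a CouchDB reduce function. Summing works unchanged for the
// first pass (raw values) and for the rereduce pass (partial sums).
function sumReduce(keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
}

// Step 1: each partition reduces its own rows per IP.
function reducePartition(partitionRows) {
  const byIp = new Map();
  for (const [, ip, bytes] of partitionRows) {
    if (!byIp.has(ip)) byIp.set(ip, []);
    byIp.get(ip).push(bytes);
  }
  const partial = {};
  for (const [ip, values] of byIp) partial[ip] = sumReduce(null, values, false);
  return partial;
}

// Step 2: the re-reducer merges the partial results of all partitions.
function rereduceAll(partials) {
  const byIp = new Map();
  for (const partial of partials) {
    for (const [ip, sum] of Object.entries(partial)) {
      if (!byIp.has(ip)) byIp.set(ip, []);
      byIp.get(ip).push(sum);
    }
  }
  const merged = {};
  for (const [ip, sums] of byIp) merged[ip] = sumReduce(null, sums, true);
  return merged;
}

const partitions = [0, 1].map((p) => rows.filter((row) => row[0] === p));
const partials = partitions.map(reducePartition);
const merged = rereduceAll(partials);
console.log(merged);
```

Because a sum of sums is still a sum, the same step can be stacked again when the data is large enough to need more than one re-reduce level.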
CLUSTERING OPTIONS
• CouchDB Lounge
• Pillow
• BigCouch
LOUNGE
• partitioning/clustering
• Nginx module
• meebo.com
• ‘easy’
• http://tilgovi.github.com/couchdb-lounge/
LOUNGE
• dumb_proxy => proxy for simple PUT/GET’s
• smart_proxy => proxy for map/reduce over shards
• replicator => updates all copies, redundantly
It can make sure that there are N copies of a document at all times
[Diagram: nginx with dumb_proxy in front of CouchDB instances 1..n]
dumb_proxy == ONLY GET/PUT
[Diagram: nginx with smart_proxy in front of CouchDB instances 1..n]
smart_proxy takes care of the map/reduce and re-reducers over multiple nodes
Bonus:
other nginx modules work too
mod_cache, mod_expire, etc.
PILLOW
• Erlang based
• router/rereducer (map/reduce over multiple systems)
• In development (but promising!)
• https://github.com/khellan/Pillow
BIGCOUCH
• Fork
• 100% API compatible
• Open Source/Commercial
• https://cloudant.com/#!/solutions/bigcouch
BACKEND USAGE
PROXIED
Proxied via middleware, or via mod_proxy or similar
DIRECT
Or direct: because CouchDB is HTTP based, content is directly available in JavaScript
NOSQL && SQL HYBRID
• onSave, onCommit hooks available in every major framework
• onSave -> make a JSON representation of your object, and PUT it to couchdb (#protip: only ‘public’ data)
• sql db is leading, you don’t care about versioning in couchdb
• you can use your data directly from couchdb within your frontend javascript
MODEL
<?php
class Pony extends Application_models
{
    public function toArray()
    {
        $data = $this->_getData();
        unset($data['created_on']);
        unset($data['created_by']);
        unset($data['access_level']);
        unset($data['private_data']);
        $data['tags'] = $this->getTags();
        $data['categories'] = $this->getCategories();
        $data['rainbows'] = 'double';
        return $data;
    }
}
AFTER_SAVE
<?php
class article_module extends admin_module
{
    public function after_save()
    {
        parent::after_save();
        $data = $this->toJson();
        $res = CouchDB::put($data);
        $this->_id = $res->_id;
        $this->_rev = $res->_rev;
    }
}
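The "only public data" step of the hybrid setup can also be sketched in JavaScript. The private field names are taken from the MODEL slide. The helper `toPublicJson`, the `PRIVATE_FIELDS` list and the sample record are hypothetical and stand in for whatever the framework's save hook provides.

```javascript
// Fields that must never leave the SQL side (taken from the MODEL slide).
const PRIVATE_FIELDS = ["created_on", "created_by", "access_level", "private_data"];

// Build the public JSON representation of a record (hypothetical helper).
// Extras such as tags are merged in first, then private fields are dropped.
function toPublicJson(record, extras = {}) {
  const data = { ...record, ...extras };
  for (const field of PRIVATE_FIELDS) {
    delete data[field];
  }
  return JSON.stringify(data);
}

// Hypothetical record as it might come out of the SQL database.
const pony = {
  name: "Rainbow",
  created_on: "2011-01-28",
  created_by: "admin",
  access_level: 3,
  private_data: "secret",
};

const body = toPublicJson(pony, { tags: ["awesome"], rainbows: "double" });
console.log(body);
```

The resulting string would be the body of the PUT to CouchDB in an after-save hook.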
PROXY
RewriteEngine On
RewriteRule /data/(.*) http://127.0.0.1:5984/db/$1 [P,L]
Proxy the calls (this works around the browser's same-origin sandbox error), or use JSONP
JAVASCRIPT
<script type="text/javascript">
$.getJSON("/db/ponies/_design/ponies/_view/best-ponies?include_docs=true", function (res) {
    for (var i = 0; i < res.rows.length; i++) {
        var doc = res.rows[i].doc;
        // do stuff
    }
});
</script>
COUCHAPP
CouchDB has its own structure for “distributed, scalable web applications”, called couchapps
“Distributed, scalable, web applications you say?
omgwtfbbq!?!1!!!11!1!eleven”
_attachments
the magic is in _attachments
distribution via replication
INSTALLATION
Couchapp 0.7.0
installation is easy
$ couchapp init
init a project
LAYOUT
creates a default folder
$ couchapp push http://ponies.couchone.com:5984/rainbows
https://github.com/brandon-beacher/couchapp-tmbundle
couchapp push on save -> textmate
OTHER STUFF
REWRITES
_REWRITE
$ curl 'http://ponies.couchone.com/rainbows/_design/ponies/_view/best-ponies?descending=true&limit=5&key="foobar"'
such urls make us a sad panda
{
  ....
  "rewrites": [
    {
      "from": "/best-5-ponies",
      "to": "ponies/_view/best-ponies",
      "method": "GET",
      "query": {
        "descending": true,
        "limit": 5,
        "key": "foobar"
      }
    }
  ]
}
$ curl 'http://ponies.couchone.com/rainbows/_design/ponies/_view/best-ponies?descending=true&limit=5&key="foobar"'
rewrite this
$ curl "http://ponies.couchone.com/rainbows/_design/ponies/_rewrite/best-5-ponies"
to this
[vhosts]
awesomeponies.com = /rainbows/_design/ponies/_rewrite
$ curl "http://ponies.couchone.com/rainbows/_design/ponies/_rewrite/best-5-ponies"
rewrite this
$ curl "http://awesomeponies.com/best-5-ponies"
to this
_CHANGES
$ curl -X GET "http://ponies.couchone.com/rainbows/_changes"
{"results":[
],"last_seq":0}
$ curl -X PUT http://ponies.couchone.com/rainbows/foobar -d '{"type":"awesome"}'
{"results":[{"seq":1,"id":"foobar","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]}],"last_seq":1}
{"results":[{"seq":1,"id":"foobar","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]},{"seq":2,"id":"foobar2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}],"last_seq":2}
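A consumer of `_changes` typically remembers `last_seq` and passes it back as `since` on the next request. The sketch below processes the response shown above. The helper `applyChanges` and the shape of `state` are hypothetical, and the request itself is left out so the example runs without a server.

```javascript
// Response body copied from the slide above.
const response = {
  results: [
    { seq: 1, id: "foobar", changes: [{ rev: "1-aaa8e2a031bca334f50b48b6682fb486" }] },
    { seq: 2, id: "foobar2", changes: [{ rev: "1-e18422e6a82d0f2157d74b5dcf457997" }] },
  ],
  last_seq: 2,
};

// Record the first listed revision per document and the sequence to resume from.
function applyChanges(state, feed) {
  for (const change of feed.results) {
    state.revs[change.id] = change.changes[0].rev;
  }
  state.since = feed.last_seq;
  return state;
}

const state = applyChanges({ revs: {}, since: 0 }, response);

// URL for the next poll, picking up where this response ended.
const nextUrl = "http://ponies.couchone.com/rainbows/_changes?since=" + state.since;
console.log(nextUrl);
```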
_CHANGES OPTIONS
• ?since
• Longpolling
• Continuous
$ curl -X GET "http://ponies.couchone.com/rainbows/_changes?since=20"
curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=longpoll&since=2"
Longpolling: good when there are few updates. The connection stays open until a change arrives, then it is closed and you need to reconnect, so lots of updates mean lots of reconnects
curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=continuous&since=2"
The connection stays open, and you get updates on the fly!
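The continuous feed sends one JSON object per line, with empty lines as heartbeats, and network chunks do not have to end on a line boundary. A client therefore needs a small line buffer. This is a minimal sketch of one: the name `createFeedParser` is hypothetical, and the sample chunks (ids and revs) are made up to simulate a split line.

```javascript
// Creates a parser that accepts arbitrary chunks of a continuous feed and
// calls onChange once per complete line of JSON.
function createFeedParser(onChange) {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.trim() === "") continue; // blank lines are heartbeats
      onChange(JSON.parse(line));
    }
  };
}

const seen = [];
const push = createFeedParser((change) => seen.push(change.id));

// Simulated chunks: the second change is split across two chunks.
push('{"seq":3,"id":"foobar3","changes":[{"rev":"1-abc"}]}\n{"seq":4,"id":"foo');
push('bar4","changes":[{"rev":"1-def"}]}\n\n');
console.log(seen);
```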
FILTERS
filters can be used to filter documents from output
function(doc, req) {
    if (doc.priority == 'high') {
        return true;
    }
    return false;
}
we only want high priority documents
curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=continuous&filter=app/important"
function(doc, req) {
    if (doc.name == req.query.name) {
        return true;
    }
    return false;
}
you can use req for request based filters
curl -X GET "http://ponies.couchone.com/rainbows/_changes?feed=continuous&filter=app/name&name=foobar"
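Filter functions are plain JavaScript, so their logic can be tried out locally before pushing them in a design document. The sketch below uses the same logic as the two filters above, in condensed form. The sample documents and the `req` object are hypothetical.

```javascript
// Same logic as the filter functions on the slides, condensed.
function important(doc, req) {
  return doc.priority == "high";
}

function byName(doc, req) {
  return doc.name == req.query.name;
}

// Hypothetical documents and request, for a local dry run only.
const docs = [
  { _id: "a", name: "foobar", priority: "high" },
  { _id: "b", name: "foobar", priority: "low" },
  { _id: "c", name: "other", priority: "high" },
];
const req = { query: { name: "foobar" } };

const highIds = docs.filter((doc) => important(doc, req)).map((doc) => doc._id);
const namedIds = docs.filter((doc) => byName(doc, req)).map((doc) => doc._id);
console.log(highIds, namedIds);
```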
SHOWS
function(doc, req) {
    return {
        body: "Hello World"
    }
}
curl -X GET "http://ponies.couchone.com/rainbows/_design/foobar/_show/showfunction/docid"
function(doc) {
    return {
        "code": 302,
        "body": "See other",
        "headers": {
            "Location": doc.target
        }
    };
}
You can also define HTTP headers. We used this to translate public IDs into private storage IDs. This way CouchDB took care of all the headers and HTTP details, and we could use a regular nginx proxy module
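The redirect idea can be checked locally by calling the show function with a sample document. The function body below matches the slide. The name `redirectShow`, the sample document and its `target` value are hypothetical.

```javascript
// Show function from the slide: redirect to the document's target.
function redirectShow(doc, req) {
  return {
    code: 302,
    body: "See other",
    headers: { Location: doc.target },
  };
}

// Hypothetical document mapping a public id to a private storage location.
const doc = { _id: "public-42", target: "/storage/private-9f3c" };

const res = redirectShow(doc, {});
console.log(res.code, res.headers.Location);
```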
LUCENE
[external]
fti=/path/to/python /path/to/couchdb-lucene/tools/couchdb-external-hook.py
[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
function(doc) {
    var ret = new Document();
    ret.add(doc.message);
    ret.add(new Date(doc.datetime));
    return ret;
}
curl -X GET "http://ponies.couchone.com/rainbows/_fti/_design/unicorns/by-query?q=foobar"
GEOCOUCH
https://github.com/vmx/couchdb
See Derick's talk yesterday
GEOCOUCH
• Supports bbox
• fork
• outputs via lists, georss possible
• directly useable by google maps
• can read GIS data
• combined with _changes makes an interesting use case
- bbox => all items within a certain bounding box; polygon is in the works
- currently a fork of CouchDB; in the works as an external module
- output can be set up separately
- Google Maps can use GeoRSS
- GIS: Geographic Information System (used worldwide?)
SPATIAL INDEX
in spatial/points.js
function(doc) {
    if (doc.geo && doc.geo.latitude != '' && doc.geo.longitude != '') {
        emit(
            {
                type: "Point",
                coordinates: [parseFloat(doc.geo.latitude), parseFloat(doc.geo.longitude)]
            },
            [doc._id, doc]
        );
    }
}
http://ponies.couchone.com/rainbows/_design/unicorns/_spatial/points?bbox=0,0,180,90
Worldwide search
{
  "update_seq": 3,
  "rows": [
    {
      "id": "augsburg",
      "bbox": [10.898333, 48.371667, 10.898333, 48.371667],
      "value": ["augsburg", [10.898333, 48.371667]]
    }
  ]
}
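A bounding box query returns the points whose coordinates fall inside the given rectangle. GeoCouch answers this from a spatial index. The sketch below only illustrates the containment rule for a single point, using the coordinates from the response above. The helper names `inBbox` and `parseBbox` are hypothetical.

```javascript
// bbox is [minX, minY, maxX, maxY], the same order as in ?bbox=0,0,180,90.
function inBbox(point, bbox) {
  const [x, y] = point;
  const [minX, minY, maxX, maxY] = bbox;
  return x >= minX && x <= maxX && y >= minY && y <= maxY;
}

// Parse the bbox query parameter into four numbers.
function parseBbox(param) {
  return param.split(",").map(Number);
}

const augsburg = [10.898333, 48.371667]; // coordinates from the response above
const hit = inBbox(augsburg, parseBbox("0,0,180,90"));
const miss = inBbox(augsburg, parseBbox("-180,-90,0,0"));
console.log(hit, miss);
```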
GEORSS && GOOGLE MAPS
if (GBrowserIsCompatible()) {
    map = new GMap2(document.getElementById('map'));
    var geoXML = new GGeoXml('http://ponies.couchone.com/rainbows/url-to-georss-view');
    map.addOverlay(geoXML);
}
curl -X GET "http://ponies.couchone.com/rainbows/_design/alarmeringen/_spatial/points?bbox=51.711369,4.218407,52.136520,4.745740";
Q?