
The Coming Database Revolution


Page 1: The Coming Database Revolution

THE DATABASE REVOLUTION

Robin Bloor, Ph D

Tuesday, August 2, 11

Page 2: The Coming Database Revolution

This Presentation

Intro: The RDBMS

Computer Hardware Trends

The NoSQL trend (either "No" as in none or "NO" as in Not Only)

What to do...

Main Take Away:

Database is no longer a commodity


Page 3: The Coming Database Revolution

A Point Of Departure

In the 1990s, the relational database quickly became the dominant form of database.

The SQL language became the dominant data access mechanism.

The RDBMS conferred mathematical respectability on itself and even claimed an underlying “Relational Algebra.”

The RDBMS dominated because it dealt effectively with transactional and BI apps.


Page 4: The Coming Database Revolution

Relational Dogma

Data and Process should be kept separate.

The database embodies a data model within a schema.

Normalization to 3NF (or 5NF) is the correct way to design the schema.

The query language (SQL) is part DDL and part DML (Select, Project, Join).

Ordering doesn’t matter.
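The DDL/DML split and the Select, Project, Join operations named on this slide can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables and their contents are hypothetical and are not taken from the presentation; this is a minimal illustration, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the schema carries the data model (hypothetical tables for illustration).
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount REAL NOT NULL
);
""")

# DML: populate the normalized tables.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0)],
)

rows = conn.execute("""
SELECT c.name, o.amount                    -- project: keep two columns
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.id   -- join: recombine normalized tables
WHERE o.amount > 30                        -- select: filter rows
""").fetchall()

# "Ordering doesn't matter": without ORDER BY the row order is not guaranteed,
# so the client sorts before relying on it.
result = sorted(rows)
```

Note that the sort happens outside the query, which is where the "no order" rule pushes any work that depends on sequence unless an explicit ORDER BY is added.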


Page 5: The Coming Database Revolution

The 1990s RDBMS

The RDBMS of the 1990s was physically based on B-tree structures and an optimizer.

This scaled up within reason but it scaled out poorly.

It was fundamentally an index-based data store.

It managed megabytes and gigabytes fine.

But look what happened to data....


Page 6: The Coming Database Revolution

Moore’s Law Cubed

Moore’s Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree).

Large database volumes have grown 1000-fold over each such 6-year period:

In ~1992 measured in megabytes

In ~1998 measured in gigabytes

In ~2004 measured in terabytes

In ~2010 measured in petabytes

Exabytes by ~2016?
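The "cubed" in the slide title follows from the two figures given: 1000 = 10³, so over any period the data growth factor is the cube of the CPU growth factor. A small worked calculation, using only the slide's numbers (10-fold and 1000-fold per 6 years), makes the yearly rates explicit. The function names are illustrative.

```python
import math

PERIOD_YEARS = 6
CPU_FACTOR_PER_PERIOD = 10      # slide: CPU power grows 10-fold every 6 years
DATA_FACTOR_PER_PERIOD = 1000   # slide: large database volumes grow 1000-fold


def annual_growth(factor_per_period: float, years: int = PERIOD_YEARS) -> float:
    """Convert growth over `years` into the equivalent yearly growth factor."""
    return factor_per_period ** (1 / years)


def doubling_time_years(factor_per_period: float, years: int = PERIOD_YEARS) -> float:
    """Years needed to double at the given per-period growth factor."""
    return years * math.log(2) / math.log(factor_per_period)


cpu_yearly = annual_growth(CPU_FACTOR_PER_PERIOD)     # about 1.47x per year
data_yearly = annual_growth(DATA_FACTOR_PER_PERIOD)   # about 3.16x per year

cpu_doubling = doubling_time_years(CPU_FACTOR_PER_PERIOD)    # about 1.8 years
data_doubling = doubling_time_years(DATA_FACTOR_PER_PERIOD)  # about 0.6 years

# Each 6-year period, data volume outgrows CPU power by this factor.
gap_per_period = DATA_FACTOR_PER_PERIOD / CPU_FACTOR_PER_PERIOD
```

On these figures, data volume doubles roughly three times for every doubling of CPU power, and the gap widens 100-fold per period.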


Page 7: The Coming Database Revolution

HARDWARE


Page 8: The Coming Database Revolution

RDBMS


Page 9: The Coming Database Revolution


Page 10: The Coming Database Revolution

RDBMS


Page 14: The Coming Database Revolution

A Database is a Cupboard

Some are transactional (for operational systems)

Some service large queries against large data heaps

Some are content oriented for accessing complex objects (object based systems mainly)

All databases need to deliver performance

RDBMS ✔

RDBMS ??

RDBMS ??


Page 15: The Coming Database Revolution

Hardware Data Points

Moore’s Law now proceeds by adding cores rather than by increasing clock speed.

Vector registers are now standard on Intel chips.

Parallelism is now on the rise and will eventually become the normal mode of processing.

Memory is about 1 million times faster than disk, and random reads have become very expensive in terms of latency.

The Intel processor is now being challenged by the ARM processor (it’s about heat).


Page 16: The Coming Database Revolution

Memory v Disk


Page 17: The Coming Database Revolution

Memory v Disk

The decline in memory costs is (on current trends) likely to make memory cheaper than disk around 2016.

This means that non-volatile SSDs will prevail relatively soon.

SSDs are between 1000 and 100,000 times faster than spinning disk.


Page 18: The Coming Database Revolution

Massive Scale-Out

CPUs are now doubling cores every 18 months or so.

This trend, combined with memory cost trends, suggests that massive scale-out will eventually become a much rarer requirement.

But we cannot know that for sure.


Page 19: The Coming Database Revolution

Consequences

SSD will replace disk - but slowly...

Many DBMS tasks can now be handled in memory - but better physical architectures are possible for this.

Physical indexes are becoming irrelevant.

Scale-out and parallelism are now the driving force for large data volume applications.

The physical architecture of the traditional RDBMS is now an anachronism.
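The scale-out pattern behind these points can be sketched as follows: split the data into partitions, let each worker compute a partial result over its own partition, then combine the partials. This is a toy sketch. The function names are hypothetical, and threads stand in for separate nodes purely to show the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(values: Sequence[int], node_count: int) -> List[List[int]]:
    """Spread values round-robin across `node_count` partitions."""
    shards: List[List[int]] = [[] for _ in range(node_count)]
    for index, value in enumerate(values):
        shards[index % node_count].append(value)
    return shards


def scale_out_sum(values: Sequence[int], node_count: int = 4) -> int:
    """Sum `values` by summing each partition independently, then combining."""
    shards = partition(values, node_count)
    # Threads are only a stand-in for nodes; in CPython they do not give
    # real CPU parallelism for this kind of work.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(sum, shards))
    return sum(partials)


total = scale_out_sum(list(range(1, 101)))
```

The approach works because a sum can be computed per partition and merged afterwards. Operations that need to see all the data at once are harder to spread this way.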


Page 20: The Coming Database Revolution

NoSQL


Page 21: The Coming Database Revolution

A Plethora of Databases

4th Dimension, Adabas D, AllegroGraph, Alpha Five, Altibase, Apache Derby, Aster Data, Azure Table Storage, BaseX, Berkeley DB, Bigdata, BlackRay, CA-Datacom, Cassandra, Chordless, Citrusleaf, Clarion, Cloudata, Cloudera, Clustrix, CouchDB, CSQL, CUBRID, Daffodil database, Data Management Center (DMC), Database Management Library, DataEase, Dataphor, DB-Fast, db4o, Derby aka Java DB, DEX, Dynomite, EffiProz, ElevateDB, Empress Embedded Database, EnterpriseDB, eXist, eXtremeDB, Faircom C-Tree, fastDB, FileDB, FileMaker Pro, Firebird, FlockDB, FrontBase, GenieDB, GigaSpaces, Gladius DB, Greenplum, GroveSite, GT.M, H2, Hadoop / HBase, HamsterDB, Hazelcast, Helix database, Hibari, HPCC, HSQLDB, HyperGraphDB, Hypertable, IBM DB2, IBM DB2 Express-C, IBM Lotus Approach, IBM Lotus/Domino, Infinite Graph, Infobright, InfoGrid, Informix, Ingres, InterBase, Intersystems Cache, InterSystems Caché, ISIS Family, KAI, Kognitio, LightCloud, Linter, Magma, MariaDB, Mark Logic Server, MaxDB, Mckoi SQL Database, MEMBASE, MemcacheDB, Microsoft Access, Microsoft Jet Database Engine (part of Microsoft Access), Microsoft SQL Server, Microsoft SQL Server Express, Microsoft Visual FoxPro, Mimer SQL, Mnesia, MonetDB, MongoDB, Morantex, mSQL, MySQL, Neo4J, NEO, Netezza, NonStop SQL, Objectivity, Openbase, OpenInsight, OpenLink Virtuoso, OpenLink Virtuoso Universal Server, OpenQM, Oracle, Oracle Rdb for OpenVMS, OrientDB, Panorama, Perst, PervasiveSQL, PicoLisp, Pincaster, PostgreSQL, Prevayler, Progress Software, Qizx, Queplix, RaptorDB, RavenDB, RDM Embedded, RDM Server, Recutils, Redis, Riak, SAND CDBMS, Sav Zigzag, Scalaris, Scalien, SciDB, ScimoreDB, Sedna, SisoDB, SmallSQL, solidDB, Sones, SQLBase, SQLDB, SQLite, Starcounter, Sterling, Stratosphere, STSdb, Sybase, Sybase IQ, tdbengine, Teradata, Terrastore, The SAS system, ThruDB, TimesTen, Tokutek, Trinity, txtSQL, U2, UniData, UniVerse, Valentina, Versant, VertexDB, Vertica, VistaDB, VMDS, Voldemort, WCE SL Plus, XSPRADA, Yserial, ZODB, Zoduna

[Diagram of DBMS categories: Hypermedia DBMS, Cloud DBMS, Temporal DBMS, Hadoop & HBase, (MPP), OLAP DBMS, ODBMS, ORDBMS, Network DBMS, Text DBMS, Open Source DBMS, Triple Stores, Graph DBMS, XML DBMS, Content DBMS, Algebraic DBMS, In-Memory DBMS, RDBMS, Column Store DBMS, Streams DBMS, Analytic DBMS]


Page 22: The Coming Database Revolution

RDBMS & SQL As Anachronisms

For big BI, the RDBMS has been superseded by the column store DBMS, primarily because the RDBMS didn’t scale out and because indexes have become far less important.

The use of snowflake schemas and star schemas had already demonstrated that 3NF was a limited modeling technique and nothing more.

And then came Hadoop & MapReduce for massive scale-out - which care nothing for SQL or RDBMS.
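One way to see why a column store suits big BI queries is to hold the same table in both layouts and aggregate one field. The sketch below is a simplified in-memory illustration with hypothetical data and function names. Real column stores add compression, vectorized execution and distribution, none of which is modeled here.

```python
from typing import Any, Dict, List

# Row layout: each record is stored together (hypothetical sample data).
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 200.0},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one contiguous list per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records: List[Dict[str, Any]], field: str) -> float:
    """Aggregate by visiting every record, even though one field is needed."""
    return sum(record[field] for record in records)


def total_column_store(columns: Dict[str, List[Any]], field: str) -> float:
    """Aggregate by scanning only the one column the query touches."""
    return sum(columns[field])


columns = to_columns(rows)
row_total = total_row_store(rows, "amount")
column_total = total_column_store(columns, "amount")
```

Both layouts give the same answer. The difference is how much data has to be read to get it, which is what matters for a full scan without an index.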


Page 23: The Coming Database Revolution

A Fundamental Error

Actions: Add, Modify, Delete, Archive

From day 1 there was a fundamental error in the simple mechanics of database and file systems.

When you update data you destroy the old value. No audit trail.

A correct theory of data was invented by (perhaps) Luca Pacioli. It is the basis of accounting.

A few databases (Firebird is one) were built so that data was only ever added or archived.
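The contrast between update-in-place and an accounting-style ledger can be sketched in a few lines. `AppendOnlyStore` is a hypothetical class written for illustration and does not describe how any particular product works: every change is a new entry, so the current value and the full audit trail both remain available.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Entry:
    seq: int     # position in the log; gives every change an order
    key: str
    value: Any


class AppendOnlyStore:
    """Toy store in which data is only ever added, never overwritten."""

    def __init__(self) -> None:
        self._log: List[Entry] = []

    def put(self, key: str, value: Any) -> Entry:
        entry = Entry(seq=len(self._log) + 1, key=key, value=value)
        self._log.append(entry)
        return entry

    def current(self, key: str) -> Optional[Any]:
        """Latest value for `key`, found by scanning the log backwards."""
        for entry in reversed(self._log):
            if entry.key == key:
                return entry.value
        return None

    def history(self, key: str) -> List[Any]:
        """Every value `key` has ever had, oldest first (the audit trail)."""
        return [entry.value for entry in self._log if entry.key == key]


# Update-in-place: the old value is destroyed.
mutable = {"balance": 100}
mutable["balance"] = 80

# Append-only: the old value survives alongside the new one.
ledger = AppendOnlyStore()
ledger.put("balance", 100)
ledger.put("balance", 80)
latest = ledger.current("balance")
trail = ledger.history("balance")
```

The backwards scan is deliberately naive. A practical design would keep an index of the latest entry per key.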


Page 24: The Coming Database Revolution

The Ordering Of Data

“A data set is an unordered collection of unique, non-duplicated items.”

This is an absurd constraint to place upon data, as data is naturally ordered by time if by nothing else.

Events are ordered by time.

Changes to entities are ordered by time.

There are lots of applications requiring time series capability.

This has led to TSDB products like StreamBase, Vhayu, OpenTSDB, etc.
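A minimal sketch of what "time series capability" means in practice: if data is kept in time order, questions such as "what was the value as of time t?" or "what happened between t1 and t2?" become simple binary searches. The `TimeSeries` class is hypothetical and is not modeled on any of the products named above.

```python
import bisect
from typing import Any, List, Optional


class TimeSeries:
    """Toy time-ordered store; timestamps must arrive in non-decreasing order."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[Any] = []

    def append(self, timestamp: float, value: Any) -> None:
        if self._times and timestamp < self._times[-1]:
            raise ValueError("timestamps must be appended in time order")
        self._times.append(timestamp)
        self._values.append(value)

    def as_of(self, timestamp: float) -> Optional[Any]:
        """Most recent value recorded at or before `timestamp`."""
        position = bisect.bisect_right(self._times, timestamp)
        return self._values[position - 1] if position else None

    def window(self, start: float, end: float) -> List[Any]:
        """All values with start <= timestamp <= end, in time order."""
        low = bisect.bisect_left(self._times, start)
        high = bisect.bisect_right(self._times, end)
        return self._values[low:high]


prices = TimeSeries()
prices.append(1, 10.0)
prices.append(5, 11.0)
prices.append(9, 12.5)

price_at_6 = prices.as_of(6)
price_before_start = prices.as_of(0)
recent = prices.window(2, 9)
```

Neither query can be expressed over an unordered set without first imposing an order on it, which is the point the slide is making.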


Page 25: The Coming Database Revolution

The Separation of Data and Process

The assumption was that this separation could be enforced.

But when you try to enforce it, you forever encounter data and process locked together in a guilty embrace.

It is a wrong separation of concerns.

In truth it cannot be enforced without there being a true algebra of data.

So many databases (object databases and other NoSQL databases) do not enforce it.

However, their interfaces to data are not perfect either.

[Diagram labels: Process, SQL, SCHEMA, DBMS]


Page 26: The Coming Database Revolution

Relational Algebra Isn’t An Algebra

Set aside the fact that RDBMS focus so strongly on table structures that they cannot naturally represent other important data structures (such as BOMP and MOLAP).

And that RDBMS rail against the ordering of data (“No order”).

Ignore the stored procedures (which violate the separation of data and process).

Even so, Relational Algebra is not even an algebra. (NULLs?)

There is at least one algebraic (NoSQL) database.
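The slide only hints at the problem with "(NULLs?)". One common reading, assumed here rather than taken from the presentation, is that SQL's three-valued logic breaks rules an algebra would be expected to obey. The sketch below uses Python's built-in `sqlite3` module with a hypothetical one-column table to show two such cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL compared with itself is neither true nor false: the result is NULL,
# which sqlite3 returns to Python as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Hypothetical table with one known value and one NULL.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
equal_to_one = conn.execute("SELECT COUNT(*) FROM t WHERE x = 1").fetchone()[0]
not_equal_to_one = conn.execute("SELECT COUNT(*) FROM t WHERE x <> 1").fetchone()[0]

# A predicate and its negation no longer cover every row:
# the NULL row satisfies neither "x = 1" nor "x <> 1".
rows_lost = total - (equal_to_one + not_equal_to_one)
```

In two-valued logic the rows matching a predicate and the rows matching its negation always add up to the whole table. Here one row falls through the gap.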


Page 27: The Coming Database Revolution

The SQL Barrier

SQL has:

DDL (for data definition)

DML (for Select, Project and Join)

But it has no MML or TML

Usually result sets are brought to the client for further manipulation, but using them for further data access becomes problematic.

Conclusions:

This separation of data from process is arbitrary and unhelpful

Any database to which this doesn’t apply is NoSQL

[Diagram: an Analytic DBMS and SQL on either side of the “SQL Barrier”, with the labels “Results processing must be done here” and “Or results processing must be done here”]
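The round trip described on this slide can be sketched with Python's built-in `sqlite3` module. The `sales` table is hypothetical, and the median is used only as an example of a calculation done on the client side. The result set crosses the barrier, is processed by the client, and the derived value then has to be sent back across as a parameter before it can drive further data access.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("a", 10.0), ("b", 40.0), ("c", 25.0), ("d", 90.0), ("e", 55.0)],
)

# Step 1: bring the result set across the barrier to the client.
amounts = [row[0] for row in conn.execute("SELECT amount FROM sales")]

# Step 2: results processing happens on the client side.
median_amount = statistics.median(amounts)

# Step 3: to use the derived value for further data access, it must be
# shipped back across the barrier as a query parameter.
above_median = [
    row[0]
    for row in conn.execute(
        "SELECT customer FROM sales WHERE amount > ? ORDER BY customer",
        (median_amount,),
    )
]
```

With five rows the extra round trip is invisible. With a large result set, steps 1 and 3 mean moving the data out of the database and the derived values back in.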


Page 28: The Coming Database Revolution

Other NDBMS Directions

Some NDBMS do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability).

Some NDBMS deploy a distributed scale-out architecture with data redundancy.

XML DBMS using XQuery are NDBMS.

Some document stores are NDBMS (OrientDB, Terrastore, etc.)

Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.)

Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.)

Graph DBMS (DEX, OrientDB, etc.) are NDBMS

Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS


Page 29: The Coming Database Revolution

What To Do...


Page 30: The Coming Database Revolution

What Is The Problem You Are Trying To Solve?

The primary message of this presentation is that database is no longer a commodity (if it ever was).

Despite faults and weaknesses, the General Purpose Relational Database works fine for many areas of application, and:

It is well understood

Skills (for any popular product) are abundant

It can be inexpensive (by license or Open Source)

Beyond such products, it is “horses for courses” and “caveat emptor.”


Page 31: The Coming Database Revolution

Other Selection Criteria

Don’t fall for fashion.

Proven performance?

Skills, both for design and for administration.

Interfaces & middleware.

The hardware bill.

Product roadmap.

External support/internal support.

Calculate a TCO (note that even for expensive DBMS the license fees are rarely more than 15% of the TCO).
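The 15% figure gives a quick sanity check on any TCO estimate: if license fees are at most 15% of the total, the TCO is at least the license cost divided by 0.15. The sketch below works through that arithmetic. The cost categories and amounts are made-up placeholders, not figures from the presentation.

```python
from typing import Dict


def minimum_tco(license_cost: float, max_license_share: float = 0.15) -> float:
    """Lower bound on TCO if licenses are at most `max_license_share` of it."""
    return license_cost / max_license_share


def license_share(costs: Dict[str, float]) -> float:
    """Fraction of the total cost that goes on licenses."""
    return costs["licenses"] / sum(costs.values())


# Hypothetical multi-year cost breakdown, for illustration only.
example_costs = {
    "licenses": 150_000,
    "hardware": 250_000,
    "staff_and_skills": 450_000,
    "support": 100_000,
    "middleware": 50_000,
}

tco = sum(example_costs.values())
share = license_share(example_costs)
floor = minimum_tco(example_costs["licenses"])
```

If an estimate comes out well below the floor for a given license cost, it is worth checking whether items such as skills, hardware or support have been left out.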


Page 32: The Coming Database Revolution

Take Aways

Hardware trends have brought change, will bring more change

There are many RDBMS weaknesses

There are a huge number of “new” database products, both No SQL Whatsoever and Not Only SQL

Select database products with caution

Main Take Away:

Database is no longer a commodity


Page 33: The Coming Database Revolution


Page 34: The Coming Database Revolution

Thank You For Your Attention
