THE DATABASE REVOLUTION
Robin Bloor, PhD
Tuesday, August 2, 11
This Presentation
Intro: The RDBMS
Computer Hardware Trends
The NoSQL trend (either “No” as in none, or “NO” as in “Not Only”)
What to do...
Main Take Away:
Database is no longer a commodity
A Point Of Departure
In the 1990s, relational database quickly became the dominant form of database.
SQL became the dominant data access mechanism.
The RDBMS conferred mathematical respectability on itself and even claimed an underlying “Relational Algebra.”
The RDBMS dominated because it dealt effectively with transactional and BI apps.
Relational Dogma
Data and process should be kept separate.
The database embodies a data model within a schema.
Normalization to 3NF (or 5NF) is the correct way to design the schema.
The query language (SQL) is part DDL and part DML (Select, Project, Join).
Ordering doesn’t matter.
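To make “Select, Project, Join” and “ordering doesn’t matter” concrete, here is a toy sketch of the three relational operators over Python lists of dicts. The helper names and sample tables are hypothetical, and an RDBMS does not execute queries this way; the point is only what each operator means. Note that “select” here is relational restriction (filtering rows), not the SQL keyword, and that project drops duplicate rows because a relation is a set.

```python
from typing import Callable, Iterable

Row = dict  # a row maps column name -> value


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """Relational select (restriction): keep only the rows that satisfy pred."""
    return [r for r in rows if pred(r)]


def project(rows: Iterable[Row], cols: list[str]) -> list[Row]:
    """Relational project: keep only the named columns and drop duplicate
    rows, since a relation is a set. Output order carries no meaning."""
    seen, out = set(), []
    for r in rows:
        t = tuple(r[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def join(left: Iterable[Row], right: Iterable[Row], key: str) -> list[Row]:
    """Equi-join on one shared column (a naive nested-loop join)."""
    right = list(right)
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[key] == rrow[key]]


# Hypothetical sample data.
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Bolt"}]
orders = [
    {"order_id": 10, "cust_id": 1, "total": 250},
    {"order_id": 11, "cust_id": 1, "total": 75},
    {"order_id": 12, "cust_id": 2, "total": 40},
]

# "Which customers have an order over 50?"
big_spenders = project(
    select(join(customers, orders, "cust_id"), lambda r: r["total"] > 50),
    ["name"],
)
print(big_spenders)  # [{'name': 'Acme'}]
```

Acme appears once in the answer even though two of its orders qualify: that duplicate elimination is the set semantics the dogma insists on.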
The 1990s RDBMS
The RDBMS of the 1990s was physically based on B-tree structures and an optimizer.
It scaled up within reason, but it scaled out poorly.
It was fundamentally an index-based data store.
It managed megabytes and gigabytes fine.
But look what happened to data...
Moore’s Law Cubed
Moore’s Law suggests that CPU power increases 10-fold every 6 years (and other technologies have stayed in step to some degree).
Large database volumes have grown 1000-fold every 6 years:
In ~1992 measured in megabytes
In ~1998 measured in gigabytes
In ~2004 measured in terabytes
In ~2010 measured in petabytes
Exabytes by ~2016?
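The “cubed” in the title is plain arithmetic: 1000 is 10 cubed, so data growth per six years is the cube of CPU growth over the same period. The worked figures below take the slide’s own rates (10-fold CPU and 1000-fold data per six years) as given rather than as independent measurements:

```latex
\begin{aligned}
\text{CPU power:}\quad & 10\times \text{ per 6 years} \;\Rightarrow\; 10^{1/6} \approx 1.47\times \text{ per year}\\
\text{Data volume:}\quad & 1000\times = 10^{3} \text{ per 6 years} \;\Rightarrow\; \left(10^{1/6}\right)^{3} = 10^{1/2} \approx 3.16\times \text{ per year}\\
\text{Gap after 6 years:}\quad & 10^{3} / 10 = 10^{2} = 100\times\\
\text{Extrapolation:}\quad & \text{PB}\ (\sim 2010) \times 10^{3} = \text{EB}\ (\sim 2016)
\end{aligned}
```

On these rates, every six years the data outgrows the processing power available to handle it by a further factor of about 100.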
[Diagram slides labelled “HARDWARE” and “RDBMS”]
A Database is a Cupboard
Some are transactional (for operational systems): RDBMS ✔
Some service large queries against large data heaps: RDBMS ??
Some are content oriented for accessing complex objects (object-based systems mainly): RDBMS ??
All databases need to deliver performance
Hardware Data Points
Moore’s Law now proceeds by adding cores rather than by increasing clock speed.
Vector registers are now standard on Intel chips.
Parallelism is now on the rise and will eventually become the normal mode of processing.
Memory is about 1 million times faster than disk, and random reads have become very expensive in terms of latency.
The Intel processor is now being challenged by the ARM processor (it’s about heat).
[Chart: Memory v Disk]
Memory v Disk
The decline in memory costs is (on current trends) likely to make memory cheaper than disk around 2016.
This means that non-volatile SSDs will prevail relatively soon.
SSDs are between 1,000 and 100,000 times faster than spinning disk.
Massive Scale-Out
CPUs are now doubling cores every 18 months or so. This trend, combined with memory cost trends, suggests that massive scale-out will eventually become a much rarer requirement.
But we cannot know that for sure.
Consequences
SSD will replace disk, but slowly...
Many DBMS tasks can now be handled in memory, but better physical architectures are possible for this.
Physical indexes are becoming irrelevant.
Scale-out and parallelism are now the driving force for large data volume applications.
The physical architecture of the traditional RDBMS is now an anachronism.
NoSQL
A Plethora of Databases
4th Dimension, Adabas D, AllegroGraph, Alpha Five, Altibase, Apache Derby, Aster Data, Azure Table Storage, BaseX, Berkeley DB, Bigdata, BlackRay, CA-Datacom, Cassandra, Chordless, Citrusleaf, Clarion, Cloudata, Cloudera, Clustrix, CouchDB, CSQL, CUBRID, Daffodil database, Data Management Center (DMC), Database Management Library, DataEase, Dataphor, DB-Fast, db4o, Derby aka Java DB, DEX, Dynomite, EffiProz, ElevateDB, Empress Embedded Database, EnterpriseDB, eXist, eXtremeDB, Faircom C-Tree, fastDB, FileDB, FileMaker Pro, Firebird, FlockDB, FrontBase, GenieDB, GigaSpaces, Gladius DB, Greenplum, GroveSite, GT.M, H2, Hadoop / HBase, HamsterDB, Hazelcast, Helix database, Hibari, HPCC, HSQLDB, HyperGraphDB, Hypertable, IBM DB2, IBM DB2 Express-C, IBM Lotus Approach, IBM Lotus/Domino, Infinite Graph, Infobright, InfoGrid, Informix, Ingres, InterBase, InterSystems Caché, ISIS Family, KAI, Kognitio, LightCloud, Linter, Magma, MariaDB, Mark Logic Server, MaxDB, Mckoi SQL Database, MEMBASE, MemcacheDB, Microsoft Access, Microsoft Jet Database Engine (part of Microsoft Access), Microsoft SQL Server, Microsoft SQL Server Express, Microsoft Visual FoxPro, Mimer SQL, Mnesia, MonetDB, MongoDB, Morantex, mSQL, MySQL, Neo4J, NEO, Netezza, NonStop SQL, Objectivity, Openbase, OpenInsight, OpenLink Virtuoso, OpenLink Virtuoso Universal Server, OpenQM, Oracle, Oracle Rdb for OpenVMS, OrientDB, Panorama, Perst, PervasiveSQL, PicoLisp, Pincaster, PostgreSQL, Prevayler, Progress Software, Qizx, Queplix, RaptorDB, RavenDB, RDM Embedded, RDM Server, Recutils, Redis, Riak, SAND CDBMS, Sav Zigzag, Scalaris, Scalien, SciDB, ScimoreDB, Sedna, SisoDB, SmallSQL, solidDB, Sones, SQLBase, SQLDB, SQLite, Starcounter, Sterling, Stratosphere, STSdb, Sybase, Sybase IQ, tdbengine, Teradata, Terrastore, The SAS system, ThruDB, TimesTen, Tokutek, Trinity, txtSQL, U2, UniData, UniVerse, Valentina, Versant, VertexDB, Vertica, VistaDB, VMDS, Voldemort, WCE SL Plus, XSPRADA, Yserial, ZODB, Zoduna
[Diagram: database types: Hypermedia DBMS, Cloud DBMS, Temporal DBMS, Hadoop & HBase (MPP), OLAP DBMS, ODBMS, OR DBMS, Network DBMS, Text DBMS, Open Source DBMS, Triple Stores, Graph DBMS, XML DBMS, Content DBMS, Algebraic DBMS, In-Memory DBMS, RDBMS, Column Store DBMS, Streams DBMS, Analytic DBMS]
RDBMS & SQL As Anachronisms
For big BI, the RDBMS has been superseded by the column-store DBMS, primarily because it didn’t scale out and because indexes have become far less important.
The use of snowflake schemas and star schemas had already demonstrated that 3NF was a limited modeling technique and nothing more.
And then came Hadoop & MapReduce for massive scale-out, which care nothing for SQL or RDBMS.
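A toy illustration of why a column store suits big BI queries: an aggregate over one column only needs to read that column, and each column can be compressed on its own (run-length encoding is one common scheme). Python lists only model the layout here; a real column store such as Vertica works on compressed on-disk column files, and the table and helper names below are hypothetical. This is a sketch of the general idea, not how any particular product is built.

```python
# Hypothetical sales table: (id, region, amount).
rows = [
    (1, "north", 120.0),
    (2, "south", 80.0),
    (3, "north", 45.5),
    (4, "east", 300.0),
]


def total_rowstore(rows):
    # Row layout: every record must be visited to read one field of it.
    return sum(r[2] for r in rows)


# Column layout: one contiguous array per column.
columns = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_colstore(columns):
    # Only the "amount" array is read; "id" and "region" are never touched.
    return sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs.
    Sorted or low-cardinality columns compress well this way."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


print(total_rowstore(rows), total_colstore(columns))  # 545.5 545.5
print(run_length_encode(sorted(columns["region"])))   # [('east', 1), ('north', 2), ('south', 1)]
```

Both layouts give the same answer; the difference is how much data the scan has to touch, which is why a full column scan can replace index lookups for this kind of query.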
A Fundamental Error
Actions: Add, Modify, Delete, Archive
From day 1 there was a fundamental error in the simple mechanics of database and file systems.
When you update data you destroy the old value. No audit trail.
A correct theory of data was invented by (perhaps) Luca Pacioli: double-entry bookkeeping, the basis of accounting.
A few databases (Firebird is one) were built so that data was only ever added or archived.
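The accounting idea can be sketched as an append-only store: an update or delete is written as a new fact, the way a bookkeeper posts a correcting entry instead of erasing the ledger, and the current value is simply the latest fact. This is a hypothetical minimal sketch, not how Firebird or any other product implements it; archiving is omitted, and reading by scanning the whole log is only acceptable at toy scale (a real system would index the latest version of each key).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Fact:
    """One immutable log entry; entries are never modified once written."""
    key: str
    value: object
    action: str  # "add", "modify" or "delete"
    at: datetime


@dataclass
class AppendOnlyStore:
    log: list = field(default_factory=list)

    def _append(self, key, value, action):
        self.log.append(Fact(key, value, action, datetime.now(timezone.utc)))

    def put(self, key, value):
        # An update is a new fact; the previous value stays in the log.
        action = "modify" if self.get(key) is not None else "add"
        self._append(key, value, action)

    def delete(self, key):
        # A delete is also just a new fact, not an erasure.
        self._append(key, None, "delete")

    def get(self, key):
        """Current value = the most recent fact for the key (None if absent or deleted)."""
        for fact in reversed(self.log):
            if fact.key == key:
                return fact.value
        return None

    def history(self, key):
        """The audit trail that a destructive in-place UPDATE would have lost."""
        return [(f.action, f.value) for f in self.log if f.key == key]


store = AppendOnlyStore()
store.put("acct:42", 100)
store.put("acct:42", 80)
store.delete("acct:42")
print(store.get("acct:42"))      # None
print(store.history("acct:42"))  # [('add', 100), ('modify', 80), ('delete', None)]
```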
The Ordering Of Data
“A data set is an unordered collection of unique, non-duplicated items.”
This is an absurd constraint to place upon data, as data is naturally ordered by time if by nothing else.
Events are ordered by time.
Changes to entities are ordered by time.
There are lots of applications requiring time-series capability.
This has led to TSDB products like StreamBase, Vhayu, OpenTSDB, etc.
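A minimal sketch of what time ordering buys: if events are kept sorted by timestamp, a time-window query is two binary searches rather than a scan of an unordered set. The event data and helper names are hypothetical; real time-series databases add partitioning, compression and much more.

```python
from bisect import bisect_left, insort
from datetime import datetime

# Hypothetical event stream, kept sorted by timestamp: time is the natural order.
events = [
    (datetime(2011, 8, 2, 9, 0), "login"),
    (datetime(2011, 8, 2, 9, 5), "query"),
    (datetime(2011, 8, 2, 9, 7), "update"),
    (datetime(2011, 8, 2, 9, 30), "logout"),
]


def record(t, name):
    """Insert an event while preserving time order."""
    insort(events, (t, name))


def window(start, end):
    """Events with start <= t < end, found by binary search.
    A 1-tuple (t,) sorts before every (t, name), so it marks the first event at t."""
    return events[bisect_left(events, (start,)):bisect_left(events, (end,))]


record(datetime(2011, 8, 2, 9, 8), "commit")  # arrives late, still lands in time order
print([name for _, name in window(datetime(2011, 8, 2, 9, 0), datetime(2011, 8, 2, 9, 10))])
# ['login', 'query', 'update', 'commit']
```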
The Separation of Data and Process
The assumption was that this separation could be enforced.
But when you try to enforce it, you forever encounter data and process locked together in a guilty embrace.
It is a wrong separation of concerns.
In truth it cannot be enforced without there being a true algebra of data.
So many databases (object databases and other NoSQL databases) do not enforce it.
However, their interfaces to data are not perfect either.
[Diagram: DBMS, SQL schema, Process]
Relational Algebra Isn’t An Algebra
Set aside the fact that RDBMSs focus so strongly on table structures that they cannot naturally represent other important data structures (such as BOMP and MOLAP).
And that RDBMSs rail against the ordering of data (“No order”).
Ignore the stored procedures (which violate the separation of data and process).
Even so, relational algebra is not even an algebra. (NULLs?)
There is at least one algebraic (NoSQL) database.
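The “(NULLs?)” refers to SQL’s three-valued logic, where a comparison involving NULL evaluates to unknown rather than true or false. A short demonstration using SQLite through Python’s built-in sqlite3 module (SQLite follows standard SQL here) shows two identities of ordinary logic failing once a NULL is present. The table is hypothetical, and whether this disqualifies relational algebra as an algebra is the presentation’s argument; the code only shows the behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# NULL = NULL is NULL ("unknown"), not TRUE.
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# So the reflexive law x = x fails: the NULL row is filtered out.
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
reflexive = conn.execute("SELECT COUNT(*) FROM t WHERE x = x").fetchone()[0]

# And so does the law of the excluded middle, p OR NOT p.
excluded_middle = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x > 0 OR NOT (x > 0)"
).fetchone()[0]
conn.close()

print(null_eq, total, reflexive, excluded_middle)  # None 2 1 1
```

The table has two rows, but each “always true” predicate matches only one of them.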
The SQL Barrier
SQL has:
DDL (for data definition)
DML (for Select, Project and Join)
But it has no MML or TML.
Usually result sets are brought to the client for further manipulation, but using them for further data access becomes problematic.
Conclusions:
This separation of data from process is arbitrary and unhelpful.
Any database to which this doesn’t apply is NoSQL.
[Diagram: an Analytic DBMS separated from SQL by the SQL barrier; results processing must be done on one side of the barrier or the other.]
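A sketch of the round trip the SQL Barrier slide describes, using Python’s built-in sqlite3 module and two hypothetical tables: results leave the DBMS, are processed in client code, and must be shipped back in as query parameters before they can drive any further data access. The client-side filter is deliberately trivial and could be written in SQL; it stands in for processing the query language cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust TEXT, total REAL);
    CREATE TABLE lines  (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Acme', 250), (2, 'Bolt', 40), (3, 'Acme', 75);
    INSERT INTO lines  VALUES (1, 'A-1', 5), (1, 'B-2', 1), (2, 'A-1', 2), (3, 'C-3', 4);
""")

# 1. The result set crosses the barrier into client memory.
orders = conn.execute("SELECT id, cust, total FROM orders ORDER BY id").fetchall()

# 2. Further manipulation happens in client code, outside SQL.
picked = [oid for oid, cust, total in orders if cust == "Acme" and total > 50]

# 3. Using those results for further data access means sending them back
#    across the barrier as parameters of a new query: another round trip.
marks = ",".join("?" * len(picked))
order_lines = conn.execute(
    f"SELECT order_id, sku, qty FROM lines WHERE order_id IN ({marks}) "
    "ORDER BY order_id, sku",
    picked,
).fetchall()
conn.close()

print(picked, order_lines)
# [1, 3] [(1, 'A-1', 5), (1, 'B-2', 1), (3, 'C-3', 4)]
```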
Other NDBMS Directions
Some NDBMS (NoSQL DBMSs) do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability).
Some NDBMS deploy a distributed scale-out architecture with data redundancy.
XML DBMS using XQuery are NDBMS.
Some document stores are NDBMS (OrientDB, Terrastore, etc.).
Object databases are NDBMS (GemStone, Objectivity, ObjectStore, etc.).
Key-value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.).
Graph DBMS (DEX, OrientDB, etc.) are NDBMS.
Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS.
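Two of these directions, schema-less key-value storage and distributed scale-out with data redundancy, can be sketched together. This assumes simple hash-modulo placement with a fixed replica count on a hypothetical four-node cluster; production systems such as Cassandra use consistent hashing and tunable consistency instead, and none of the names below come from a real product’s API.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICAS = 3                                       # copies kept of every key


def replica_nodes(key: str) -> list:
    """Home node chosen by a stable hash of the key, plus the next
    REPLICAS - 1 nodes around the ring (md5 is used only to spread keys)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]


# Schema-less storage: each node is a plain key -> value map, and values
# under different keys need not share any structure.
stores = {node: {} for node in NODES}


def put(key, value):
    for node in replica_nodes(key):
        stores[node][key] = value


def get(key, down=frozenset()):
    """Read from the first replica that is not marked as down."""
    for node in replica_nodes(key):
        if node not in down:
            return stores[node].get(key)
    raise LookupError(f"all replicas of {key!r} are down")


put("user:1", {"name": "Ann"})
put("user:2", {"name": "Bo", "tags": ["admin"], "age": 41})  # a different shape
home = replica_nodes("user:1")[0]
print(get("user:1", down={home}))  # {'name': 'Ann'}, still readable with its home node down
```

There are no ACID guarantees here: a write that fails partway leaves replicas disagreeing, which is exactly the kind of trade-off the first bullet refers to.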
What To Do...
What Is The Problem You Are Trying To Solve?
The primary message of this presentation is that database is no longer a commodity (if it ever was).
Despite faults and weaknesses, the general-purpose relational database works fine for many areas of application, and:
It is well understood.
Skills (for any popular product) are abundant.
It can be inexpensive (by license or Open Source).
Beyond such products, it is “horses for courses” and “caveat emptor.”
Other Selection Criteria
Don’t fall for fashion.
Proven performance?
Skills, both for design and for administration.
Interfaces & middleware.
The hardware bill.
Product roadmap.
External support/internal support.
Calculate a TCO (note that even for expensive DBMSs the license fees are rarely more than 15% of the TCO).
Take Aways
Hardware trends have brought change and will bring more change.
There are many RDBMS weaknesses.
There is a huge number of “new” database products, both:
No SQL Whatsoever, and
Not Only SQL
Select database products with caution.
Main Take Away:
Database is no longer a commodity
Thank You For Your Attention