Data Warehouses and NoSQL -...

Preview:

Citation preview

DataWarehousesandNoSQLSharingAdministra6veInforma6on

CarmenBarandelaSo-wareEngineerCERN/GS‐AIS

October24‐28,2011JINR/CERNGridandManagementInforma6onSystems

Agenda•  DataWarehousesinAdministra6veCompu6ng

•  Recap:DataWarehousesTheory

•  DataWarehousesandInforma6onSystemsinAIS–  Founda6on,HRandFIInforma6onSystems

–  ComplexDataExtrac6onProcesses–  Pixel‐PerfectRepor6ng– Dashboards

•  NoSQL–  ...

CarmenBarandela:"DataWarehousesandNoSQL” 2

Agenda•  DataWarehousesinAdministra6veCompu6ng

•  Recap:DataWarehousesTheory

•  DataWarehousesandInforma6onSystemsinAIS–  Founda6on,HRandFIInforma6onSystems

–  ComplexDataExtrac6onProcesses–  Pixel‐PerfectRepor6ng– Dashboards

•  NoSQL–  ...

3CarmenBarandela:"DataWarehousesandNoSQL”

Ca.16,000People

4CarmenBarandela:"DataWarehousesandNoSQL”

Mankind’sLargestMachine

5CarmenBarandela:"DataWarehousesandNoSQL”

EnormousAmountofData

JanJanke:"DataWarehousesandAnaly6calDataProcessing..." 6

Administra6veCompu6ng•  ProvidesmeanstoadministrateCERN•  Enablesphysiciststofocusontheirwork•  Allowsmanagementtomaketherightmoves

7CarmenBarandela:"DataWarehousesandNoSQL”

WhyDataWarehouses?•  Heterogeneouscompu6nglandscape•  VariousspecialisedOLTPsystems•  Planningneeds•  LegalRequirements

8

  Supportadministra6vestaff  Enforcesecurityandsafetyonsite  Allowmanagementtomakedecisions

CarmenBarandela:"DataWarehousesandNoSQL”

Example:KeepFinancesUnderControl

•  SpecialisedSystems– Accoun6ng,ERPforCERNstores– Externalcontractsmanagement– Payroll,treasurymanagement,…

9

Specialisedsmallusergroups

Dis6nctdatabases

Highavailabilityandperformance,real‐6medata

Systemsonlyaccessibletoauthorisedspecialists

CarmenBarandela:"DataWarehousesandNoSQL”

Example:KeepFinancesUnderControl

•  GeneralFinancialInforma6onSystem– Singlesystem

– Accesstodatafrommul6plesources– Differentlevelsofcomplexity

10

Specialisedsmallusergroups

Dis6nctdatabases

Highavailabilityandperformance,real‐6medata

Systemsonlyaccessibletoauthorisedspecialists

CarmenBarandela:"DataWarehousesandNoSQL”

Example:KeepFinancesUnderControl

•  GeneralFinancialInforma6onSystem– Singlesystem

– Accesstodatafrommul6plesources– Differentlevelsofcomplexity

11

UsersfromallareasofCERN

Singledatawarehouse

Highavailabilityandperformance,butnonecessityforreal‐6medata

Securityisextremelyimportant!SystemisaccessibleCERNwide.

CarmenBarandela:"DataWarehousesandNoSQL”

AIS’FinancialDataWarehouse

•  Keepdatainsyncwithdataproviders•  Mastercomplexdataextrac6onprocess

•  Ensurehighqueryperformance

•  Basefordetaileddataanalysis

12

Technologies:o ORACLERACdatabaseo JavaEnterprisewebapplica6onso In‐housedevelopedframeworkso Third‐partyBIandrepor6ngtools

CarmenBarandela:"DataWarehousesandNoSQL”

Agenda•  DataWarehousesinAdministra6veCompu6ng

•  Recap:DataWarehousesTheory

•  DataWarehousesandInforma6onSystemsinAIS–  Founda6on,HRandFIInforma6onSystems

–  ComplexDataExtrac6onProcesses–  Pixel‐PerfectRepor6ng– Dashboards

•  DetailedDataWarehouseExample– ManagementDataLayer(MDL)

13CarmenBarandela:"DataWarehousesandNoSQL”

FindtheNeedleintheHay…

14CarmenBarandela:"DataWarehousesandNoSQL”

OLTPvsOLAP

OLTP OLAP

Datasource Opera6ons OLTP(consolidated)

Datapurpose Runthebusiness Repor6ng,analysis

Inserts,updates High Periodicbatchjobs

Querycomplexity Low High

DBdesign Normalized Star,snowflake

Availability Cri6cal Lesscri6cal

Target Opera6onalstaff Middle/higherMgmt.

15CarmenBarandela:"DataWarehousesandNoSQL”

OLTPvsOLAP

OLTP OLAP

Datasource Opera6ons OLTP(consolidated)

Datapurpose Runthebusiness Repor6ng,analysis

Inserts,updates High Periodicbatchjobs

Querycomplexity Low Depends…

DBdesign Normalized Snowflakeandothers

Availability Cri6cal MaybeverycriOcal

Target Opera6onalstaff Mgmt.+OperaOons

16

That’stheory!

Realworldisnotthat

easy…

CarmenBarandela:"DataWarehousesandNoSQL”

Normalisa6on(Codd/Boyce)

•  1NF– 1table=1rela6on,norepea6nggroupsorduplicaterows

•  2NF– Allnonprimeanributesdependonallparts(anributes)ofacompositekey

•  3NF– Allnonprimeanributesdependonlyonthe(whole)key

17CarmenBarandela:"DataWarehousesandNoSQL”

Normalisa6on(Codd/Boyce)

•  1NF– 1table=1rela6on,norepea6nggroupsorduplicaterows

•  2NF– Allnonprimeanributesdependonallparts(anributes)ofacompositekey

•  3NF– Allnonprimeanributesdependonlyonthe(whole)key)

18

Course Category Winner Origin

Monaco‘10 Formula1 M.Webber Australia

Japan‘10 Formula1 S.Venel Germany

Japan‘10 Rally S.Ogier France

Notin3NF,why?

CarmenBarandela:"DataWarehousesandNoSQL”

StarSchema

19

Source:hnp://www.execu6onmih.com/data‐warehouse/star‐snowflake‐schema.php(16/10/2010)

item_key

branch_key

loca6on_key

units_sold

dollars_sold

avg_sales

Measures

branch_keybranch_namebranch_type

Branch

SalesFactTable item_keyitem_namebrandtypesupplier_type

item

loca6onloca6on_keyStreetcitystate_or_provincecountry

CarmenBarandela:"DataWarehousesandNoSQL”

SnowflakeSchema

20

Source:hnp://www.execu6onmih.com/data‐warehouse/star‐snowflake‐schema.php(16/10/2010)

city_keycitystate_or_provincecountry

city

item_key

branch_key

loca6on_key

units_sold

dollars_sold

avg_sales

Measures

branch_keybranch_namebranch_type

Branch

SalesFactTableitem_keyitem_namebrandtypesupplier_key

item

loca6on

loca6on_keystreetcity_key

supplier_keySupplier_type

supplier

CarmenBarandela:"DataWarehousesandNoSQL”

FromOpera6onstoRepor6ng

21

Source:hnp://www.deakin.edu.au/ddw/what‐is.php(16/10/2010)

ERP

FI

HR

CarmenBarandela:"DataWarehousesandNoSQL”

Analysis

•  DataMining•  Drilldown– Finerdetailgranularity(e.g.addagroup‐bycolumn)

•  Slice&dice– Playwiththedimensions•  Combinedifferentdimensions•  Remove/addadimension

•  Analysefactchanges

22CarmenBarandela:"DataWarehousesandNoSQL”

Agenda•  DataWarehousesinAdministra6veCompu6ng

•  Recap:DataWarehousesTheory

•  DataWarehousesandInforma6onSystemsinAIS–  Founda6on,HRandFIInforma6onSystems

–  ComplexDataExtrac6onProcesses–  Pixel‐PerfectRepor6ng– Dashboards

•  NoSQL–  ...

23CarmenBarandela:"DataWarehousesandNoSQL”

CERN/AISBusinessMap

24CarmenBarandela:"DataWarehousesandNoSQL”

Founda6on•  CommondatalayerforvariousAISservices•  DatainterfacesforotherCERNservices•  Commonapplica6ons(e.g.mgmt.ofroles)

25

HRInformaOonSystem(HRT)

FIInformaOonSystem(CET)

…moredomainspecificinformaOonsystems

OperaOvesystems

CarmenBarandela:"DataWarehousesandNoSQL”

VariousSpecialisedSystems

•  ORACLEHR•  CERNTrainingApplica6on•  Safety&accesssystems•  EDH(ElectronicDocumentHandling)•  Accoun6ngApplica6on•  ERPsystemforCERNstores•  Contractfollow‐up•  …

26CarmenBarandela:"DataWarehousesandNoSQL”

TechnicalEnvironment

•  Sourcedatabases:– ORACLE10/11g– MicrosorExcel

•  HR/FIInforma6onSystems:– ORACLE10/11g–  JavaEnterprisewebapplica6ons– SAPBusinessObjectstoolfamily

27CarmenBarandela:"DataWarehousesandNoSQL”

DataExtrac6ons

•  Nightlyscheduledbatchjobs•  Extrac6onsorganisedinSQLscripts•  Runbyself‐developed“batchrunner”– Controls•  Orderofexecu6on(sequen6al,parallel)•  Cri6cality•  Logging•  Problemescala6on(automa6cemails)

28CarmenBarandela:"DataWarehousesandNoSQL”

Defini6onofExtrac6onProcess(1)

29

Generaldefini6ons

CarmenBarandela:"DataWarehousesandNoSQL”

Defini6onofExtrac6onProcess(2)

30

Batches&commands

CarmenBarandela:"DataWarehousesandNoSQL”

ImportanceofMonitoring

31

NewhardwareforDEVdatabases(gain>1h)

CarmenBarandela:"DataWarehousesandNoSQL”

TurtleorLeopard?

32CarmenBarandela:"DataWarehousesandNoSQL”

ORACLEMaterialisedViews

•  Pre‐aggregatedsummaries•  Benefitfromqueryrewrite

33

Source:ORACLE10gDocumenta6on/DataWarehousingGuide

CarmenBarandela:"DataWarehousesandNoSQL”

ORACLEMaterialisedViews

•  Pre‐aggregatedsummaries•  Benefitfromqueryrewrite

34

Source:ORACLE10gDocumenta6on/DataWarehousingGuide

CarmenBarandela:"DataWarehousesandNoSQL”

Materialised(Summary)Views

•  Don’tuseremotetablesifyouneedqueryrewrite

•  Creatematerializedviewlogonallsourcetables

35CarmenBarandela:"DataWarehousesandNoSQL”

Materialised(Summary)Views

•  Don’tuseremotetablesifyouneedqueryrewrite

•  Creatematerializedviewlogonallsourcetables

36CarmenBarandela:"DataWarehousesandNoSQL”

Snapshots

•  Usesnapshotstoefficientlyaccessremotetables– Syntax:CREATESNAPSHOT…AS[YourQuery]– Refreshop6ons:•  FAST•  COMPLETE

•  FORCE

37CarmenBarandela:"DataWarehousesandNoSQL”

Snapshots

•  Usesnapshotstoefficientlyaccessremotetables– Syntax:CREATESNAPSHOT…AS[YourQuery]– Refreshop6ons:•  FAST•  COMPLETE

•  FORCE

38CarmenBarandela:"DataWarehousesandNoSQL”

PipelinedFunc6ons

•  PL/SQLisdatasourceinsteadofatable•  MayincreaseperformanceinenvironmentswithheavyPL/SQLuse

39CarmenBarandela:"DataWarehousesandNoSQL”

PipelinedFunc6ons

•  PL/SQLisdatasourceinsteadofatable•  MayincreaseperformanceinenvironmentswithheavyPL/SQLuse

40

CREATE OR REPLACE TYPE myTableFormat AS OBJECT( col_a NUMBER, col_b DATE, col_c VARCHAR2(25) ) /

CREATE OR REPLACE TYPE myTableType AS TABLE OF myTableFormat

/ CarmenBarandela:"DataWarehousesandNoSQL”

1

PipelinedFunc6ons

41

CREATE OR REPLACE FUNCTION myFunc RETURN myTableType PIPELINED IS

BEGIN

FOR i in 1 .. 5 LOOP PIPE ROW ( myTableFormat( i, SYSDATE+i, 'Row '||i ) ); END LOOP; RETURN;

END;

END;

/

2

CarmenBarandela:"DataWarehousesandNoSQL”

PipelinedFunc6ons

42

SELECT * FROM TABLE( myFunc() );

col_a col_b col_c --------- ---------- ---------- 1 27/10/2010 Row 1 2 28/10/2010 Row 2 3 29/10/2010 Row 3 4 30/10/2010 Row 4 5 31/10/2010 Row 5

3

Useapipelinedfunc.onifyourequireadatasourceotherthanatable!

CarmenBarandela:"DataWarehousesandNoSQL”

DatabaseDesign

•  Starschemalike•  Highlyde‐normalisedincl.duplica6onofdata•  Usesingle‐anributekeyswhereverpossible•  Performancemaners!–  Becarefulwhenextrac6ngoverdatabaselinks–  Certaintablesfromopera6onalsystemsarecopied– Dele6on&recrea6onofindexes– Usepar66ons– Manualcontrolofsta6s6cscollec6on– Op6mizingexecu6onplansvery6me‐consuming

43CarmenBarandela:"DataWarehousesandNoSQL”

Repor6ngApplica6onFramework

•  Columnandorderingselec6on•  Subreports•  Variousoutputformats(e.g.HTML,PDF)

•  Charts•  Self‐servicerepor6ng•  Automatedscheduledreportexecu6on

•  Rowandcolumnbasedaccesscontrol

44CarmenBarandela:"DataWarehousesandNoSQL”

DataAccess

45

Name Unit Tel Salary Category

Meyer A 12345 $4,900 3

Schmidt B 23456 $6,400 1

Cook B 34567 $5,700 2

Whichrowsarevisibletome?UnitleaderofBonlyseespersonsfromUnitB.

Whichdata(columns)amIallowedtosee?AsasupervisorImaynotbeen6tledtoseethehealthinsurancecategory.Asafetyormedicalofficermaynotseethesalary,etc.

CarmenBarandela:"DataWarehousesandNoSQL”

UserInterface

46CarmenBarandela:"DataWarehousesandNoSQL”

PixelPerfectForms•  UseofApacheFOPlibrary–  Examples:

•  Employment&traininganesta6ons•  Swiss/Frenchcardapplica6onforms

•  BusinessObjectsXIEnterprise– Directuse–  IndirectuseviaBusinessObjectsJavaSDK–  Examples:

•  Salaryslips•  Cars6ckers•  Workorders

47CarmenBarandela:"DataWarehousesandNoSQL”

BusinessObjects

•  CommercialtoolfamilyfromSAP•  Advantages–  Richrepor6ngpossibili6es(interac6veorviaSDK)– AppealingdashboardsusingXcelsius– Onlyafewusersneedtheknowledgetodesignreports

•  Drawbacks–  Two‐waydatastorage(filesystem&database)–  Some6messtabilityproblems–  Time‐intensiveadministra6onandmaintenance–  Expensive

48CarmenBarandela:"DataWarehousesandNoSQL”

ManagementDashboards

49

DesignedlocallyusingMSOfficeandXcelsius.

DatacomesfromtheMDLdatawarehouse.

PublishedasFlashtotheBOServer.

CarmenBarandela:"DataWarehousesandNoSQL”

Remember:

•  Highdatavolumes+analysis=datawarehouse

•  OLTPvs.OLAP•  Usethefacili6esthetoolprovides– Materializedviews,snapshots,pipelinedfunc6ons

•  Keepthingsextensibleandsimple!

•  Par66onsareveryhelpful

50CarmenBarandela:"DataWarehousesandNoSQL”

Agenda•  DataWarehousesinAdministra6veCompu6ng

•  Recap:DataWarehousesTheory

•  DataWarehousesandInforma6onSystemsinAIS–  Founda6on,HRandFIInforma6onSystems

–  ComplexDataExtrac6onProcesses–  Pixel‐PerfectRepor6ng– Dashboards

•  NoSQL–  ...

51CarmenBarandela:"DataWarehousesandNoSQL”

NoSQL

52CarmenBarandela:"DataWarehousesandNoSQL”

NotOnlySQL

WhyNoSQL?(1)Tradi6onalRDBMSsystems:‐Defini6onofarigidschema

‐GuaranteesACIDtransac6ons:

.Atomicity

.Consistency

.Isola6on

.Durability

CarmenBarandela:"DataWarehousesandNoSQL” 53

WhyNoSQL?(1)Tradi6onalRDBMSsystems:‐Defini6onofarigidschema

‐GuaranteesACIDproper6es:

.Atomicity

.Consistency

.Isola6on

.Durability

Scalability

CarmenBarandela:"DataWarehousesandNoSQL” 54

WhyNoSQL?(2)‐  Newwebappsdifferentneeds:.Scalability&elas6city(atlowcost!).Highavailability.Flexibleschemas

.Geographicdistribu6on

‐  Donotnecessarilyneed:.Strongconsistency/integrity.Complexqueries

CarmenBarandela:"DataWarehousesandNoSQL” 55

NoSQLusecases1. Massivedatavolumes.Massivelydistributedarchitecture.Google,Amazon,Yahoo,Facebook...

2.  Extremequeryworkload.Impossibletoefficientlydojoinsatthatscalewith

anRDBMS

3.  Schemaevolu6on.Schemaflexibilityisnottrivialatlargescale.Schemachangescanbegraduallyintroducedwith

NoSQL.

CarmenBarandela:"DataWarehousesandNoSQL” 56

NoSQLTaxonomy

•  Keyvaluestores•  Documentdatabases

•  Column‐orienteddatabases

•  Graphdatabases

CarmenBarandela:"DataWarehousesandNoSQL” 57

NoSQLTaxonomyKeyvaluestores

‐1key(uniqueid)‐>1value(binaryobject,blob)

‐TheDBdoesnotunderstandit:nodatamodel

‐Veryfast

‐Examples:AmazonDynamo,MemcacheDB

CarmenBarandela:"DataWarehousesandNoSQL” 58

NoSQLTaxonomyDocumentdatabases

‐  Key‐valuestore:thevalueisusuallystructured‐  Queryingwithmorethanakeyispossible

‐  Examples:AmazonSimpleDB,CouchDB...

CarmenBarandela:"DataWarehousesandNoSQL” 59

NoSQLTaxonomyColumn‐orienteddatabases

‐Distributed,persistentmul6dimensionalsortedmap.

‐Mapindexedby:rowkey,columnkeyanda6mestamp

‐Examples:GoogleBigTable,Cassandra(Facebook)

CarmenBarandela:"DataWarehousesandNoSQL” 60

Googlepaper:hnp://sta6c.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en//papers/bigtable‐osdi06.pdf

NoSQLTaxonomyGraphdatabases

‐  Basedonthegraphtheory:‐Ver6ces:likeen66es

‐Edges:rela6onshipsbetweentheen66es

‐  Examples:Neo4j,FlockDB(Twiner)

CarmenBarandela:"DataWarehousesandNoSQL” 61

CAPTheorem

CarmenBarandela:"DataWarehousesandNoSQL” 62

Consistency Availability

ParOOonTolerance

CAPTheorem

CarmenBarandela:"DataWarehousesandNoSQL” 63

Consistency Availability

ParOOonTolerance

Only2canbesaOsfiedattheOme

NoConsistency?

CarmenBarandela:"DataWarehousesandNoSQL”

NoConsistency?

CarmenBarandela:"DataWarehousesandNoSQL”

EventualConsistency!!!

EventualConsistency

Thestoragesystemguaranteesthatifnonewupdatesaremadetotheobjecteventuallyallaccesseswillreturnthelastupdatedvalue.

CarmenBarandela:"DataWarehousesandNoSQL” 66

RDBMSvsNOSQL

•  StrongConsistency•  BigDatasets•  Scalingispossible•  SQL•  Goodavailability•  Consolidatedtech.

•  EventualConsistency•  HugeDatasets•  Scalingiseasy•  API•  Highavailability•  S6llimmaturetech.

CarmenBarandela:"DataWarehousesandNoSQL” 67

Спасибо!•  Ques6ons?•  MoreInfo– barandel@cern.ch

CarmenBarandela:"DataWarehousesandNoSQL”