42
Data Modeling’s Role in the Evolving World of Data Management Donna Burbank CA Technologies

Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Embed Size (px)

Citation preview

Page 1: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Data Modeling’s Role in the Evolving World of Data Management

Donna BurbankCA Technologies

Page 2: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 2

Who am I?

• More than more than 15 years of experience in the areas of data management, metadata management, and enterprise architecture. – Currently is the Senior Director of Product Marketing for CA’s

data modeling solutions. – Brand Strategy and Product Management roles at Computer

Associates and Embarcadero Technologies – Senior consultant for PLATINUM technology’s information

management consulting division in both the U.S. and Europe. – Worked with dozens of Fortune 500 companies worldwide in the

U.S., Europe, Asia, and Africa and speaks regularly at industry conferences.

– Co-author of several books including:• Data Modeling for the Business• Data Modeling Made Simple with CA ERwin Data Modeler r8

Page 3: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 3

Agenda

• Culture shift

• New Technologies and Methodologies

– Big Data

– Cloud Computing

– Data Vaults

– Tagging

– Agile

• How Do Traditional Data Management Practices Fit?

Page 4: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 4

Technological and Culture Shift in Society

1960 1970 1980 1990 2000 2010

Page 5: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 5

Technological and Culture Shift in Data Management

1960 1970 1980 1990 2000 2010

• Mainframes• Flat Files• In-House Development• Waterfall Methodology

• Individual PC’s• Relational Databases• Democratization of Computing

• Client/Server Computing• Relational Databases• Data Warehousing• Packaged Applications• RAD Development

• Dot COM Revolution• Changing Business Models

• Dot COM Bubble Bursts • Data Integration• “Do More with Less”

• Cloud Computing• “Big Data”, NoSQL• Data Vaults• Agile Development

Page 6: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 6

Paradigm Shift

Traditional

• Top-Down, Hierarchical• “Design, then Implement”• “Passive”, Push technology• “Manageable” volumes of information• “Stable” rate of change

Modern

• Distributed, Democratic• Real-time Analysis• Collaborative, Interactive• Massive volumes of information• Rapid and Exponential rate of change

Page 7: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 7

Emergence

In philosophy, systems theory, science, and art, emergenceis the way complex systems and patterns arise out of a

multiplicity of relatively simple interactions. - Wikipedia

I love my new Levis jeans.

Is Levi coming to the party?

Sale #LEVIS 20% at Macys.

LOL. TTYL. Leving soon.

Page 8: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Relational Databases vs.

“Big Data” and NoSQL

Page 9: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 9

“Big Data” Survey

• How familiar are you with "Big Data" technologies?

A. Very familiar. We are using Big Data technologies now.

B. Somewhat. I know what it is, but am not using it.

C. Not at all. This is new to me.

Page 10: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 10

Relational Databases vs. “Big Data” and NoSQL

• With the rise of massive volume, real-time data streams, traditional relational database technologies cannot scale.– Google, Yahoo, Facebook, Twitter, MySpace

– Corporate Data: email logs, click stream data, medical diagnostics research, etc.

– Huge volumes: With Google, for example, you’re basically “ingesting” the entire internet.

• Issues that arise with Big Data– Can’t store in a single, relational database Too Big

– Can’t easily index Too Big

– Can’t write traditional SQL Unstructured

– Can’t design data structures “top-down” Data is constantly in flux and in creation.

Page 11: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 11

Relational Databases vs. “Big Data” and NoSQL

• Big Data Solutions address these issues by:– Distributed data storage across servers

– Stores data in files

– Post-creation analysis, real-time

– Data is structured as it is accessed

– Natural-language, application-level processing

– Distributed hash tables – pull record by name, similar to application programming

Page 12: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 12

Do Relational Databases Go Away?

• What Relational Databases are good for:– Structured data, that can be designed “up front” with predictable queries

– The same reasons you always have, actually:

• Ease of query and retrieval

• Ease of update - Reducing redundancy

• Increase data quality

• What “Big Data” solutions are good for:– Analyzing raw, real-time data BEFORE it goes into a Data Warehouse or Transactional Database

– Analyzing large volumes of unstructured data that is updated in real-time

I love my new Levis jeans.

Is Levi coming to the party?

Sale #LEVIS 20% at Macys.

LOL. TTYL. Leving soon.

Customer Sentiment Database

Page 13: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 13

Some Technologies + Terms for Big Data

• Big Data Software Frameworks:– Hadoop: An open source Apache project

– MapReduce: Google’s solution

• Memcache– Memory caching system for large-scale data-driven websites

– Big data’s equivalent of MySQL behind a web site

– With large data sets, can’t run a query each time:

• Memory layer between database and web server

• Fetches data on-demand

• MapReduce– MapReduce is the key algorithm that Hadoop uses to analyze data across servers

– Programmatic, not SQL-based

– A map transform is provided to transform an input data row of key and value to an output key/value: map(key1,value) -> list<key2,value2>

– Key aspect is that every transform is independent and the operation can be run in parallel on lists of data distributed across servers

Page 14: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 14

Some Technologies + Terms for Big Data

• Hive– Data warehouse infrastructure which allows SQL-like ad-hoc querying of data (in any format)

stored in Hadoop

– Not designed for online transaction processing and does not offer real-time queries and row level updates. It is best used for batch jobs over large sets of immutable data (like web logs).

• Hbase (Apache), BigTable (Google)– “Database” for Big Data

– Not relational, distributed multi-dimensional sorted map

Page 15: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

On Premisevs.

Cloud

Page 16: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 16

Cloud Survey

• How familiar are you with Cloud Database technologies?

A. Very familiar. We are storing data in the cloud now.

B. Somewhat. I know what they are, but am not using them.

C. Not at all. This is new to me.

Page 17: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 17

On-Premise vs. Cloud

• Historically, corporate data has been stored on-premise

– Originally centralized

– Then distributed

– But still managed and owned by the company – directly purchasing hardware and software

• With the emergence of Cloud Computing, organizations can “lease” database stored as they need it

– Fewer in-house resources needed

– Ability to expand and contract usage as needed

• Driving Innovation

– With Cloud services, small, start-up companies have lower cost of entry

Page 18: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 18

In the Cloud Realm IT is ‘rented’ on a subscription basis

• SaaS: Software-as-a-Service

– On-Demand Software Delivery

– Examples: Web mail such as Yahoo or Google, ERP and CRP systems such as Salesforce.com

• PaaS: Platform-as-a-Service

– Development platform and solution stack as a service.

– Rent hardware, operating systems, storage and network capacity over the Internet.

– Examples: Force.com, Google

• IaaS: Infrastructure-as-a-Service

– Delivery of the computing infrastructure (hardware) as a fully outsourced service.

– Examples: Google, IBM, Amazon.com etc.

• DaaS: Database-as-a-Service

– Provides traditional database features, typically data definition, storage and retrieval, on a subscription basis over the web.

– Examples: Microsoft SQL Azure, database.com, Amazon SimpleDB

Page 19: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 19

The Precedent Exists

• How many households or companies make their own electricity today?

• Instead, purchase “Electricity As-a-Service” (EaaS)

Page 20: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 20

How do Traditional Data Models Fit?

• A Data Model is your “roadmap”for:

– What data to move to the cloud, and what to keep on-premise

– Defining data structures (physical model) and business requirements (logical model) for cloud databases

• Off-Premise doesn’t need to mean Out of your Control

MySQL

DB2

Teradata

SQL Server Sybase

Oracle

MSSQL Azure

Page 21: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Data Warehousesvs.

Data Vaults

Page 22: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 22

Data Vault Survey

• How familiar are you with Data Vault technology?

A. Very familiar. We are using Data Vaults now.

B. Somewhat. I know what it is, but am not using it.

C. Not at all. This is new to me.

Page 23: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 23

Data Warehouses vs. Data Vaults

• The Traditional Data Warehouse

– Based on up-front analysis & design

– Creation of central data model and database

– Data is cleansed and transformed through ETL to central store

• Data Vault's philosophy is that all data is relevant data, even if it is "wrong“

– Based on raw data sets

– Use of Links: Everything is a many-to-many relationship (no RI)

– Complete Auditability and Traceability (as a result of Sarbanes-Oxley, Basel II, etc.)

– Enable re-creation of a source system based on a specific point in time

– Claim is that traditional modeling is inflexible:

• When business rules change, the cardinality & model need to change –> also cascades to child entities

Page 24: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 24

Data Vaults

• Based on Parallel Processing

– Split the work so that database engines can optimize

• Index coverage

• Data redundancy

• Parallel query

• Resource utilization

– Each table has its own I/O channel associated with it

• Hypercubes• Links and Hubs: Hubs provide the keys. Links describe the keys

• Using a weighted graph helps you determine which nodes are most important

• Problem with existing ETL is that everything needs to be done at load-time not always realistic

• “Divide and Conquer—do transformations, etc. when load individual marts

Page 25: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 25

Is Data Modeling Still Relevant? – Yes

Source: www.danlinstedt.com

Page 26: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 26

Is Data Modeling Still Relevant? – Yes

Source: www.danlinstedt.com

Page 27: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 27

Is the Traditional Modeling Process Still Relevant? - No

• No “up-front” design

• Instead, analyzing large volumes of data in “real time”

– No up-front design, analyzing raw data “as is” to discover patterns

– Distributed processing

• Sound similar to “Big Data”? It is

Page 28: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Hierarchies vs.

Hash Tags

Page 29: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 29

Traditional Naming Standards

• In traditional data modeling and management, naming standards are prescribed, enforced, and structured

Page 30: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 30

A New Paradigm - Tagging

Page 31: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 31

Error-Free?

Page 32: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 32

What works?

• Traditional Naming Standards, Data Modeling Best Practices are good for up-front design and data quality– Clear naming rules, e.g. Womans Hiking Boot (modifier, prime word, class word)

– Clear definitions, e.g. “A boot is an item of footwear with close toes, which covers the calf, often with a raised heel.”

– Supertypes and subtypes

– Data cleansing (boat or boot?)

• But Remember Emergence– With the volumes of real-time data, it’s impossible to model it up-front first

– Overtime, data quality improves through self-correction and “emergence”

– After all, generally, Google searches are pretty good.

• As with the other examples, the organizational reality will be a mix of technologies.

Page 33: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Waterfall vs.

Agile

Page 34: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 34

Agile Survey

• How familiar are you with Agile methodologies?

A. Very familiar. We are using Agile now.

B. Somewhat. I know what it is, but am not using it.

C. Not at all. This is new to me.

Page 35: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 35

What is Agile?

Agile Software Development is a group of software

development methodologies based on iterative and

incremental development, where requirements and solutions evolve through

collaboration between self-organizing, cross-functional

teams. - Wikipedia

Page 36: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 36

Waterfall vs. Agile

Waterfall

• Up-Front Design• Often Long Development

Cycles•Rigorous analysis of business

requirements• Documentation• “Top Down” management,

e.g. data standards team

Agile

• Little up-front design• Short Development Cycles

(sprints)•“Learn as you go”

requirements• Little documentation “The

documentation is the code”• Team-focused, all contribute•Traditional application-

development focused (i.e. not data)

Page 37: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 37

Recommendation: Be more “Agile”, but not necessarily using 100% of the Agile Methodology

• Don’t Boil the Ocean

– Focus on small, manageable project that will show incremental value

– The days of the one year development cycle to build an enterprise model are gone

• Involve all Stakeholders

– All contribute: Engage in frequent conversations with business users, developers, DBAs, etc.

– The days of “the data standards team will tell you the rules” are gone (if they ever existed!)

• Focus on the Business Requirements—key to any methodology

– The goal is to support the organization’s needs, not implement a particular technology or methodology

– Do document requirements. A conceptual data model, for example, can apply to any technology

Page 38: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 38

A Data Model is a Communication Tool:Data Model as Mediator

• A data model is a valuable tool to communicate core requirements across technologies to a wide range of stakeholders.

SAP

DB2

DB2Oracle

Teradata

IDMS

SQL

Server

Data Model

Developers Business

Sponsors

Data Architects

ETC!ETC!

DBAsMSSQL

Azure

Page 39: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

How has the Social and IT Culture Evolved?

Page 40: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 40

Paradigm Shift

Traditional

• Top-Down, Hierarchical• “Design, then Implement”• “Passive”, Push technology• “Manageable” volumes of information• “Stable” rate of change

Modern

• Distributed, Democratic• Real-time Analysis• Collaborative, Interactive• Massive volumes of information• Rapid and Exponential rate of change

Page 41: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

PAGE 41

Summary

• We are in a period of “disruptive” technology– Rapid rate of change

– Massive volumes of data

– Social changes: more participatory, engaged

– Distributed, web-based implementations

– Real-time analysis required

• But, as with any Age of Change, the basics still apply– Relational databases are still relevant

– Data Models are valuable to document business requirements and technical implementation

– The “hard stuff” still needs to be done: analysis, data quality, etc.

• Need to Adapt and Integrate– Utilize new technologies as appropriate

– Engage extended teams in a participatory manner

– Be more “agile” and adaptive in your projects and methodologies

• Have fun! This is an exciting time to be in Data Management

Page 42: Data Modeling’s Role in - DAMA Indiana Modelings Role in Evolving... · Data Modeling’s Role in ... •With the rise of massive volume, real-time data streams, ... – “Database”

Questions?

Email: [email protected]: @donnaburbankwww.erwin.com