THE DAWN OF BIG DATA New Rules; New Structures Neal J. Hannon University of Kansas February 9, 2012


THE DAWN OF BIG DATA

New Rules; New Structures

Neal J. Hannon
University of Kansas
February 9, 2012


Definition

• Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.


More Data Please…


• In a 2001 research report[14] and related conference presentations, then META Group (now Gartner) analyst, Doug Laney, defined data growth challenges (and opportunities) as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources). Gartner continues to use this model for describing big data.[15]


Gartner

• Worldwide information volume is growing at a minimum rate of 59 percent annually, and while volume is a significant challenge in managing big data, business and IT leaders must focus on information volume, variety and velocity.
• Volume
• Variety
• Velocity
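To make the 59 percent figure concrete, here is a minimal sketch that compounds a fixed annual growth rate. It assumes the rate stays constant from year to year, which the Gartner statement does not claim; the function names are illustrative only.

```python
import math


def projected_volume(initial, annual_growth, years):
    """Return the volume after compounding annual_growth (0.59 means 59%) for `years` years."""
    return initial * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Return how many years it takes for volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


GROWTH = 0.59  # the minimum annual growth rate quoted on the slide

print("Doubling time (years):", round(doubling_time_years(GROWTH), 2))
print("Growth factor after 5 years:", round(projected_volume(1.0, GROWTH, 5), 2))
```

Under that constant-rate assumption, volume doubles in about a year and a half and grows roughly ten-fold in five years.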


Volume

• Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.


Variety

• Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions. Now there are more types of information to analyze, mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.


Velocity

• Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
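The two sides of velocity, production rate and processing rate, can be compared with a small sketch. It assumes both rates are constant, and the function names and figures are hypothetical, chosen only for illustration.

```python
def keeps_up(produced_per_sec, processed_per_sec):
    """True when processing is at least as fast as production."""
    return processed_per_sec >= produced_per_sec


def backlog_after(seconds, produced_per_sec, processed_per_sec, initial_backlog=0.0):
    """Records still waiting after `seconds` at constant rates (never below zero)."""
    net_rate = produced_per_sec - processed_per_sec
    return max(0.0, initial_backlog + net_rate * seconds)


# Hypothetical stream: 1,000 records/s arriving, 800 records/s processed.
print(keeps_up(1000, 800))
print(backlog_after(60, 1000, 800))
```

When production outpaces processing, the backlog grows linearly with time, which is why both rates matter for meeting demand.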


Data is becoming the new raw material of business: an economic input almost on a par with capital and labour. “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” says Rollin Ford, the CIO of Wal-Mart.

Source: Data, Data Everywhere, The Economist, February 25, 2010

There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing

Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010

Why now?


Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)



Large

• Billions of web clicks: +1TB

• Millions of web pages: 10GB-1TB

• Thousands of sales figures: <10GB
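The three size bands above can be expressed as a simple classifier. This is a sketch only: the slide names just the "Large" band, so the labels "small" and "medium" are assumptions, as are the decimal units (1 GB = 10^9 bytes) and the treatment of exactly 1 TB as part of the middle band.

```python
GB = 10**9   # assumption: decimal gigabyte
TB = 10**12  # assumption: decimal terabyte


def size_tier(num_bytes):
    """Map a data set size in bytes to one of the three bands on the slide."""
    if num_bytes < 10 * GB:
        return "small"    # assumed label, e.g. thousands of sales figures
    if num_bytes <= 1 * TB:
        return "medium"   # assumed label, e.g. millions of web pages
    return "large"        # e.g. billions of web clicks


print(size_tier(500 * GB))
```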


Real-time


How can big data create value?

• Creating transparency – enabling, for example, the manufacturing sector to integrate “data from R&D, engineering, and manufacturing units to enable concurrent engineering ... (to) significantly cut time to market and improve quality.” This seems much like traditional data warehousing.


• Enabling experimentation – “organizations can collect more accurate and detailed performance data ... to instrument processes and then set up controlled experiments … (which) can enable leaders to manage performance at higher levels.” Super-crunching equals analytics + experiments.


• Innovating new business models – “The emergence of real-time location data has created an entirely new set of location-based services from navigation to pricing property and casualty insurance based on where, and how, people drive their cars.” This affirms Mike Loukides' assertion “that data science enables the creation of data products.”


• Supporting human decision making with automated algorithms – “decision making may never be the same; some organizations are already making better decisions by analyzing entire datasets from customers, employees, or even sensors embedded in products.” The statistical learning world continues to progress.
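The "enabling experimentation" point above pairs analytics with controlled experiments. One common way to analyze such an experiment is a two-proportion z-test, sketched here with the Python standard library. This is an illustrative choice rather than a method named in the source, and the conversion counts are made up.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 100 of 1000, variant 150 of 1000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(round(z, 2), p < 0.01)
```

A small p-value suggests the difference between the two groups is unlikely to be chance alone, which is the evidence a controlled experiment is meant to supply.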


SAS - unstructured text

• http://www.youtube.com/user/SASsoftware?v=NHAq8jG4FX4&feature=pyv&ad=8557352196&kw=data%20analytics


Pattern-Based Strategy

• "The ability to manage extreme data will be a core competency of enterprises that are increasingly using new forms of information (such as text, social and context) to look for patterns that support business decisions in what we call Pattern-Based Strategy," said Yvonne Genovese, vice president and distinguished analyst at Gartner. "Pattern-Based Strategy, as an engine of change, utilizes all the dimensions in its pattern-seeking process. It then provides the basis of the modeling for new business solutions, which allows the business to adapt. The seek-model-and-adapt cycle can then be completed in various mediums, such as social computing analysis or context-aware computing engines."


Tricks of the Trade

• New Architecture

• In Memory Analytics


In-Memory Indexing at SAP

• We have also got enterprise search. We really started doing that back in the 2003/2004 time period, which is also when we started coming out with Business Warehouse Accelerator. That was when Google was just really starting to become Google, and we tried to do the same thing with enterprise data that Google does with website data as far as indexing it. So we also put the indexes in memory, so it sped up even further, and now, if you actually look at HANA, it really is kind of the next evolutionary step in that chain. This is in-memory processing, and this isn’t something just for a specialist. It really is a technology that has matured to a level where it can run the entire business suite and run your entire company in memory and get all those benefits for everything.

• http://docs.media.bitpipe.com/io_10x/io_102428/item_477005/The%20Next%20Chapter%20of%20In-Memory%20Computing_PT_12.22.11.pdf
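The idea of keeping a search index in memory can be illustrated with a toy inverted index. This is a minimal sketch of the general technique, not SAP's or Google's implementation; the class name, the whitespace tokenization, and the sample records are all assumptions.

```python
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index held entirely in RAM: term -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index a document by its lowercase, whitespace-separated terms."""
        # Note: re-adding an existing doc_id does not remove its old terms.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


index = InMemoryIndex()
index.add(1, "Quarterly sales report for Kansas")
index.add(2, "Sales forecast and inventory report")
print(index.search("sales report"))
```

Because every lookup is a dictionary access followed by set intersections, no disk read is needed at query time, which is the source of the speed-up described above.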


For more on Hadoop

• http://www.slideshare.net/PhilippeJulio/hadoop-architecture


Obligatory Questions slide

• Any Questions?