Understanding Big Data

Introduction

Information has always been a crucial resource for decision making. The lack of information in a subject can lead to mistakes, sometimes catastrophe. Research and marketing firms are designed to explore and create new information in areas shadowed in mystery or enlightened by countless, often contradictory, observations.

Big Data is the new push into gathering, storing, analyzing, and creating value from data, much of which could not be explored until recently due to technological limitations.

This toolkit is designed to introduce Big Data concepts and provide the tools to create a Big Data culture in your organization.

Defining Big Data

While Big Data is a relatively new term in the industry, the concepts and principles are not new. They are built on data warehousing and data analysis concepts.

Big Data as a Thing – Big Data refers to data sets, such as images or voice recordings, that are so large or complex that they cannot be processed or analyzed without the use of machines. (Wikipedia)

Big Data as a Paradigm – Big Data refers to the exponential growth, availability, and use of structured and unstructured data. (SAS)

Big Data as a Technology – Big Data is a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis. (IDC)

Facts about Data

To understand the scope of Big Data, one must first understand a few facts about data in general. Most of these facts are from studies by the International Data Corporation.

Existing data in the world is estimated at 1.2 zettabytes; that’s 1.2 trillion gigabytes of data.

Source: IDC 2011 Study

2.5 quintillion bytes of data are being created every day.

Source: IBM
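The scale of the two figures above can be checked with a short calculation. The sketch below assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes; the variable names are illustrative and not part of the original studies.

```python
# Decimal (SI) unit sizes in bytes (an assumption; binary units would differ).
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

# Figures quoted in the slides, written as exact integers.
existing_bytes = 12 * 10**20   # 1.2 zettabytes (IDC 2011 estimate)
daily_bytes = 25 * 10**17      # 2.5 quintillion bytes per day (IBM estimate)

# 1.2 zettabytes expressed in gigabytes.
existing_gb = existing_bytes // GIGABYTE

# Daily creation expressed in exabytes.
daily_eb = daily_bytes / EXABYTE

# Days needed, at the quoted daily rate, to create as much data as already exists.
days_to_match = existing_bytes / daily_bytes

print(f"{existing_gb:,} GB exist")
print(f"{daily_eb} EB are created per day")
print(f"{days_to_match:.0f} days to match the existing volume")
```

Under these assumptions, 1.2 zettabytes is on the order of a trillion gigabytes, and the quoted daily rate would match that volume in well under two years.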

The average file size containing data is decreasing.

The number of files containing data is increasing.

Source: 2012 IDC Study

75% of data is generated by the consumer.

25% of data is generated by the enterprise.

Enterprises are LIABLE for 80% of all data generated.

Source: 2012 IDC Study

The data about a person far exceeds the data created by a person.

Source: IDC 2011 Study

25% of all data is unique or original.

75% of all data is a duplication of original data.

Source: 2012 IDC Study
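As an illustration of how a duplication share like this could be measured on a small scale, the sketch below hashes each item's content and counts repeats. It is a minimal example, not the method used in the IDC study; the `duplicate_share` function and the sample data are hypothetical.

```python
import hashlib

def duplicate_share(items):
    """Return the fraction of items whose content repeats an earlier item."""
    seen = set()
    duplicates = 0
    for content in items:
        # Identical byte content produces an identical SHA-256 digest.
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / len(items) if items else 0.0

# Hypothetical sample: one original document and three exact copies.
sample = [b"invoice-001", b"invoice-001", b"invoice-001", b"invoice-001"]
print(duplicate_share(sample))
```

Exact hashing only catches byte-for-byte copies; near-duplicates (resized images, reformatted documents) would need a different technique.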

The most impressive companies today are leading in capturing and managing value from data.

Dimensions of Big Data

Several companies have attempted to summarize the major areas of concern, or dimensions, relevant to Big Data. Most build on those defined in the IDC definition:

– Volume
– Velocity
– Variety
– Variability
– Complexity
– Value
– Trust (Veracity)

Big Data Dimension - Volume

The most undeniable facet of Big Data is the volume of data that an enterprise must deal with—the data created by its internal operations, data from transactions with customers and suppliers, data from external consultants, partners, and regulators, and general industry data about competitors and market places.

The development of Big Data within the enterprise focuses on finding value at the right time and place from any number of data ‘haystacks’.

Big Data Dimension - Velocity

In the last slide, we alluded to an old saying… “finding a needle in a haystack”.

A persistent challenge with data and information has been time: having the right information in the right place at the right time. The problem has been that finding the right data in enormous mounds of data can be time-consuming. For example, mapping the human genome took years. The opportunity for Big Data is the ability to search through not just one, but multiple haystacks in a matter of minutes.

Big Data Dimension - Variety

Data exists in many forms: text, images, video, audio, documents, databases, etc. Many tools, past and present, have been designed to search through and analyze data in different formats, but only a relative few are effective in supporting all formats. And these few tools may not support any new formats in the future.

Big Data architecture allows for any combination of tools and technologies to be used to gather, store, analyze, and manage multiple data formats into and across the enterprise.

Big Data Dimension - Variability

SAS acknowledges that data generation, analysis, and use are not a constant stream of activities, but have ebbs and flows based on seasons and demand. For Big Data, this change requires planning and monitoring across the enterprise to ensure proper levels of capacity for all activities.

Variability also covers the unpredictability of demand on data. One day, a person may need data on one topic; the next day, on another topic; and on the third day, on a completely different topic.

Big Data Dimension - Complexity

To support the variability of data for a single person, one must often search through multiple sources (haystacks), some of which are external to the enterprise. Through this process, links, connections, and relationships are made.

While these associations may be driven manually by the person, many Big Data solutions attempt to create these associations automatically. For example, the leader in Big Data, Google Search, attempts to provide the most relevant data about a topic in declining order of relevance.

The establishment of these relationships can be complex for one person; the addition of another person increases the complexity exponentially.

Big Data Dimension - Value

What information is important and what information is not?

This is a crucial and often misunderstood question for enterprises because most managers will make the determination based on their requirements, interests, and timing. Unfortunately, what is important to you may not be important to the next guy; what is important to you today may not be important to you tomorrow.

Big Data Analytics attempts to determine the value of information over its lifecycle and for different population groups or communities.

Big Data Dimension - Trust

IBM refers to this dimension as veracity and simply describes it as the trust decision-makers place in the data they have available to them. Surprisingly, 1 in 3 business leaders do not trust the available information. As the volume, variety, variability, and complexity of data continue to grow, so will the level of distrust if not managed appropriately.

For enterprises reliant on their ability to use data effectively, one mishap may devastate their trust in available data; or worse, the trust of their customers in the enterprise.

Big Data Capabilities

Capabilities – Traditional

Big Data builds on traditional methods for moving, processing, and searching through data. As the demand on data increases, these traditional methods will often prove insufficient and slow.

Capabilities – Fast Data

One of the first steps away from traditional methods, often by adapting the traditional tools, is to increase the speed of data activities. Fast Data techniques focus on developing the enterprise’s ability to process the majority of data in a relatively short time for the purposes of responding quickly to the situation generating the data.

For instance, an enterprise could respond to a security incident identified after processing 1,000 unauthorized hits a second in the time it takes you to read this slide.
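The security example can be sketched as a sliding-window rate check. This is a minimal illustration under stated assumptions, not a description of any particular Fast Data product: the `RateMonitor` class, its default threshold of 1,000 hits, and the one-second window are all hypothetical names and values chosen to mirror the slide.

```python
from collections import deque

class RateMonitor:
    """Flag when the number of hits inside a sliding time window reaches a threshold."""

    def __init__(self, threshold=1000, window_seconds=1.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one unauthorized hit; return True if the threshold is reached."""
        self._timestamps.append(timestamp)
        # Drop hits that have fallen out of the window ending at this timestamp.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold

# Hypothetical burst: 1,000 hits arriving within half a second.
monitor = RateMonitor()
alerts = [monitor.record(i / 2000) for i in range(1000)]
print(alerts[-1])
```

Because each hit is examined as it arrives, the check can raise an alert while the incident is still in progress rather than after a later batch run.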

Capabilities – Big Analytics

The focus of Big Analytics is to turn information into knowledge through a combination of older and newer approaches to create smart information management systems. The purpose is to enable a machine to identify hidden trends, patterns, and differences which previously could only be seen by humans.

Capabilities – Deep Insight

The premise behind Deep Insight is the culmination of all Big Data efforts for an enterprise—to provide useful and relevant information to achieve a specific result, purpose, or goal.

The journey to a result requires effective handling of that which is known and that which is unknown.

Knowledge

"He that knows not, and knows not that he knows not, is a fool. Shun him.

He that knows not, and knows that he knows not, is a pupil. Teach him.

He that knows, and knows not that he knows, is asleep. Wake him.

He that knows, and knows that he knows, is a teacher. Follow him."

(Arabic proverb)

Neighbour, R. (1992). The Inner Apprentice. London: Kluwer Academic Publishers, p. xvii.

The Role of Big Data

To process, analyze, and draw conclusions from massive amounts of data about perceivable and hidden trends, patterns, and differences across defined categories, space, and time.

How does Big Data fulfill this role?

The Components of Big Data

Establishing Policies

Identifying Data Sources

Implementing Storage Solutions

Improving Data Transfer Capabilities

Developing Analytic Capabilities

Exploring Data Visualization Concepts

Creating a Data-Driven Organization

The Toolkit

The Toolkit is designed to give a holistic, though not exhaustive, view of Big Data; the technologies are too broad and diverse to be covered fully in a single toolkit. In addition, many organizations will already have a substantial foundation in one technology contributing to Big Data while struggling with another.

The goal of the Big Data Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.

Moving Forward

The presentations found within the Toolkit provide education about the different facets of Big Data. They can be used for self-edification or as the foundation for presenting a case to different levels of the organization.

The process document, Developing Big Data Solutions, is intended to be a step-by-step guide to creating a Big Data foundation in your organization. Multiple templates have been created to support the process and to aid organizations in their efforts to improve their Big Data capabilities.