Sample of some of the slides presented at the pre-conference workshop on Telco Big Data.
Introduction to Big Data and Real Time Analytics Workshop
Telco Big Data & Real Time Analytics Summit 2012
3-5 December 2012, London
www.alanquayle.com/blog
© 2012 Alan Quayle Business and Service Development 1
"There are three kinds of lies:
lies, damned lies, and statistics."
Attributed to British Prime Minister Benjamin Disraeli (1804–1881), or perhaps to
Samuel Langhorne Clemens (1835–1910), better known as Mark Twain
Never Forget This!
People
Process
Technology
Most projects fail here
The Data Tsunami!
Why are we measuring so many things?
• Atoms vibrate at about 10^13 Hz. Assuming we measure only the atom (not its
subatomic constituents) at a resolution of just 1 byte, that's 10TB per second per atom
• Now there are roughly 7*10^27 atoms in the human body
• So just monitoring one human body's atoms would generate 7*10^40 bytes per second
• That's about 2*10^48 bytes in a year, i.e. 2 yotta-yottabytes
• By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes;
that's only 35*10^21 bytes
• It's easy (and fun) to play with numbers! Lies, damned lies and statistics!
• We do not need to measure each revolution of an airplane's turbine; it only matters
when an out-of-tolerance event occurs
o Capture events and collect what matters, NOT everything all the time!
o How do we know what matters? Common sense, knowing your business and experimentation!
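The back-of-the-envelope arithmetic in the bullets above can be reproduced in a few lines (same assumptions: one byte per vibration, rough atom count):

```python
# Back-of-the-envelope check of the "data tsunami" numbers above.
vibration_rate_hz = 10**13      # atomic vibration frequency, ~10^13 Hz
bytes_per_sample = 1            # assume 1 byte per measurement
atoms_in_body = 7 * 10**27      # rough atom count in a human body

bytes_per_second = vibration_rate_hz * bytes_per_sample * atoms_in_body
seconds_per_year = 365 * 24 * 3600
bytes_per_year = bytes_per_second * seconds_per_year

print(f"{bytes_per_second:.0e} bytes/s")   # ~7e40 bytes per second
print(f"{bytes_per_year:.1e} bytes/year")  # ~2.2e48, i.e. ~2 yotta-yottabytes
```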
Beware the “Bait and Switch”
Data: You Need Lots of It!!
But There’s a Shortage of Data Scientists to Do Anything With It
So Give Me All Your Money
Introduction
• The purpose of this one day workshop is to provide both an introduction and pragmatic insight
into Big Data, Data Science and Real-Time Analytics.
• This course provides a frank and objective review of the state of the art and of the market,
examining what is working in practice and what is not through an extensive series of case studies.
• Big data usually includes data sets with sizes beyond the ability of commonly used software tools
to capture, manage, and process the data.
o Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many
petabytes.
o A new platform of "big data" tools has arisen to handle sense-making over large quantities of data, for
example the Apache Hadoop Big Data Platform.
• Analyzing large data sets in near real-time is not new; business intelligence is as old as business
itself (that is, as old as human society).
o IT automated it, and enabled an organization to own it rather than leaving it in the wet-ware of a
few human brains (generally the owners of a business).
o Some real-time analysis results in automated triggers (so-called machine learning); most analysis
still requires human interpretation, which is not straightforward.
o Analysis of such large and mixed data sources has its own problems, as we’ll discuss in the course.
o Privacy and regulation cannot be ignored; for some industries this will limit the application of Big
Data.
Structure Part 1 of 5
• 09:00 Registration
• 09:30 History and Overview: Understanding Big Data and Real-Time Analytics in
Context
• What do we mean by Big Data?
• Why does Big Data matter?
• Big Data Maturity
• The 3Vs: Volume, Variety and Velocity
• What are the Domains of Big Data?
• Big Data Technologies
• What Enterprises Think of Big Data
• How Enterprise Verticals are Impacted by Big Data
• Why Now?
• Key Trends driving towards Big Data
• 10:45 Coffee Break
• History of Big Data
• Taxonomy of Big Data Companies
• Big Data Landscape
• List of Companies in Big Data (and their Big
Data revenues)
• Big Data Market Sizing
• Telecoms and Real-Time
• O2 More: Proof we can do it!
Structure Part 2 of 5
• 11:00 Quick Technology Review: Diving into a little detail on a few of the key technologies
(only as deep as the architecture) to understand their history and capabilities /
limitations
• Hadoop
o What is Hadoop?
o Ecosystem
o History
o Design Axioms
o Hadoop Distributed File System
o MapReduce: Distributed Processing
o Architecture
o Data Schemas
o Query Language Flexibility
o Economics
o Case Studies
• Hadoop and HBase in the Cloud (Amazon)
• NoSQL and Cassandra + some use cases
• HBase versus Cassandra
• Graph Database introduction
Structure Part 3 of 5
• 12:00 & 14:00 Application of Big Data
• Hardware and Software Trends
o Execution and Results Characteristics
o Framework: Ecosystem, Application Services, Data
Management
• Real-Time Analytics
o Use Cases
o Extended RDBMS versus MapReduce / Hadoop
o Requirements, Trends, People and Organization
Issues, Outlook
• Big Data and the Cloud
o Why the Cloud and Big Data?
o Cloud benefits
o Use Cases: Bankinter, Etsy, Razorfish
• 13:00ish Lunch
• The Social Enterprise
o Business Benefits
o ALU example
o Drivers
o Social + Data Analysis = Business
intelligence
o AT&T Case Study
o Lessons Learned
• Telcos and Big Data
o TMF Survey
o Big Data Framework
o Predictive / Adaptive Analytics
o Decision Engineering
o The Problem with Telecom
• Telco Analytics
o Customer Profiling
o Next Product Tools
o Marketing Mix Modeling
o Cost of Acquisition Tools
o Case Study
Structure Part 4 of 5
• 15:00 Ecosystem, Taxonomies and
Suppliers: Understanding the many
suppliers, technology camps, and
approaches
• Taxonomy of Big Data Companies
• Big Data Landscape
• Cloudera
• Autonomy
• Vertica
• InfoChimps
• Guavus
• Matrixx
• Case Studies
• Real Time Analytics for Big Data Lessons from
o Quick technology review
o Facebook Real-time Analytics System
o Goal
o Actual Analytics
o Solution
o Memory, Collocate, Economics
• Real Time Analytics for Big Data Lessons from
o Requirements
o Actual Analytics
o Challenges
o Performance
o One data any API
o Solution
o Memory, Collocate, Economics
• Other Case Studies
• Orbitz, Hertz, Yelp
Structure Part 5 of 5
• 16:00 Global Enterprise and Telecom Survey on Big Data and Real-Time
Analytics
• Background
• The Questions
• The Importance of Analytics
• Impact of Big Data on Analytics
• Size of Data Sets, Number of Data Sources
• Update Frequency
• Integration of Data Sources
• Data Set Responsibility
• Types of Data, Types of Processing and Analytics
• Challenges
• Big Data Analytics Platforms
• Benefits and Plans
• Data Analytics Storage and IT Infrastructure Requirements
• Increasing Interest in Hadoop MapReduce Framework Technology
• Conclusions
• Recommendations and Wrap Up
Alan Quayle
• 22 years of experience in the telecommunication industry, focused on developing
profitable new businesses in service providers, suppliers and start-ups.
• Customers include
o Operators such as AT&T, BT, Charter, Etisalat, M1, O2, Rogers, Swisscom, T-Mobile,
Telstra, Time Warner Cable, Verizon and Vodafone;
o Suppliers such as Adobe, Alcatel-Lucent, Ericsson, Huawei, Nokia Siemens Networks,
and Oracle; and
o Innovative start-ups such as Apigee, AppTrigger (sold to Metaswitch), Camiant (sold to
Tekelec), OpenCloud, and Voxeo.
• Works with the developer community and serves on the boards of developers such as
GotoCamera and hSenid Mobile, as well as suppliers such as Sigma Systems.
• Weblog www.alanquayle.com/blog
• Linkedin http://www.linkedin.com/in/alanquayle
A Thank You to Those helping me Put this Course Together
• In putting this workshop together I’d like to thank the following
suppliers for their time, openness, willingness to review, and provide
material to ensure this workshop is up-to-the-minute.
o And especially for not requiring any editorial control over the content or my
views expressed in this material (in reverse alphabetical order).
• Guavus
• HP (don’t mention the Autonomy deal)
• Versant, NoSQL database vendor
• Ty Wang, social media entrepreneur using FB Social Graph
• Lorien Pratt, Data / Decision Scientist with Telco focus
• Amazon Web Services
• Matrixx
Introductions
• Spend 2 minutes to introduce yourself
o Name, current employer and job
o Let us know your favorite hobby
• For me it's hiking with my family
o What you want to get out of this course
• What topics are most important to you?
History and Overview: Understanding Big Data and Real-Time Analytics in Context
Structure
• What do we mean by Big Data?
• Why does Big Data matter?
• Big Data Maturity
• The 3Vs: Volume, Variety and Velocity
• What are the Domains of Big Data?
• Big Data Technologies
• What Enterprises Think of Big Data
• How Enterprise Verticals are Impacted
by Big Data
• Why Now?
• Key Trends driving towards Big Data
• History of Big Data
• Taxonomy of Big Data Companies
• Big Data Landscape
• List of Companies in Big Data (and
their Big Data revenues)
• Big Data Market Sizing
• Telecoms and Real-Time
• O2 More: Proof we can do it!
What Do We Mean by Big Data?
IDC’s Definition of Big Data
What is Big Data?
Why does Big Data Matter?
Another Version of the 3 Vs
• Volume: Data sets are expanding constantly. A strategic approach to
big data takes into account ways to store and manage the huge
volumes of data that are being generated.
• Variety: Big data comes in many forms. Analyzing multi-structured
data can yield important insights that can help direct a business
strategy.
• Velocity: The speed at which data is analyzed is everything,
especially when working in a time-sensitive business environment.
What are the Domains of Big Data?
Big Data Technology Stack
Big Data Technologies
The Technology has Become Quite Fashionable
Big Data Use Cases
Companies in Big Data
• Storage: HP, EMC, IBM, Dell, NetApp, Hitachi Ltd., Fujitsu, Oracle, NEC
• Servers: IBM, HP, Dell, Oracle, Fujitsu, Acer, Cray, Groupe Bull, Hitachi, NEC, SGI, Stratus
Technologies, Unisys, Cisco, Lenovo
• Networking: Cisco, Brocade, HP, Dell, IBM, Alcatel-Lucent, F5 Networks, Citrix
• Relational database software: Oracle Exadata, IBM Netezza, IBM Smart Analytics System,
Teradata, HP Vertica and Autonomy, SAP Sybase IQ, EMC Greenplum DB and HD, Microsoft SQL
Server Parallel Edition, IBM Netezza High Capacity Appliance, Teradata Extreme Performance
Appliance, SAP-Sybase IQ
• Hadoop-based data management and analysis software: Cloudera, MapR, EMC Greenplum HD,
Oracle Big Data Appliance, IBM BigInsights, Hstreaming, Platfora, Zettaset, DataStax,
Karmasphere, Datameer, Hadapt, and so forth
• XML databases: MarkLogic, Oracle XML DB, IBM pureXML, Software AG webMethods, Tamino
XML Server, TigerLogic, Xyleme, and so forth
Companies in Big Data
• Object-oriented databases: Jade Software, Objectivity, Progress Software, Versant
• Graph databases: Neo Technology, Objectivity, Franz Inc., Sones, Ravel
• Ultra-high-speed streaming data technologies: IBM InfoSphere Streams, Informatica
Ultra Messaging Streaming Edition, TIBCO FTL and BusinessEvents, Progress Software
Apama CEP
• Analytics and discovery software: SAS, IBM, Attivio, HP Autonomy, Skytree, Attivio,
Oracle Advanced Analytics, IBM SPSS, Microsoft, Vivisimo, ZyLAB, Sinequa, Revolution
Analytics, KXEN, BA Insight, Palantir, Perfect Search, Wolfram Alpha
• Decision support and automation software including applications: Webtrends, Adobe-
Omniture, IBM Coremetrics, FICO
• Services: Accenture, Deloitte, TCS, HP, Teradata, Mu Sigma, Think Big Analytics,
• Hortonworks, Hashrocket, KloudData, Trendwise Analytics
Big Data Is a Big Market & Big Business - $50 Billion Market by 2017 (according to Wikibon)
• Open source analyst firm Wikibon pegs the current Big Data market at just over $5
billion (a figure IDC and others broadly agree with)
• Wikibon forecasts the Big Data market will grow at a CAGR of 58% between now and
2017, hitting $50 billion within five years.
• Vendors from whales like IBM and HP to pure-plays like Vertica and Cloudera are
bringing in significant revenue today helping enterprises, governments and
healthcare organizations process and make sense of the torrents of unstructured data
flowing from mobile devices, sensors, social media and other sources.
• Today Big Data technologies like Hadoop are mostly in production at Web and online
gaming companies, large financial services firms and banks, and online retailers.
Big Data Is a Big Market & Big Business - $50 Billion Market by 2017
• Another important point is that, while Hadoop may be the poster child of Big Data,
there are other important technologies at play.
o Hadoop: an open source framework for distributing data processing across multiple nodes;
o Massively parallel data warehouses “that deliver fast data loading and real-time
analytic capabilities”;
o Analytic platforms and applications that allow Data Scientists and Business Analysts to
manipulate Big Data; and
o Data Visualization tools that bring insights from Big Data analysis alive for end users.
• Of the current market, Big Data pure-play vendors account for $300 million in Big
Data-related revenue.
o Despite their relatively small percentage of current overall revenue (approximately 5%), Big
Data pure-play vendors – such as Vertica, Splunk and Cloudera — are responsible for the vast
majority of new innovations and modern approaches to data management and analytics that
have emerged over the last several years and made Big Data the hottest sector in IT.
Wikibon Forecast
IDC’s Forecast
Technology Review: Diving into a little detail on a few of the key technologies (only as deep as the architecture) to understand their history and capabilities / limitations
Structure Part 2 of 5
• Hadoop
o What is Hadoop?
o Ecosystem
o History
o Design Axioms
o Hadoop Distributed File System
o MapReduce: Distributed Processing
o Architecture
o Data Schemas
o Query Language Flexibility
o Economics
o Case Studies
• Hadoop and HBase in the Cloud (Amazon)
• NoSQL and Cassandra + some use cases
• HBase versus Cassandra
• Graph Database introduction
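Before turning to HBase and Cassandra, the MapReduce model listed in the outline above can be pinned down with a minimal single-process sketch. This is illustrative only, not the Hadoop API: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return (key, sum(values))

docs = {1: "big data big analytics", 2: "big data"}
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'analytics': 1}
```

In Hadoop the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the logic per key is the same.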
HBase Versus Cassandra: History
• HBase and its required supporting systems are derived from the original Google
BigTable and Google File System designs (as described in the Google File System
paper Google published in 2003, and the BigTable paper published in 2006).
• Cassandra on the other hand is a recent open source fork of a
standalone database system initially coded by Facebook, which
while implementing the BigTable data model, uses a system inspired
by Amazon’s Dynamo for storing data (in fact much of the initial
development work on Cassandra was performed by two Dynamo
engineers recruited to Facebook from Amazon).
HBase Versus Cassandra
• These differing histories have resulted in HBase being more suitable for data
warehousing and large-scale data processing and analysis (for example, the kind
involved in indexing the Web),
• and Cassandra being more suitable for real-time transaction processing and the
serving of interactive data.
• For lightweight validation you’ll find the current makeup of the key committers
interesting:
o the primary committers to HBase work for Bing (Microsoft bought their search company last
year and, after a couple of months, gave them permission to continue submitting open
source code).
o By contrast the primary committers on Cassandra work for Rackspace, which supports
the idea of an advanced general purpose NOSQL solution being freely available to
counter the threat of companies becoming locked in to the proprietary NOSQL solutions
offered by the likes of Google, Yahoo and Amazon EC2.
• The CAP Theorem was developed by Professor Eric Brewer, co-founder and Chief Scientist of
Inktomi.
• The theorem states that a distributed (or “shared data”) system design can offer at most two of three
desirable properties: Consistency, Availability and tolerance to network Partitions. Consistency means
that if someone writes a value to a database, other users will immediately be able to read the
same value back. Availability means that the distributed system remains operational even if some
number of nodes in the cluster fail. Tolerance to Partitions means that the system remains
operational even if a network failure divides the nodes of the cluster into groups that can no
longer communicate.
• If you search online posts comparing HBase and Cassandra, you will regularly find the HBase
community explaining that they have chosen CP, while Cassandra has chosen AP.
• BUT the CAP theorem only applies to a single distributed algorithm. There is no reason why you
cannot design a single system where, for any given operation, the underlying algorithm, and thus
the trade-off achieved, is selectable.
• Thus while it is true that a system may only offer two of these properties per operation, what has been
widely missed is that a system can be designed that allows a caller to choose which properties they want
when any given operation is performed.
• Not only that, reality is not nearly so black and white: it is possible to offer differing degrees of
balance between consistency, availability and tolerance to partitions. This is Cassandra.
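That per-operation trade-off is what Cassandra exposes as tunable consistency: each read and write names a consistency level (ONE, QUORUM, ALL). The toy single-process model below (plain Python, not Cassandra's API) shows the arithmetic behind it: with N replicas, choosing R and W such that R + W > N forces read and write quorums to overlap, so a read always sees the latest write; a smaller R or W trades that consistency for availability.

```python
# Toy replicated register with per-operation consistency levels.
N = 3
replicas = [(0, None)] * N  # each replica holds a (timestamp, value) pair

def write(value, ts, w, reachable):
    # A write at level W fails unless at least W replicas are reachable:
    # the operation chooses consistency over availability.
    if len(reachable) < w:
        raise RuntimeError("cannot satisfy W replicas")
    for i in reachable:
        replicas[i] = (ts, value)

def read(r, reachable):
    # A read at level R returns the newest value among R+ reachable replicas.
    if len(reachable) < r:
        raise RuntimeError("cannot satisfy R replicas")
    newest = max((replicas[i] for i in reachable), key=lambda tv: tv[0])
    return newest[1]

write("v1", ts=1, w=2, reachable=[0, 1])  # replica 2 missed the write
print(read(r=2, reachable=[1, 2]))  # prints v1: R + W > N, quorums overlap
print(read(r=1, reachable=[2]))     # prints None: stale, but still available
```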
Application of Big Data
Structure
• Hardware and Software Trends
o Execution and Results Characteristics
o Framework: Ecosystem, Application
Services, Data Management
• Real-Time Analytics
o Use Cases
o Extended RDBMS versus MapReduce /
Hadoop
o Requirements, Trends, People and
Organization Issues, Outlook
• Big Data and the Cloud
o Why the Cloud and Big Data?
o Cloud benefits
o Use Cases: Bankinter, Etsy, Razorfish
• The Social Enterprise
o Business Benefits
o ALU example
o Drivers
o Social + Data Analysis = Business
intelligence
o AT&T Case Study
o Lessons Learned
• Telcos and Big Data
o TMF Survey
o Big Data Framework
o Predictive / Adaptive Analytics
o Decision Engineering
o The Problem with Telecom
• Telco Analytics
o Customer Profiling
o Next Product Tools
o Marketing Mix Modeling
o Cost of Acquisition Tools
o Case Study
Use Cases for Big Data Analytics
• Search ranking.
o All search engines attempt to rank the relevance of a webpage to a search request against all
other possible webpages
o Google’s PageRank algorithm is, of course, the poster child for this use case
• Ad tracking.
o E-commerce sites typically record an enormous river of data including every page event in
every user session
o This allows for very short turnaround of experiments in ad placement, color, size, wording,
and other features
o When an experiment shows that such a feature change in an ad results in improved click
through behavior, the change can be implemented virtually in real time
• Location and proximity tracking.
o Many use cases add precise GPS location tracking, together with frequent updates, in
operational applications, security analysis, navigation, and social media
o Precise location tracking opens the door for an enormous ocean of data about other locations
nearby the GPS measurement
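The search-ranking use case above can be made concrete with a few lines of power iteration over a toy link graph. This is a simplified PageRank, not Google's production algorithm, and it assumes every page has at least one outgoing link:

```python
def pagerank(links, damping=0.85, iters=50):
    # links: page -> list of pages it links to (each page must link somewhere).
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Each page starts with the "random jump" share, then receives a
        # damped fraction of the rank of every page that links to it.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c collects the most link weight
```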
Use Cases for Big Data Analytics
• Causal factor discovery.
o Point-of-sale data has long been able to show us when the sales of a product go sharply up
or down. But searching for the causal factors that explain these deviations has been, at best, a
guessing game or an art form.
o The answers may be found in competitive pricing data, competitive promotional data
including print and television media, weather, holidays, national events including disasters,
and virally spread opinions found in social media.
• Social CRM.
o This use case is one of the hottest new areas for marketing analysis. The Altimeter Group has
described a very useful set of key performance indicators for social CRM that include share of
voice, audience engagement, conversation reach, active advocates, advocate influence,
advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment
ratio, and idea impact.
o The calculation of these KPIs involves in-depth trawling of a huge array of data sources,
especially unstructured social media.
Use Cases for Big Data Analytics
• Document similarity testing.
o Two documents can be compared to derive a metric of similarity. There is a large body of academic
research and tested algorithms, for example latent semantic analysis, that is just now finding its way to
driving monetized insights of interest to big data practitioners.
o For example, a single source document can be used as a kind of multifaceted template to compare against a
large set of target documents. This could be used for threat discovery, sentiment analysis, and opinion
polls. For example: "find all the documents that agree with my source document on global warming."
• Genomics analysis: e.g., commercial seed gene sequencing.
o A few months ago the cotton research community was thrilled by a genome sequencing announcement that
stated in part: “The sequence will serve a critical role as the reference for future assembly of the larger
cotton crop genome. Cotton is the most important fiber crop worldwide and this sequence information will
open the way for more rapid breeding for higher yield, better fiber quality and adaptation to
environmental stresses and for insect and disease resistance.”
o Scientist Ryan Rapp stressed the importance of involving the cotton research community in analyzing
the sequence, identifying genes and gene families and determining the future directions of research.
o This use case is just one example of a whole industry that is being formed to address genomics analysis
broadly, beyond this example of seed gene sequencing.
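The document-similarity testing described above rests on turning documents into vectors and comparing them. A minimal sketch using term-frequency vectors and cosine similarity, the building block beneath richer techniques such as latent semantic analysis:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    # Represent each document as a bag-of-words term-frequency vector,
    # then measure the cosine of the angle between the two vectors.
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

src = "global warming is accelerating"
print(cosine_similarity(src, "global warming is accelerating"))  # 1.0
print(cosine_similarity(src, "stock prices fell sharply"))       # 0.0
```

Latent semantic analysis goes further by factoring the full term-document matrix so that documents can match on related terms, not just shared ones.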
Use Cases for Big Data Analytics
• Discovery of customer cohort groups.
o Customer cohort groups are used by many enterprises to identify common demographic trends and
behavior histories. We are all familiar with Amazon's cohort groups when they say other customers who
bought the same book as you have also bought the following books. Of course, if you can sell your product
or service to one member of a cohort group, then all the rest may be reasonable prospects. Cohort groups
are represented logically and graphically as links, and much of the analysis of cohort groups involves
specialized link analysis algorithms.
• In-flight aircraft status.
o This use case as well as the following two use cases are made possible by the introduction of sensor
technology everywhere. In the case of aircraft systems, in-flight status of hundreds of variables on engines,
fuel systems, hydraulics, and electrical systems are measured and transmitted every few milliseconds. The
value of this use case is not just the engineering telemetry data that could be analyzed at some future point
in time, but drives real-time adaptive control, fuel usage, part failure prediction, and pilot notification.
• Smart utility meters.
o It didn't take long for utility companies to figure out that a smart meter can be used for more than just the
monthly readout that produces the customer’s utility bill. By drastically cranking up the frequency of the
readouts to as much as one readout per second per meter across the entire customer landscape, many
useful analyses can be performed including dynamic load-balancing, failure response, adaptive pricing,
and longer-term strategies for incenting customers to utilize the utility more effectively (either from the
customers’ point of view or the utility's point of view!)
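As a flavour of what high-frequency meter readouts enable, here is a deliberately crude failure-response sketch: flag any reading that deviates sharply from its trailing window. The readings and thresholds are invented for illustration:

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    # Flag a reading that deviates from the trailing window's mean by more
    # than `threshold` standard deviations: a crude failure-response rule.
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

meter = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 45.0, 10.2]
print(flag_anomalies(meter))  # [6] -- the 45.0 spike
```

Real deployments would run this per meter as a streaming computation, but the per-reading rule is the same shape.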
Use Cases for Big Data Analytics
• Building sensors.
o Modern industrial buildings and high-rises are being fitted with thousands of small
sensors to detect temperature, humidity, vibration, and noise.
o Like the smart utility meters, collecting this data every few seconds 24 hours per day
allows many forms of analysis including energy usage, unusual problems including
security violations, component failure in air-conditioning and heating systems and
plumbing systems, and the development of construction practices and pricing strategies.
• Satellite image comparison.
o Images of the regions of the earth from satellites are captured by every pass of certain
satellites on intervals typically separated by a small number of days.
o Overlaying these images and computing the differences allows the creation of hot spot
maps showing what has changed. This analysis can identify construction, destruction,
changes due to disasters like hurricanes and earthquakes and fires, and the spread of
human encroachment.
Use Cases for Big Data Analytics
• CAT scan comparisons.
o CAT scans are stacks of images taken as "slices" of the human body. Large
libraries of CAT scans can be analyzed to facilitate the automatic diagnosis of
medical issues and their prevalence.
• Financial account fraud detection and intervention.
o Account fraud, of course, has immediate and obvious financial impact. In
many cases fraud can be detected by patterns of account behavior, in some
cases crossing multiple financial systems. For example, "check kiting" requires
the rapid transfer of money back and forth between two separate accounts.
o Certain forms of broker fraud involve two conspiring brokers selling a security
back-and-forth at ever increasing prices, until an unsuspecting third party
enters the action by buying the security, allowing the fraudulent brokers to
quickly exit. Again, this behavior may take place across two separate
exchanges in a short period of time.
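The back-and-forth pattern described above for check kiting lends itself to a simple illustration: count direction reversals between account pairs and flag pairs that reverse repeatedly. A toy sketch (account names and the threshold are invented):

```python
from collections import Counter

def kiting_suspects(transfers, min_round_trips=3):
    # transfers: list of (from_account, to_account) tuples in time order.
    # Flag account pairs whose transfer direction keeps reversing, the
    # rapid back-and-forth pattern characteristic of check kiting.
    round_trips = Counter()
    last_direction = {}
    for src, dst in transfers:
        pair = frozenset((src, dst))
        if last_direction.get(pair) == (dst, src):  # direction reversed
            round_trips[pair] += 1
        last_direction[pair] = (src, dst)
    return [tuple(sorted(pair)) for pair, n in round_trips.items()
            if n >= min_round_trips]

log = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "A"),
       ("A", "B"), ("B", "A"), ("C", "D")]
print(kiting_suspects(log))  # [('A', 'B')]
```

A production system would also weigh amounts, timing and cross-institution data, which is where the Big Data machinery comes in.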
Use Cases for Big Data Analytics
• Computer system hacking detection and intervention.
o System hacking in many cases involves an unusual entry mode or some other kind of behavior
that in retrospect is a smoking gun but may be hard to detect in real-time.
• Online game gesture tracking.
o Online game companies typically record every click and maneuver by every player at the most
fine grained level. This avalanche of "telemetry data" allows fraud detection, intervention for a
player who is getting consistently defeated (and therefore discouraged), offers of additional
features or game goals for players who are about to finish a game and depart, ideas for new
game features, and experiments for new features in the games.
o This can be generalized to television viewing. Your DVR box can capture remote control
keystrokes, recording events, playback events, picture-in-picture viewing, and the context of
the guide. All of this can be sent back to your provider.
• Big science including atom smashers, weather analysis, space probe telemetry feeds.
o Major scientific projects have always collected a lot of data, but now the techniques of big data
analytics are allowing broader access and much more timely access to the data. Big science
data, of course, is a mixture of all forms of data, scalar, vector, complex structures, analog wave
forms, and images.
Use Cases for Big Data Analytics
• "Data bag" exploration.
o There are many situations in commercial environments and in the research
communities where large volumes of raw data are collected. One example might be data
collected about structure fires. Beyond the predictable dimensions of time, place,
primary cause of fire, and responding firefighters, there may be a wealth of
unpredictable anecdotal data that at best can be modeled as a disorderly collection of
name value pairs, such as "contributing weather = lightning". Another example would be
the listing of all relevant financial assets for a defendant in a lawsuit.
o Again such a list is likely to be a disorderly collection of name value pairs, such as
"shared real estate ownership = condominium". The list of examples like this is endless.
What they have in common is the need to encapsulate the disorderly collection of name
value pairs, which is generally known as a "data bag". Complex data bags may contain
both name value pairs as well as embedded sub data bags. The challenge in this use case
is to find a common way to approach the analysis of data bags when the content of the
data may need to be discovered after the data is loaded.
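One common way to approach the analysis of data bags is to flatten the nested name/value pairs into dotted paths, so the content can be queried after it is loaded rather than modeled up front. A small sketch (the fire-report fields echo the example above; the nested sub-bag is invented for illustration):

```python
def flatten(bag, prefix=""):
    # Walk a nested "data bag" (name/value pairs, possibly containing
    # embedded sub-bags) and yield dotted-path/value pairs.
    for name, value in bag.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield (path, value)

fire_report = {
    "contributing weather": "lightning",
    "response": {"engines": 3, "mutual aid": "yes"},
}
print(dict(flatten(fire_report)))
# {'contributing weather': 'lightning', 'response.engines': 3,
#  'response.mutual aid': 'yes'}
```

Once flattened, disorderly bags from many records can be indexed and queried uniformly even though no two records share the same fields.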
Use Cases for Big Data Analytics
• The final two use cases are old and even predate data warehousing itself. But
new life has been breathed into these use cases because of the exciting potential
of ultra-atomic customer behavior data.
o Loan risk analysis and insurance policy underwriting. In order to evaluate the risk of a
prospective loan or a prospective insurance policy, many data sources can be brought
into play ranging from payment histories, detailed credit behavior, employment data,
and financial asset disclosures. In some cases the collateral for a loan or the insured
item may be accompanied by image data.
o Customer churn analysis. Enterprises concerned with churn want to understand the
predictive factors leading up to the loss of a customer, including that customer’s detailed
behavior as well as many external factors including the economy, life stage and other
demographics of the customer, and finally real time competitive issues.
Big Data on the Cloud in the Real World: How the Cloud Is Big Data's Best Friend
Characteristics of Big Data
Features driven by MapReduce
Big Data is Getting Bigger
• 2.7 zettabytes in 2012
• Over 90% will be unstructured
• Data spread across a wide array of silos
Why is Big Data Hard (and Getting Harder)?
Changing Data Requirements
• Faster response time on fresher data
• Sampling is not good enough & history is important
• Increasing complexity of analytics
• Users demand inexpensive experimentation
Where is it Coming From?
Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)
Human Generated
• Twitter "Fire Hose": 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
Big Data Verticals
• Media/Advertising: targeted advertising, image and video processing
• Oil & Gas: seismic analysis
• Retail: recommendations, transactions analysis
• Life Sciences: genome analysis
• Financial Services: Monte Carlo simulations, risk analysis
• Security: anti-virus, fraud detection, image recognition
• Social Network/Gaming: user demographics, usage analysis, in-game metrics
Bank – Monte Carlo Simulations
“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
23 Hours to 20 Minutes
etsy.com/gifts: Recommendations
• Gift ideas for Facebook friends
Targeted Ads (1.7 million per day)
• Example: a user recently purchased a sports movie and is searching for video games
Click Stream Analysis
The Social Enterprise
• Implementations are getting bigger and growing faster than ever
• Virtually all data continue to show sustained real-world benefits (McKinsey,
IBM, Frost and Sullivan, AIIM)
• Everything is becoming social: Social features are appearing in virtually all types
of applications
• There continues to be considerable confusion about who “owns” social in the
organization
• The predicted social data explosion: It happened
• Mining insight from social data has now become a major industry (#bigdata,
#analytics)
• The blur between internal and external social business has not progressed as far
as many thought
• The first serious talk about open social business standards has begun
Decision Engineering
Adaptive Analytics
Predictive Analytics
Reporting
Data Management (including data migration, data quality, data
modeling)
Will this customer churn?
• Yes/No data: If customer has an open trouble ticket: Yes; otherwise: No
• Real-valued: If customer age < 30: Yes; otherwise: No
• Combination: If customer age < 30 AND has an open trouble ticket: Yes; otherwise: No
• Linear combination: If 2.3 x Age + 4.4 x Income > 40: Yes; otherwise: No
Predictive Analytics: obtain these numbers by analyzing historical data.
Adaptive Analytics: update your historical data, and re-derive the numbers periodically to take changing situations into account.
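The linear-combination rule above can be sketched in a few lines. The coefficients (2.3, 4.4) and the threshold (40) are the slide's illustrative numbers, not fitted values; predictive analytics would derive them from historical data, and the adaptive step below is a deliberately toy re-derivation.

```python
# Illustrative linear-combination churn rule from the slide:
# churn if 2.3 * age + 4.4 * income > 40 (coefficients are examples,
# not fitted values).
def will_churn(age, income, w_age=2.3, w_income=4.4, threshold=40.0):
    return w_age * age + w_income * income > threshold

def refit(history):
    """Adaptive step (toy version): re-derive the threshold so the rule
    separates past churners from non-churners in fresh history.
    `history` is a list of (age, income, churned) tuples."""
    scores = [(2.3 * a + 4.4 * i, churned) for a, i, churned in history]
    churn_scores = [s for s, c in scores if c]
    stay_scores = [s for s, c in scores if not c]
    # Put the cut midway between the groups (assumes they are separable).
    return (min(churn_scores) + max(stay_scores)) / 2

print(will_churn(age=25, income=2.0))   # 2.3*25 + 4.4*2 = 66.3 > 40
```

Re-running `refit` on a schedule, as the slide describes, is what turns a static predictive rule into an adaptive one.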
Nonlinear Analytics: [chart: Income vs. age, showing a nonlinear pattern]
Predictive/Adaptive Analytics on one slide
Decision Model (part of Decision Engineering)
From: Agile Decision Making: Improving business results with analytics TM Forum Quick Insight report, 2011. Source: Lorien Pratt
…Decision engineering places analytics in the larger business context. Each “f” here is an analytic, or based on human expertise
[Diagram: (1) data used to construct the analytic feeds (2) the analytic, e.g. "If 2.3 x Age + 4.4 x Income > 40: Yes, otherwise: No", which is applied to (3) operational data, yielding (4) a prediction such as "Sally is likely enough to churn that we should call her" and (5) an action toward Sally]
Key Distinctions
• Automated versus human-in-the-loop while building
analytics
• Automated versus human-in-the-loop while using
analytics
• Strategic versus tactical goals
• One-size fits all versus demographic versus personalized
• Within-silo versus between-silo
• Cleansing for operational versus analytic purposes
Moving Analytics to the Center: retailers face new competition that is driving an advanced view of customers and interactions to the center of the business.
• Operations: How do I create a responsive analytics capability, and governance relative to the right-time application of analytic decision making?
• Marketing & Sales: How do I leverage and operationalize customer insights and experience data to drive personal, timely, and relevant interactions across all channels?
• Merchandising: How do I dynamically manage margin and brand perception with the right mix of regular, promotional and markdown products across categories, channels, and formats?
• Supply Chain: Are inventory and demand data leveraged to optimize the customer experience and effectively respond to changing market conditions?
Spanning functions: multi-channel operations, supplier/partner collaboration.
Advanced Customer Intelligence
Semantic Framework: Applied Customer Analytics Capability
The New Analytical Competency
Focus of Efforts in the Past → New Competency Requirements
• Large-scale integration of all data sources → Connected information & analytics governance for the enterprise
• Central control of metadata and information usage → Provisioning information & insights to the point of leverage
• Developing the most technically correct analytical point solution possible → Agile analytical modeling processes & rapid evaluation of business lift
Example FROM: "How can we use all possible customer dimensions to predict customer churn?" TO: "What is the optimum behavior modeling framework to rapidly build and deploy models applicable to multiple business objectives that change over time?"
Predictive Analytics
Historical approaches rely on static data:
• Propensity to churn • Propensity to buy • Propensity to pay • Customer lifetime value
Future needs require a more dynamic approach:
• Ability to intervene in customer interactions to create desired outcomes
Problem Statements
• Telcos are not traditionally nimble.
• Telcos look at customers in groups, not individually.
• Telcos have very little idea what drives customer behavior.
• Telcos have no idea how to influence customer behavior.
• Even if they knew how to influence customer behavior, Telcos do not have the nimble decisioning tools required to impact customer behavior in real time.
Ecosystem, Taxonomies and Supplier Review: understanding the many suppliers, technology camps, and approaches
Structure Part 4 of 5
• 15:00 Ecosystem, Taxonomies and
Suppliers: Understanding the many
suppliers, technology camps, and
approaches
• Taxonomy of Big Data Companies
• Big Data Landscape
• Cloudera
• Autonomy
• Vertica
• InfoChimps
• Guavus
• Matrixx
• Case Studies
• Real Time Analytics for Big Data Lessons from
o Quick technology review
o Facebook Real-time Analytics System
o Goal
o Actual Analytics
o Solution
o Memory, Collocate, Economics
• Real Time Analytics for Big Data Lessons from
o Requirements
o Actual Analytics
o Challenges
o Performance
o One data any API
o Solution
o Memory, Collocate, Economics
• Other Case Studies
• Orbitz, Hertz, Yelp
Guavus provides integrated solutions to enable rapid decisions on big data for CSPs
• Guavus delivers big data solutions, not just technology components
• Unique ability to rapidly fuse huge quantities of data from diverse sources
• Patent-pending streaming analytics technology proven over 10+ years
• Current customers include leading wireless, IP, and video service providers
Guavus at a Glance
• Silicon Valley venture-backed company: US HQ in San Mateo, CA; R&D offices in India; raised $48 million; 350 employees worldwide
• Tier-1 CSP customers & partnerships: 3 of the top 5 NA mobile operators, 3 of the top 5 IP/MPLS backbone carriers, & CDN networks; 4 of the top 6 largest global communications infrastructure equipment vendors
• Industry proven & recognized: mature (10+ years) patent-pending technology
Guavus Empowers LOB to Make Decisions
Data collection, fusion and mining across disparate data sources: information systems (enterprise apps, data warehouses, databases, networks) and devices & networks; data at rest (views) and data in motion (flows).
• Finance & Regulatory: profitability analysis, tiered pricing optimization, contract/SLA enforcement
• Network & Operations: traffic engineering, capacity planning, peering optimization
• Marketing: customer segmentation, campaign management
• Executives: continuous business optimization, predictive planning
• Customer Care & Sales: churn prediction, focused prospecting, targeted up-sell & cross-sell
Operator Challenges in a Big Data World
• Data sitting in silos
• Exponential [streaming] data growth
• Timely insights
• Distributed network generation
Key Data Sources & Insights
Streaming analytics insights:
• Content trending & consumption
• Fused network events
• Subscriber dynamic usage profiles
• Network usage patterns
• Policy control functions
Data sources span the delivery chain: content providers, Internet CDN, edge network, access network, CPE or end device.
Transforming the Big Data Analytics Economic Model
Traditional centralized, store-first architecture:
• Consolidate data in a repository
• Transport and store data: transport and storage costs alone may put it over budget
• Project may not even get started
Streaming-centric distributed, compute-first architecture:
• Move processing to the data edge
• Focus spend on analytics first
• Continuous processing yields timely and actionable insights
• Reduce overall spend per new analytics question
• Leverage off-the-shelf low-cost processing and storage
[Chart: how resources & time divide between transport, storage, and compute (insights) under each architecture]
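The compute-first idea, aggregate at the edge and ship only summaries, can be sketched as follows; the site names and record fields are invented for illustration.

```python
# Compute-first sketch: each edge site reduces its raw stream to small
# per-key summaries; only those summaries travel to the central tier,
# so transport and storage spend shrinks relative to store-first.
from collections import defaultdict

def edge_aggregate(records):
    """Reduce a site's raw usage records to bytes-per-subscriber totals."""
    totals = defaultdict(int)
    for rec in records:               # single streaming pass
        totals[rec["subscriber"]] += rec["bytes"]
    return dict(totals)

def central_merge(site_summaries):
    """Fuse the per-site summaries instead of the raw streams."""
    merged = defaultdict(int)
    for summary in site_summaries:
        for subscriber, nbytes in summary.items():
            merged[subscriber] += nbytes
    return dict(merged)

site1 = edge_aggregate([{"subscriber": "a", "bytes": 100},
                        {"subscriber": "b", "bytes": 50}])
site2 = edge_aggregate([{"subscriber": "a", "bytes": 25}])
print(central_merge([site1, site2]))   # {'a': 125, 'b': 50}
```

The design choice is that only additive summaries cross the network; this works whenever the analytic decomposes into a per-site reduction plus a central merge.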
Centralized compute & analyze: master fusion, machine learning, clustering & classifying, master aggregation, business logic.
Analytics application examples: mobility, digital media, broadband; 3rd-party feeds & customer tools; market research, ad targeting, capacity planning, data warehouses.
Big Data Streaming Analytics Architecture
Each distributed site (1, 2, 3) runs the same local stack: streaming/batch ingest, data fusion, aggregation, and a local data store.
Data sources: DPI data, PDN flows, AAA data, web activity, web taxonomy, advertising traffic, media-type metadata, flow & routing, service consumption traffic.
Guavus Analytics Platform Details
[Architecture diagram, bottom-up:]
• Streaming data feeds (DPI, PCMD, IPDR, NetFlow, RADIUS, DNS, …) enter via distributed data collectors into the Guavus stream processing pipeline.
• Central compute (fusion, aggregation & compute) feeds a data store, caching compute nodes (business cubes, machine-learning caching), and an analysis store, exposed through the HBASE API, SQL/Hive, the Cube API, and SQL.
• Guavus applications (Mobility Reflex, IP Reflex, CDN Reflex, Ad Reflex) drive enterprise reporting, consumer reporting, customer UI portals, and insight discovery.
• Ingest/export and the Guavus external API & POC sandbox integrate with 3rd-party systems: network management, field inventory, PM/FM, CRM, inventory, data stores (IT, DWH, cloud), XDR, and traditional ETL layers.
Matrixx. Parallel-MATRIXX™
• Parallel-MATRIXX™ technology has completely re-invented transactional real-time processing and eliminated the limitations of the contemporary technologies described earlier.
• The next slide shows the Parallel-MATRIXX™ functional architecture, based on multiple patented technologies and offering a performance improvement of at least two orders of magnitude relative to legacy approaches.
Matrixx. Algebraic-Decision Engine
• OCS raters can be broadly classified as rule-driven or data-driven.
o The former offer great flexibility to configure rating scenarios of arbitrary sophistication, but can become challenging to maintain beyond a certain complexity.
o Data-driven systems typically offer a rich catalog of off-the-shelf templates that are easily configured to create real offers.
• These templates are "baked" into code so performance can be highly optimized. The challenge with this approach arises when no suitable template is available, often requiring complex and costly customization.
• With respect to real-time performance, both approaches share a common weakness: every transaction results in execution of conditional logic reflecting the rating discriminators (if weekend, and if URL is on-net, and if…).
• As rating, or indeed policy, rules become more sophisticated, execution code paths extend and performance degrades, often unpredictably.
Matrixx. Algebraic-Decision Engine
• The Parallel-MATRIXX™ Algebraic-Decision engine eliminates this degradation by building on the simple principle that any pricing concept can be represented as a set of mathematical equations.
• Modern CPUs, capable of 200 million multiplications per second, are exceptionally efficient at solving such equations.
• Pricing plans, offers, and policies are configured via a GUI and transparently compiled into an n-dimensional matrix where each dimension corresponds to a rating normalizer (such as time, location, service, etc.).
• Stored at each matrix "intersection" is a linear equation representing the rating formula to be applied. As each transaction is mapped to the relevant intersection, solving the associated linear equation is extremely fast.
• As offers are extended with additional normalizers (for example, adding a device dependency to offer lower rates for a promoted device), the matrix dimensionality is extended accordingly. This adds only a few CPU cycles to solve the rate equation, with no significant impact on latency.
Contention-Free In-Memory Database and Parallel-MATRIXX™ Processing
• Maintaining data and transaction integrity is a mission-critical requirement for any
database containing CSP customer or financial data. For example, an attempt to
transfer funds between two customers must complete successfully or be cleanly
aborted.
• A situation where the donor’s account is debited but some technical failure results in
the recipient not receiving the funds would leave the database in an invalid state.
• As described earlier, current real-time systems rely heavily on OLTP and locking techniques to assure data integrity, which can lead to rapidly degrading and unpredictable performance.
• Parallel-MATRIXX™ technology is based on an in-memory database that does not utilize locking while still supporting full ACID-compliant transactions.
• No transaction is ever blocked from accessing or updating data; newly developed algorithms detect and resolve transaction conflicts.
Case Studies: understanding where big data is used in practice
Global Enterprise and Telecom Survey on Big Data and Real-Time Analytics
Structure
• Background
• The Questions
• The Importance of Analytics
• Impact of Big Data on Analytics
• Size of Data Sets, Number of Data Sources
• Update Frequency
• Integration of Data Sources
• Data Set Responsibility
• Types of Data, Types of Processing and Analytics
• Challenges
• Big Data Analytics Platforms
• Benefits and Plans
• Data Analytics Storage and IT Infrastructure Requirements
• Increasing Interest in Hadoop MapReduce Framework Technology
• Conclusions
Background
• Global Survey
• Across 200 business and IT executives, questioned in August and September
2012
• 105 enterprise (non Telco), 55 Telco – all large enterprises (no mid-market
analysis)
• Non-Telco included web service providers, financial services, healthcare,
manufacturing, retail, education, government, military, entertainment verticals
• Generally VP level with a few CxO level, all decision makers with budget
responsibilities
• Generally known to me, or through my contacts as I was trying to gather frank
reviews
• Surprisingly similar across Telco and non-Telco
Importance of Enhancing Data Processing and Analytics versus All Business Priorities
• Most important: 31% • Top 5: 39% • Top 10: 20% • Top 20: 9% • Not important: 1%
Impact of Big Data on Analytics
• There is much market hype surrounding the term big data. When asked what the
term means to them, a majority of respondents indicated that it simply refers to very
large data sets, see next slide.
• The big data movement born from the Hadoop open source initiative has not reached
most IT departments or even analytics professionals, as evidenced by the fact that
only 11% of survey respondents associate Hadoop MapReduce with the concept of big
data.
• Most organizations’ analytics efforts to date have dealt with structured data, sourced
through relational databases and data warehouses, and for the vast majority of
analytical undertakings this makes sense.
• But even organizations that have not been captured by the Hadoop movement are
still increasingly under the gun to deal with larger data volumes, and the incursion of
unstructured data. This, plus the many public examples of big data that have caught
the imagination of business executives, have reinvigorated interest in data analytics.
What does the term Big Data mean to you? (share of respondents, 0-80% scale; per-item values not preserved)
• Very large data sets
• Very large databases
• Data warehouses
• Data analytics
• Problems in storing / processing data
• Web and search engine data
• Hadoop / MapReduce
Size of Data Sets
• The majority (66%) of respondents revealed that the size of the
largest data set on which their organization conducts analytics is no
more than 5 terabytes (TB).
• Overall, the largest data analytics set is approximately 10 TB.
• While these numbers might not reflect the expectations that often
accompany the concept of big data, the reality is that processing
even gigabytes of data at a time during traditional analytics
exercises is significant.
What is the Largest Data Set?
• <250GB: 5% • <500GB: 9% • <1TB: 20% • <5TB: 32% • <10TB: 19% • <25TB: 11% • <50TB: 3% • >50TB: 1%
Number of Data Sources
• A significant part of data analytics exercises is the amalgamation of
data from multiple disparate sources.
• The next slide shows that 57% of these organizations are pulling from at least three unique data sources, and one-quarter (25%) are integrating data from five or more sources.
Number of Data Sources
• Single source: 12% • 2: 21% • 3: 25% • 4: 17% • 5: 16% • >5: 9%
Update Frequency
• Many organizations identified improving business intelligence and/or delivery of
real-time business information as a key business initiative that will have an
impact on IT spending decisions.
• Considering the volumes of data organizations intend to analyze in shorter
timeframes, organizations will need to evaluate whether their current
approaches are adaptable to these demanding and constantly changing
requirements. As part of the same spending survey, organizations also identified
major application deployments or upgrades as a top IT priority, which is
significant since every newly deployed or upgraded application will have a
corresponding impact on existing data integration processes.
• When asked about the rate with which their largest data set data is updated,
nearly two thirds (65%) of organizations revealed that the changes take place at
an either real-time or near real-time pace.
Frequency of Update
• Real-time (streams): 28% • Near real-time: 37% • Batch: 35%
Integration of Data Sources
• When asked about the primary method used to integrate the data sources comprising their organization's largest data sets, nearly two fifths (39%) of respondents identified purpose-built applications such as Informatica, Oracle, and Teradata.
• An additional 30% use custom extract, transform, load (ETL) scripts or custom extract, load, transform (ELT) scripts for data source integration purposes.
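A "custom ETL script" in this sense is often just a short extract-transform-load pass. A toy sketch, with invented field names and an in-memory SQLite target:

```python
import csv
import io
import sqlite3

# Toy custom-ETL pass: extract rows from a CSV feed, transform units,
# load into a relational table. Field names are invented for illustration.
raw_feed = "msisdn,usage_kb\n447700900001,2048\n447700900002,512\n"

def etl(feed, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS usage (msisdn TEXT, usage_mb REAL)")
    for row in csv.DictReader(io.StringIO(feed)):        # extract
        mb = int(row["usage_kb"]) / 1024                 # transform
        conn.execute("INSERT INTO usage VALUES (?, ?)", (row["msisdn"], mb))
    conn.commit()

conn = sqlite3.connect(":memory:")
etl(conn=conn, feed=raw_feed)
print(conn.execute("SELECT COUNT(*), SUM(usage_mb) FROM usage").fetchone())
```

The purpose-built tools the survey mentions handle the same extract-transform-load shape, but add connectors, scheduling, lineage, and error handling that hand-rolled scripts must reimplement.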
Main Method of Integrating Data Sources
• Purpose-built: 39% • Custom ETL: 30% • EAI: 12% • Open source: 10% • Other: 9%
Data Set Responsibility
• In terms of the sources responsible for populating organizations' largest data sets, just over half (51%) of respondents identified back office applications, such as resource planning, human capital management, and accounting systems.
o For example, many years of order or payment information can yield useful insight into
customer patterns.
• Another common source involves the information gleaned from corporate data
centers and computer networks in the form of network traffic and system log
files. This information is important to not only those organizations looking to
maximize network and system performance and utilization metrics, but also to
those that rely on security analytics to help shape information privacy and
information protection strategies.
• Enterprise organizations were significantly more likely to identify internal back
and front office applications, internal data center or computer networks, e-
commerce applications (i.e., point-of-sale, supply chain, etc.), and scientific
research as data sources that comprise their largest data sets.
Responsible for Populating Data Set
• Internal back-office: 51% • Internal data center: 45% • Front office: 35% • Web applications: 34% • Social media: 12% • Telemetry: 10% • External public data: 11% • Third party: 10% • Scientific research: 7%
Types of Data
• What data types end up in organizations’ largest data sets from the
aforementioned sources? More than half (52%) of respondents indicated that
their largest data set is comprised of database data.
• Nearly half (48%) of organizations have some measure of transactional data—
such as point-of-sale (POS) or inventory—residing in their largest data set.
• What is interesting is the number of organizations that report that unstructured
data—especially machine-generated content such as log files and sensor data—
populates their largest data sets. These data types precipitated the concept of big
data and there are emerging signs that these will consume a vast amount of
bandwidth, compute, and storage resources. Probably the most significant
takeaway is that big data becomes really big when an organization starts to see
unstructured / machine-generated data grow to the size of—or even surpass—
relational information, which will serve to further exacerbate the integration
challenges mentioned above.
Source of Data
• Relational database: 52% • Transaction database: 48% • Office documents: 30% • Log files: 22% • Text / messages: 19% • Location data: 18% • Web log files: 16% • Audio / video: 11% • Sensor data: 9%
Challenges
• When asked to identify the data processing and/or analytics challenges
associated with their organization’s largest data set, nearly half cited security /
regulation / compliance.
• Personally identifiable information (PII) and other sensitive information is what
drives this.
• About one third of respondents identified data quality (35%) and data cleansing
tasks (33%) since data cleansing and preparation was categorized as the most
time-consuming data processing and analytics activity.
• Lack of skills is a middle-of-the-pack challenge according to respondents.
• Clearly, responses involving process-related considerations (i.e., data security, integration, cleansing, etc.) gravitated to the top of the challenges list.
Data Processing Challenges
• Security / regulation / compliance: 48% • Data quality: 35% • Cleansing: 32% • Data integration: 29% • Business expectations: 25% • Data synchronization: 19% • Costs: 18% • Lack of skills: 17%
Benefits
• Cost containment is still an important business initiative to many
organizations, especially when it comes to IT investments.
• More than half (55%) of respondents identified reduced costs as a
key benefit associated with their data analytics platform.
• Other top benefits centered on simplicity and efficiency, including
easier management and process improvements, as well as improved
business agility, which is particularly significant since business
requirements are constantly changing when it comes to data
analytics.
Benefits from Data Analytics Platform
• Cost reduction: 55% • Process improvement: 37% • Business agility: 33% • Better accuracy: 32% • Event monitoring: 25% • Fraud detection: 21%
Conclusions and Recommendations
Recommendations to the Big Data Buyer
• Recognize the value of unified information access and analysis in supporting fact-based decisions by individuals, groups, and systems.
• Recognize the shortcomings of operating without the right information at the right time. Use this awareness to help build the business case for addressing those shortcomings: find an anchor tenant for the project. NO ENTERPRISE-WIDE PLATFORM PROJECTS YET; LOOK TO THE CLOUD.
• Formulate a Big Data strategy that includes evaluation of decision makers' requirements, decision processes, existing and new technology, and availability and quality of data. NOT TECHNOLOGY LED.
• The application of Big Data technology will fall into two primary categories:
o doing tasks that have been done for years more efficiently (including at lower cost), and doing completely new things that were never before possible;
o driving up long-term strategic organizational value.
o Identify opportunities to apply Big Data to both.
Recommendations to the Big Data Buyer
• Beware of the confusion and hyperbolic marketing in the Big Data
market today. WE ARE AT PEAK BS.
• IT organizations will need to consider a coordinated approach to planning implementations when more than one project exists.
• It is important to develop an IT infrastructure strategy that optimizes server, storage, and network resources. Well-developed plans for networking support of Big Data projects should address optimizing the network both within a Big Data domain and in the connection to traditional enterprise infrastructure. LEGACY MATTERS.
• Consider the breadth of Big Data technologies and the functionality each technology brings to the overall portfolio of tools for collecting, accessing, analyzing, monitoring, and managing data.
Recommendations to the Big Data Vendor
• Revenue opportunities exist at all levels of the Big Data technology stack as well as in services. Services are where the bulk of the growth exists.
• Articulate your value proposition by connecting technology capabilities to
business problems or opportunities. NOT TECHNOLOGY LED.
• Big Data technology is not an end in itself. NOT TECHNOLOGY LED.
• Recognize the value of Big Data to drive employee and customer decisions and
actions.
• Decide if you want to be a niche player or enter the mainstream.
o If the former, then build a network of consultants and partners to support your
technology.
o If the latter, then build a business case that assumes eventual acquisition.
• The growth in appliances, cloud, and outsourcing deals for Big Data technology will likely mean that end users will choose new applications and services based less on the technology itself and more on the business value they deliver.
Recommendations to the Big Data Vendor
• Whether the application is based on a database or is search based, and whether
the database is row based or column based, is in-memory or disk based, or uses
SQL or NoSQL technologies will become less relevant over time. Thus
technology will provide only a short-lived competitive advantage to any vendor.
• System performance, availability, security, and manageability will all matter
greatly. However, how they are achieved will be less of a point for
differentiation.
• HPC vendors have an edge in Big Data because leading-edge data-intensive
computing has been an integral part of HPC for decades.
• Most HPC Big Data work involves established methods of analyzing increasingly
large data volume related to numerical modeling and simulation.
Recommendations to the Big Data Vendor
• Vendors should tout, not hide, their HPC histories. A number of vendors
with HPC origins and strong HPC reputations have not capitalized on these
assets when attempting to address Big Data markets outside of HPC.
• It is better to position your high-end HPC experience as a strength for
meeting the presumably less-difficult, data-intensive challenges in the
mainstream market.
• Useful tools are largely lacking for very large data sets. Tools such as
Hadoop and MapReduce can effectively expedite searches through the large,
irregular data sets that characterize some of the newer Big Data problems.
• These tools can be great for retrieving and moving through complex data,
but they do not allow researchers to take the next step and pose intelligent
questions. In addition, the going gets tough when data sets cross the 100TB
threshold.
Recommendations to the Big Data Vendor
• Sophisticated tools for data integration and analysis on this scale are largely
lacking today. There are opportunities to create tools and applications for Big
Data. Vendors that create tools and applications for use at this scale can use
them as a lever to seize market leadership positions in the Big Data market.
• Not all Big Data use cases involve analytics. Analytics may be at the heart of
most Big Data opportunities in the enterprise market, but there are also
opportunities to support operational workloads and information access
applications.
• Some of the emerging technologies and the vendors behind them will likely end
up as components or features of broader information management, access, and
analysis platforms of larger vendors. Specialized application and service
providers with localized and industry expertise will be critical to expanding the
market.
Walk or Run to Big Data? It depends on your situation. For most telcos the move to Big Data will be incremental and complementary to existing platforms and investments. Focus on the solution: the application of analytics to the business. People and process, not technology.