Copyright © 2012 Tata Consultancy Services Limited INTERNAL & CONFIDENTIAL 10/26/22 Global Consulting Practice (GCP) Big Data – Point of View GCP Information Management

Big Data - GCP IM Point of View


Page 1: Big Data - GCP IM Point of View

Copyright © 2012 Tata Consultancy Services Limited

INTERNAL & CONFIDENTIAL

April 7, 2023

Global Consulting Practice (GCP)
Big Data – Point of View
GCP Information Management

Page 2: Big Data - GCP IM Point of View

Document NameTCS Confidential

Why Big Data?

Emergence of Big Data Platforms

Explosion of “Big Data”

Maturation of Analytic Tools (Advanced A.I.)

Digital Expansion

Social Explosion

Mobility/Location

Cloud Computing

Big Data

Explosion of Information plus Multiple Innovations are creating a Perfect Storm

• Social Media • Sensor Data • Video Feeds • Audio Clips • Images • News Feeds • Log Files

• Google • Amazon • Yahoo • eBay • Apple • Hadoop • Map/Reduce

• Listening • Text Mining • Machine Learning • Automated Reasoning • Artificial Intelligence

Page 3: Big Data - GCP IM Point of View


Digital Expansion

Social Explosion

Mobility/Location

Cloud Computing

Big Data

Big Data: Web Scale
• 50 billion web pages
• 800 million Facebook users
• 1000 million Facebook pages
• 200 million Twitter accounts
• 100 million tweets per day
• 5 billion Google queries per day
• Millions of servers, Petabytes of data

Varieties of Data
• Video / Audio
• Images / Pictures
• Diverse internal and external data

Sources of Data
• News / Feeds / Blogs / Forums
• Groups / Polls / Chats / Wiki

Leveraging Big Data – The New Challenge

Information is exploding all around – But the challenge is to understand

Page 4: Big Data - GCP IM Point of View


The Net Generation is inter-connected on a variety of Web based and Digital channels.

• Facebook • Twitter • Google • YouTube • LinkedIn • Wikipedia • Blogs • Forums • Groups

This is changing the rules of Customer engagement

The Net Generation is Here…

Page 5: Big Data - GCP IM Point of View


The Voice of the Customer must be heard


Sales and Marketing • Customer Acquisition • Customer Service • Brand Reputation • Customer Retention • Product Innovation

Listening to the voice of the customer (VoC) has acquired new meaning in the wake of Social Media. It leads to:

• Higher customer satisfaction; faster implementation of service improvements; reduced customer service expense
• Retained customers; improved customer responsiveness and service levels; improved customer satisfaction
• Acquiring new customers; growing share-of-wallet from existing customers
• Improved new product adoption rates and increased sales; improved lead conversion rates; reduced sales and marketing expense
• Identifying new value added service ideas; accelerated new product introductions
• Proactively managing brand risk; identifying areas where damage control is required

Page 6: Big Data - GCP IM Point of View


TCS Point of View # 1


POV : Big Data is here to stay and is going to be an increasingly relevant arena of competitive differentiation

Rationale : Given the ongoing information explosion and the current stream of simultaneous innovations, Big Data is going to be very important. Organizations that learn how to “harness” Big Data and “harvest” useful information and insight from it will create competitive advantage for themselves. Their customers will see them as keeping up with the march of technology capabilities. Organizations that are not current will appear to be behind the times, and therefore not competitive.

Implication : Most organizations will invest resources and time to uncover use case scenarios for Big Data in various business processes, and will deploy Big Data platforms to harness and harvest useful insight from Big Data. While the particular sources of data that are relevant for a given business scenario may vary from use case to use case within an organization, and from one industry vertical to another, the application of techniques for harnessing Big Data and harvesting useful insight will be nearly universally adopted.

Page 7: Big Data - GCP IM Point of View


Big Data – The New Frontier

CRM Data • GPS • Demand • Speed • Velocity • Transactions • Opportunities • Service Calls • Customer • Sales Orders • Inventory • Emails • Tweets • Planning • Things • Mobile • Instant Messages

VELOCITY: Worldwide digital content will double in 18 months, and every 18 months thereafter.

VOLUME: In 2005, humankind created 150 exabytes of information. In 2011, 1,200 exabytes were created.

VARIETY: 80% of enterprise data will be unstructured, spanning traditional and non-traditional sources.

Sources: Gartner, IDC, The Economist

Storage • Analytic Tools • Processing
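The velocity and volume figures on this slide can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes smooth exponential growth, and the function names are hypothetical rather than anything taken from the deck.

```python
import math

def projected_volume(v0: float, months: float, doubling_months: float = 18.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return v0 * 2 ** (months / doubling_months)

def implied_doubling_months(v_start: float, v_end: float, months: float) -> float:
    """Doubling period implied by growth from v_start to v_end over `months`."""
    doublings = math.log2(v_end / v_start)
    return months / doublings

# Figures quoted on the slide: 150 exabytes (2005) and 1,200 exabytes (2011).
implied = implied_doubling_months(150, 1200, months=6 * 12)

# What the "doubles every 18 months" claim would give over the same six years.
projected = projected_volume(150, months=6 * 12)

print(round(implied, 1), round(projected))
```

Under these assumptions the two volume figures imply a doubling period of about 24 months, while an 18-month doubling period would take 150 exabytes to roughly 2,400 exabytes over six years. The figures come from different sources, so some mismatch between them is not surprising.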

Page 8: Big Data - GCP IM Point of View

Big Data – Management and Interpretation

Data: Internal / External × Structured / Unstructured

Management Services • Analytics Services

Page 9: Big Data - GCP IM Point of View


TCS Point of View # 2


POV : There are two fundamental aspects to Big Data: the harnessing aspect, i.e. the technology required to manage Big Data, and the harvesting aspect, i.e. the technology required to analyze and derive insight from Big Data.

Rationale : Given its volume, variety and velocity characteristics, Big Data is not amenable to being managed by traditional technologies. Harnessing it requires a new class of Big Data platforms, e.g. the Hadoop ecosystem, the Map / Reduce algorithm and the technologies built on top of them. At the same time, analyzing Big Data with a view to harvesting useful nuggets of insight from a variety of Big Data sources requires completely different technologies as well. These two domains of technology are complementary to each other, i.e. two sides of the Big Data coin.

Implication : Both technology domains need to be deployed for Big Data to be useful. Correspondingly, both the skills required to harness and manage Big Data and the skills required to analyze and interpret it are necessary. However, they are generally different skills. Harnessing Big Data requires a purely technology orientation, while harvesting insights from Big Data requires a more comprehensive business context, i.e. the business problem we are trying to solve, the metrics we are trying to impact, etc.

Page 10: Big Data - GCP IM Point of View


Big Data Technology is Here Now…

Big Data Technology handles data at extreme scale and is characterized by

• Massively parallel computing to divide and conquer workloads
• Extreme flexibility, allowing unlimited data manipulation and transformation
• Massive scalability in terms of both technology and cost

Hadoop: Massively Parallel Processing capability, running on commodity hardware

HBase and Hadoop/HDFS are designed to store and manage massive amounts of data

Hive, Mahout and R enable query, analysis and running in-memory compute-intensive applications

The ecosystem of Big Data Technology is affordable, and within the reach of companies
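The "divide and conquer" idea behind Map/Reduce can be sketched in a few lines. This is a single-process illustration in plain Python, not Hadoop's actual API; the function names are hypothetical, and on a real cluster the map and reduce calls would be spread across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line is mapped independently, which is what makes the work easy to split.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each input line is mapped independently and each key is reduced independently, both phases can in principle be parallelized, which is the property the slide refers to.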

Page 11: Big Data - GCP IM Point of View


What Does a Big Data Platform Do?

Page 12: Big Data - GCP IM Point of View


TCS Point of View # 3


POV : Big Data Technology Platforms built around the Hadoop ecosystem, using the Map / Reduce algorithms, can be used to solve many traditional problems, i.e. problems not involving Big Data per se.

Rationale : The Hadoop and Map/Reduce based frameworks represent a paradigm shift in data processing capabilities. While they originated in the context of handling Big Data from vendors such as Google, Yahoo, Amazon etc., they can be used to handle many traditional data processing contexts as well. One example is the use of the Hadoop platform as an ETL toolset working exclusively with traditional structured, transactional and master data. Thus the Big Data Technology Platform has uses in contexts such as ETL, DWH, MDM, Analytics etc.

Implication : Organizations which are experiencing extremely high workloads in traditional Data Warehousing and Analytics contexts are likely to experiment with Big Data Technologies for solving traditional data processing problems. In fact, many benefits, including significant performance improvements, improved total cost of ownership, increased processing throughput and improved availability of data to end users, can be generated from deploying Big Data Platforms without the incorporation of Big Data sources.

Page 13: Big Data - GCP IM Point of View

Hadoop as Transformation Platform in ETL

Transactional Systems → Extract → Hadoop Cluster (HDFS; MapReduce / Hive / Pig) → Load → Data Warehouse

• Transform (within Hadoop Ecosystem): MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS).
• Tools like SQOOP could be leveraged to load data from and to HDFS.
• Fewer, higher-end nodes
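As a rough illustration of the extract, transform and load steps on this slide, the sketch below uses plain Python generators. Everything in it is an assumption made for illustration: the field names, the sample rows, and the in-memory list standing in for the data warehouse. In a real deployment the transform would run as MapReduce, Hive or Pig jobs over HDFS, with a tool such as Sqoop moving data in and out.

```python
from typing import Dict, Iterable, Iterator, List

def extract() -> Iterator[Dict[str, str]]:
    """Stand-in for pulling raw rows from a transactional system."""
    yield {"order_id": " 1001 ", "customer": "acme corp", "amount": "250.00"}
    yield {"order_id": "1002", "customer": "Globex", "amount": "n/a"}
    yield {"order_id": "1003", "customer": " initech ", "amount": "75.5"}

def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Clean and type each row; rows with an unparseable amount are skipped."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        yield {
            "order_id": int(row["order_id"].strip()),
            "customer": row["customer"].strip().title(),
            "amount": amount,
        }

def load(rows: Iterable[Dict[str, object]], warehouse: List[Dict[str, object]]) -> None:
    """Stand-in for exporting transformed rows to the data warehouse."""
    warehouse.extend(rows)

warehouse_table: List[Dict[str, object]] = []
load(transform(extract()), warehouse_table)
print(warehouse_table)
```

Each row is transformed independently of the others, which is why this kind of step maps naturally onto a parallel platform.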

Page 14: Big Data - GCP IM Point of View

Hadoop complements Data Warehouse

Components: Transactional Systems, ETL, Hadoop Cluster (HDFS; MapReduce / Hive / Pig), Aggregates, Data Warehouse, Data Marts at Aggregate Levels

• MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS) and create the aggregates, and the same could be moved to aggregate level data marts.
• Data-Mart on Hadoop (to store more granular data)
• Tools like SQOOP could be leveraged to load data from and to HDFS.
• Higher number of nodes for larger storage
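The split between granular data kept on Hadoop and aggregates moved to data marts can be pictured as a simple roll-up. The sketch below is a hypothetical example: the store names, dates and amounts are made up, and a real implementation would more likely be a Hive or Pig group-by over HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Granular rows as they might sit in the Hadoop data mart: (store, day, sales amount).
granular: List[Tuple[str, str, float]] = [
    ("store_a", "2012-01-01", 120.0),
    ("store_a", "2012-01-02", 80.0),
    ("store_b", "2012-01-01", 200.0),
    ("store_b", "2012-02-03", 50.0),
]

def aggregate_by_store_month(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[Tuple[str, str], float]:
    """Roll daily rows up to one total per (store, month)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for store, day, amount in rows:
        month = day[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(store, month)] += amount
    return dict(totals)

# The much smaller aggregate table is what would be moved to the data mart.
monthly = aggregate_by_store_month(granular)
print(monthly)
```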

Page 15: Big Data - GCP IM Point of View

Hadoop as an ad-hoc analysis platform

Components: Transactional Systems, ETL, Data Warehouse, Hadoop Cluster (HDFS; MapReduce / Hive / Pig)

• Tools like SQOOP could be leveraged to load data from and to HDFS.
• MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS); this could provide the business analytics team a platform for innovation.
• Higher number of nodes for larger storage
• Data at lowest grain

Page 16: Big Data - GCP IM Point of View


TCS Point of View # 4


POV : The Big Data Technology and Product landscape is quite vast and varied right now. There are hundreds of products and offerings. Consolidation of products and offerings will be natural over the next 2-5 years.

Rationale : The basic Hadoop and Map Reduce technologies, which are at the heart of all Big Data Technology Platforms, are available in three forms, i.e. open source, proprietary and hybrid. Open source technologies can be deployed as they are, and many companies are choosing to do this. However, they will face issues of security, privacy, robustness of management etc. Niche players are relatively new and will get consolidated in the course of time. The major technology vendors such as IBM, HP, Oracle, Teradata, Informatica etc. will complement, fill gaps and improve their offerings.

Implication : At this time it is difficult to predict which technologies will survive, which will get acquired and consolidated, and which will simply die. Companies which are committed to the open source idea and wish to exploit this technology may invest in these directly, and build skills in this area. On the other hand, companies which are committed to vendors such as IBM or Teradata may weigh the costs versus benefits of going with pure open source, or buy into a hybrid strategy, where some of the capability gaps are filled by the vendors. This needs careful evaluation.

Page 17: Big Data - GCP IM Point of View

Big Data Product and Offering Landscape

• NoSQL
• Analytics / Visualization
• CEP
• Search
• Appliance / Vendor
• Data Integration
• Hadoop Distributions
• Tools
• Cloud Distributions

Page 18: Big Data - GCP IM Point of View


Pure-Play Vendors

Page 19: Big Data - GCP IM Point of View


Big Data Product Landscape

Commercial Open Source Hybrid

Page 20: Big Data - GCP IM Point of View


TCS Point of View # 5


POV : Unstructured Data cannot be consumed as it is, in its raw form. It must be “processed” into useful nuggets of information, i.e. converted into a consumable structured form, before it can be interpreted and acted upon.

Rationale : Unstructured information cannot be interpreted and used by end users as it is. It must be converted into a useful form. This requires filtering a lot of noise out of the data, since Big Data tends to have a lot of noise relative to useful data. Further, the information content of Big Data streams must be interpreted in the context of other, more traditional types of information before it can be deemed useful. This requires the “Fusion” of Big Data based information with more traditional structured information to derive useful insight.

Implication : Big Data is not a new opportunity or capability that stands on its own. It is better considered as augmenting the Data Management and Analytics capabilities that already exist in an organization. Big Data platforms are not replacements for existing traditional Data Management and Analytics platforms. They merely add to, mature and improve upon existing environments and capabilities. Information fusion, i.e. the ability to bring together structured and unstructured information in the context of specific business problems and opportunities, is what is needed to exploit Big Data.
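The filter-then-fuse idea can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the CRM table, the social handles, the keyword list and the post format are all hypothetical, and keyword matching stands in for the far richer text analytics a real system would use.

```python
import re
from typing import Dict, List, Optional

# Hypothetical structured data: a small CRM table keyed by social handle.
CRM: Dict[str, Dict[str, str]] = {
    "@asha": {"customer_id": "C-17", "segment": "premium"},
    "@lee": {"customer_id": "C-42", "segment": "standard"},
}

# Hypothetical keywords marking a post as relevant to the business problem.
COMPLAINT_WORDS = {"broken", "refund", "late"}

def to_structured(post: str) -> Optional[Dict[str, object]]:
    """Turn one raw post into a structured record, or None if it is noise."""
    handle = re.match(r"(@\w+):", post)
    words = set(re.findall(r"[a-z]+", post.lower()))
    hits = sorted(words & COMPLAINT_WORDS)
    if handle is None or not hits:
        return None  # filtered out as noise
    return {"handle": handle.group(1), "issues": hits}

def fuse(posts: List[str]) -> List[Dict[str, object]]:
    """Join structured records derived from posts with the matching CRM rows."""
    fused = []
    for post in posts:
        record = to_structured(post)
        if record and record["handle"] in CRM:
            fused.append({**record, **CRM[record["handle"]]})
    return fused

posts = [
    "@asha: my order arrived late and broken",
    "@lee: lovely weather today",
    "@sam: where is my refund",
]
insights = fuse(posts)
print(insights)
```

Only the first post survives: the second is filtered out as noise, and the third has no matching customer record to fuse with. The surviving record carries both the issue extracted from the text and the customer segment from the structured side, which is the kind of combined view the slide argues for.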

Page 21: Big Data - GCP IM Point of View


An Example - Social Intelligence


Social Intelligence i.e. the process of generating useful knowledge from the web of social media activity is maturing :

However the social Web is too big, moving too fast and too full of irrelevant data trash.

Listening • Filtering • Fusion • Analysis • Dashboards

Radian 6 • Visible Technologies • Synthesio • Converseon • Attensity • SDL • Networked Insights • Lithium

Friends • Fans • Followers • Influencers • Network • Value

Page 22: Big Data - GCP IM Point of View


Listen & Learn – Machine Learning

Listen

Learn, Focus, Filter, Reason

News, Chatter, Events

Fuse, Connect

Analyze, Alert, Respond

Page 23: Big Data - GCP IM Point of View

This requires Information Fusion

• Real Time Streams
• Unstructured Data (HDFS)
• Real Time Structured Database
• Big SQL / NoSQL
• Processing
• Analytics
• Integrated Customer Insights environment
• Real-Time Business Insights and Alerts
• Early-Problem Detection
• Market Intelligence
• Demand Signal Refinement
• Marketing
• EIF Framework

Page 24: Big Data - GCP IM Point of View


Enterprise Information Fusion (EIF)

Structured Information

Unstructured Information

Page 25: Big Data - GCP IM Point of View

Big Data – requires connecting the dots…

Big Data

• Traditional Channels
• Smartphones
• Tablets
• Social
• Call Center
• Partner Portals
• Mobile
• RFID, Monitors and Sensors
• Web: Website, Intranet, Partner Portals, SEO, SEM, Online Advertising, Web presence, Micro-sites, ecommerce
• Mobile: Mobile Applications, Mobile App Stores, Mobile Web, Mobile Messaging, Location-based services
• Social: Social Network Applications, Social Search Engine Optimization, Community management, Social Media Expansion, Social Business Initiatives, Crowd sourcing

Marketing • Customer Service • Sales • Product Development • Public Relations • Human Resources • Finance

Page 26: Big Data - GCP IM Point of View

The new Technology Challenge – Harnessing the power of Big Insights

Big Data → Big Insights

In order to generate useful Insights

• Traditional Channels
• Smartphones
• Tablets
• Social
• Call Center
• Partner Portals
• Mobile
• RFID, Monitors and Sensors
• Web: Website, Intranet, Partner Portals, SEO, SEM, Online Advertising, Web presence, Micro-sites, ecommerce
• Mobile: Mobile Applications, Mobile App Stores, Mobile Web, Mobile Messaging, Location-based services
• Social: Social Network Applications, Social Search Engine Optimization, Community management, Social Media Expansion, Social Business Initiatives, Crowd sourcing

Marketing • Customer Service • Sales • Product Development • Public Relations • Human Resources • Finance

Page 27: Big Data - GCP IM Point of View


TCS Point of View # 6


POV : The Fusion of Unstructured and Structured Information for a given business context requires business domain expertise in addition to data analysis expertise. This is a new science, i.e. Data Science.

Rationale : While Information Fusion is a general expertise, its application is usually within the confines of a specific business context. Examples of specific business contexts are Marketing, Sales, Brand Management, Customer Service, and Fraud and Risk analytics. Within each business context, the information sources that are relevant, and the process of extracting useful insights from Big Data, are unique and distinct. This requires knowledge and understanding of the data sources and of the processes for deriving useful information from Big Data in that business context.

Implication : Data Science, and the role of the Data Scientist, is going to be a new area of growth and development. The traditional analyst, who was equipped to manage and analyze structured data, is going to have to extend themselves to understand and work with non-traditional Big Data sources and the tools appropriate to working with them. There is likely to be tremendous demand for Data Scientists in the future. It is possible that many universities and colleges may offer courses on Data Science and the tools required to work with Big Data.

Page 28: Big Data - GCP IM Point of View

Data Science and Advanced Analytics

Big Data

• Social Channels: Blogs, Wikis, Forums; Social networking; Groups; User profiles; Ratings, reviews, etc.; Polls, chat, podcasting; Audio, video, photos; Events & calendar; Private messaging; +
• Instrumented Channels: Smart grid; Home appliances; Cars; Sensors; Monitors; Supply chain devices; Other mobile devices
• Mobile Channels: Mobile Applications
• Other Channels: Video; Audio; Other

Analytics is evolving to meet the needs of the market. Leaders can expect:

Future Direction: Description

• Business Analytics: Business intelligence combines with advanced analytics to form a new category called business analytics
• Social Data: Social data will play a greater role in decision processes
• Analytic Applications: The emergence of applications that bundle data, knowledge, and analytics to solve business problems
• The Awareness-to-Action Imperative: Analytics will increasingly identify market signals and initiate action, through context sensitive alerts
• Analytic Centers of Excellence: The growing enterprise realization that Analytic COEs are required
• Analytic Outsourcing: McKinsey Global Institute predicts a future shortage of analysts and managers with the necessary analytical skills
• Text Analytics Maturation: Text Analytics is absorbed into business applications
• Process Enablement: The shift from analytics as a reporter of process, to analytics as an enabler of process
• The Information Lifecycle: The growing role of analytics throughout the information life cycle

Page 29: Big Data - GCP IM Point of View

Analytics Classifications

• Text Analytics: Social Analytics; Sentiment Analysis; Brand Identity; Product & Brand Affinity; Reputation Driven Online-Economy
• Predictive Analytics: Forecasting; Targeting; Fraud Detection, Anti-Fraud Analytics; Regression, Predictive, Multivariate; Propensity; Price Elasticity
• Segmentation Analytics: Customer Segmentation in real-time; Churn Analysis, Attrition; Funnel Analysis; Behavioral Segmentations
• Mobile Analytics: Digital Delivery Channels & Services; Property Effectiveness; Application Analytics; Ad Analytics; Geo-Spatial Analytics; User profile and Relevance; Identify New Opportunities

Page 30: Big Data - GCP IM Point of View

Big Data Analytics

• Prescriptive (What should happen?): Optimizing outcomes. Optimization; Simulation
• Descriptive (What has happened?): Describing and analyzing outcomes. Query, Analysis, Drill-Down, Ad-Hoc Reporting; Dashboards and Scorecards; Visual Analytics
• Predictive (What will happen?): Identifying possible outcomes. Domain Expertise; Text Analytics; Data Mining; Knowledge; Predictive Modeling; Statistical Analysis; Visual Analytics; Forecasting

* Source – GCP Business Analytics

Page 31: Big Data - GCP IM Point of View


Examples of Uses of Big Data


• Log Analytics & Storage

• Smart Grid / Smarter Utilities

• RFID Tracking & Analytics

• Fraud / Risk Management & Modeling

• 360° View of the Customer

• Warehouse Extension

• Email / Call Center Transcript Analysis

• Call Detail Record Analysis

• +++

Page 32: Big Data - GCP IM Point of View

Some Examples of Use Cases

• Financial Trade Monitoring
• Telco Call Data Record Management
• Website Analytics
• Fraud Detection
• Online Gaming Micro Transactions
• Digital ad Exchange Services
• Wireless Location-based Services

Data Source • High-Frequency Operations • Low-Frequency Operations

Page 33: Big Data - GCP IM Point of View

Applications for Big Data Analytics

• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO

Page 34: Big Data - GCP IM Point of View


TCS Point of View # 7


POV : We are still in the very early days of Big Data adoption. The companies that have deployed and exploited Big Data technologies are Google, Yahoo, Amazon etc. The rest are just beginning their Big Data journey.

Rationale : So far, Big Data technologies have been used exclusively in companies that deal with Web Scale data. This technology is now slowly becoming viable for large commercial enterprises. Use cases which represent possible scenarios where Big Data can be fruitfully exploited are still being discovered and documented. Very few case studies are available which represent full scale adoption of Big Data technologies. We are still in an era of experimentation, trial and error, do and learn, Proof of Concept and Value cycles.

Implication : Big Data adoption will increase steadily over the next few years. Gartner's view is that we are still in the early “Technology Trigger” phase of Big Data. IDC and Wikibon are predicting a ten-fold growth in the Big Data market over the next five years. Most companies will do well to set aside budgets for experimentation and laboratory scale projects to explore the uses of Big Data in various business contexts, and in the process develop some skills in these new technologies and Data Science areas.
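To put the ten-fold, five-year forecast in more familiar terms, it can be converted to an equivalent yearly growth rate. The sketch below assumes constant compound growth, which the forecast itself does not state; the function name is hypothetical.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Constant yearly growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0

# Ten-fold growth over five years, as quoted in the Implication above.
rate = implied_annual_growth(multiple=10.0, years=5.0)
print(f"{rate:.1%}")
```

Under that assumption, ten-fold growth over five years corresponds to a compound annual growth rate of roughly 58%.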

Page 35: Big Data - GCP IM Point of View


The Gartner Hype Cycle

Page 36: Big Data - GCP IM Point of View


What is the Market?

Page 37: Big Data - GCP IM Point of View


Business Drivers for Big Data

Page 38: Big Data - GCP IM Point of View


Thank You
