Copyright © 2012 Tata Consultancy Services Limited
INTERNAL & CONFIDENTIAL
April 7, 2023
Global Consulting Practice (GCP)
Big Data – Point of View
GCP Information Management
Document NameTCS Confidential
Why Big Data?
Explosion of Information plus Multiple Innovations are creating a Perfect Storm
Explosion of “Big Data”
• Social Media
• Sensor Data
• Video Feeds
• Audio Clips
• Images
• News Feeds
• Log Files
Emergence of Big Data Platforms
• Google
• Amazon
• Yahoo
• eBay
• Apple
• Hadoop
• Map/Reduce
Maturation of Analytic Tools (Advanced A.I)
• Listening
• Text Mining
• Machine Learning
• Automated Reasoning
• Artificial Intelligence
Digital Expansion
Social Explosion
Mobility/Location
Cloud Computing
Big Data
Digital Expansion
Social Explosion
Mobility/Location
Cloud Computing
Big Data
Big Data : Web Scale
• 50 billion web pages
• 800 million Facebook users
• 1000 million Facebook pages
• 200 million Twitter accounts
• 100 million tweets per day
• 5 billion Google queries per day
• Millions of servers, Petabytes of data
Varieties of Data
• Video / Audio
• Images / Pictures
• Diverse internal and external data
Sources of Data
• News / Feeds / Blogs / Forums
• Groups / Polls / Chats / Wiki
Leveraging Big Data – The New Challenge
Information is exploding all around – But the challenge is to understand
The Net Generation is inter-connected on a variety of Web based and Digital channels.
• Facebook
• Twitter
• Google
• YouTube
• LinkedIn
• Wikipedia
• Blogs
• Forums
• Groups
This is changing the rules of Customer engagement
The Net Generation is Here…
The Voice of the Customer must be heard
Listening to the voice of the customer (VoC) has acquired new meaning in the wake of Social Media. It leads to improvements in:
Sales and Marketing
• Improved new product adoption rates, increased sales
• Improved lead conversion rates
• Reduced sales and marketing expense
Customer Acquisition
• Acquire new customers
• Grow share-of-wallet from existing customers
Customer Service
• Higher customer satisfaction
• Faster implementation of service improvements
• Reduced customer service expense
Brand Reputation
• Proactively manage brand risk
• Identify areas where damage control is required
Customer Retention
• Retained customers
• Improved customer responsiveness and service levels
• Improved customer satisfaction
Product Innovation
• Identify new value added service ideas
• Accelerated new product introductions
TCS Point of View # 1
POV : Big Data is here to stay and is going to be an increasingly relevant arena of competitive differentiation.
Rationale : Given the information explosion going on all around, and the current stream of innovations happening at the same time, Big Data is going to be very important. Organizations that learn how to “harness” Big Data and “harvest” useful information and insight from it will create competitive advantage for themselves. They will be seen by their customers as keeping up with the march of technology. Others that are not current will appear to be behind the times, and therefore not competitive.
Implication : Most organizations will invest resources and time to uncover use case scenarios for Big Data in various business processes, and deploy Big Data platforms to harness and harvest useful insight from Big Data. While the particular sources of data that are relevant for a given business scenario may vary from use case to use case within an organization, and from one industry vertical to another, the application of techniques for harnessing Big Data and harvesting useful insight will be nearly universally adopted.
Big Data – The New Frontier
[Figure labels: CRM Data, GPS, Demand, Speed, Velocity, Transactions, Opportunities, Service Calls, Customer, Sales Orders, Inventory, Emails, Tweets, Planning, Things, Mobile, Instant Messages; Storage, Processing, Analytic Tools]
VOLUME : In 2005, humankind created 150 exabytes of information. In 2011, 1,200 exabytes were created.
VELOCITY : Worldwide digital content will double in 18 months, and every 18 months thereafter.
VARIETY : 80% of enterprise data will be unstructured, spanning traditional and non-traditional sources.
Sources: Gartner, IDC, The Economist
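The volume and velocity figures quoted on this slide can be cross-checked with a little arithmetic. The following is a minimal sketch that takes the two exabyte figures and the 18-month doubling claim at face value; the function names are illustrative and not part of the original deck.

```python
import math

def implied_doubling_months(start, end, elapsed_months):
    """Doubling period implied by growing from `start` to `end`."""
    doublings = math.log2(end / start)
    return elapsed_months / doublings

def projected_volume(start, elapsed_months, doubling_months):
    """Volume after `elapsed_months` if it doubles every `doubling_months`."""
    return start * 2 ** (elapsed_months / doubling_months)

# Figures quoted on the slide: 150 EB in 2005 and 1,200 EB in 2011.
months = (2011 - 2005) * 12
implied = implied_doubling_months(150, 1200, months)
projected = projected_volume(150, months, 18)

print(f"Implied doubling period: {implied:.0f} months")
print(f"2011 volume at an 18-month doubling: {projected:.0f} EB")
```

Taken together, the two volume figures imply a doubling roughly every 24 months, somewhat slower than the 18-month rate quoted for velocity. The figures come from different sources, so some mismatch is to be expected.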
Big Data – Management and Interpretation
[Figure labels: Data classified as Internal / External and Structured / Unstructured; Management Services; Analytics Services]
TCS Point of View # 2
POV : There are two fundamental aspects to Big Data: the harnessing aspect, i.e. the technology required to manage Big Data, and the harvesting aspect, i.e. the technology required to analyze and derive insight from Big Data.
Rationale : Given the volume, variety and velocity characteristics of Big Data, it is not amenable to being managed by traditional technologies. Harnessing it requires a new class of Big Data platforms, e.g. the Hadoop ecosystem, the Map/Reduce algorithm and the technologies built on top of them. At the same time, analyzing Big Data with a view to harvesting useful nuggets of insight from a variety of Big Data sources requires completely different technologies as well. These two domains of technology are complementary to each other, i.e. two sides of the Big Data coin.
Implication : Both technology domains need to be deployed for Big Data to be useful. Correspondingly, both the skills required to harness and manage Big Data and the skills required to analyze and interpret it are necessary. However, they are generally different skills. Harnessing Big Data requires a purely technology orientation, while harvesting insights from Big Data requires a more comprehensive business context, i.e. the business problem we are trying to solve, the metrics we are trying to impact, etc.
Big Data Technology is Here Now…
Big Data Technology handles data at extreme scale and is characterized by
• Massively parallel computing to divide and conquer workloads
• Extreme flexibility, allowing unlimited data manipulation and transformation
• Massive scalability in terms of both technology and cost
Hadoop : Massively parallel processing capability, running on commodity hardware
HBase and Hadoop/HDFS are designed to store and manage massive amounts of data
Hive, Mahout and R enable query, analysis and running in-memory compute-intensive applications
The ecosystem of Big Data Technology is affordable, and within the reach of companies
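The "divide and conquer" idea behind Map/Reduce can be illustrated with a small, single-process sketch. This is not Hadoop code: it only imitates the map, shuffle and reduce phases in plain Python, and the function names are hypothetical. On a real cluster the framework runs the map and reduce functions in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    print(word_count(["big data is here", "big data is big"]))
```

Because each mapper sees only its own lines and each reducer sees only its own keys, the work can be spread over as many machines as are available, which is what makes the approach scale on commodity hardware.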
What Does a Big Data Platform Do?
TCS Point of View # 3
POV : Big Data Technology Platforms built around the Hadoop ecosystem, using the Map/Reduce algorithms, can be used to solve many traditional problems, i.e. problems not involving Big Data per se.
Rationale : The Hadoop and Map/Reduce based frameworks represent a paradigm shift in data processing capabilities. While they originated in the context of handling Big Data at companies such as Google, Yahoo, Amazon etc., they can be used to handle many traditional data processing contexts as well. One example is the use of the Hadoop platform as an ETL toolset working exclusively with traditional structured, transactional and master data. Thus the Big Data Technology Platform has uses in contexts such as ETL, DWH, MDM, Analytics etc.
Implication : Organizations which are experiencing extremely high workloads in traditional Data Warehousing and Analytics contexts are likely to experiment with Big Data Technologies for solving traditional data processing problems. In fact, many benefits, including significant performance improvements, lower total cost of ownership, increased processing throughput and improved availability of data to end users, can be generated from deploying Big Data Platforms without incorporating Big Data sources.
Hadoop as Transformation Platform in ETL
[Figure labels: Transactional Systems, Extract, Hadoop Cluster (HDFS; MapReduce / Hive / Pig), Transform, Load, Data Warehouse, Within Hadoop Ecosystem]
• MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS).
• Tools like SQOOP could be leveraged to load data from and to HDFS.
• Fewer, higher-end nodes
Hadoop complements Data Warehouse
[Figure labels: Transactional Systems, ETL, Hadoop Cluster (HDFS; MapReduce / Hive / Pig), Aggregates, Data Warehouse, Data Marts at Aggregate Levels]
• MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS) and create the aggregates, which could then be moved to aggregate-level data marts.
• Data mart on Hadoop (to store more granular data)
• Tools like SQOOP could be leveraged to load data from and to HDFS.
• Higher number of nodes for larger storage
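The aggregate-creation step described on this slide can be sketched in a few lines. The example below is an in-memory illustration only: the record layout, field names and the `aggregate_sales` function are assumptions made for the sketch. On a cluster the same roll-up would typically be written as a Hive `GROUP BY` query or a MapReduce job over files in HDFS.

```python
from collections import defaultdict

# Hypothetical granular sales records, as they might be held in HDFS.
TRANSACTIONS = [
    {"store": "S1", "month": "2012-01", "amount": 120.0},
    {"store": "S1", "month": "2012-01", "amount": 80.0},
    {"store": "S1", "month": "2012-02", "amount": 50.0},
    {"store": "S2", "month": "2012-01", "amount": 200.0},
]

def aggregate_sales(rows):
    """Roll transaction-level rows up to one row per (store, month)."""
    totals = defaultdict(lambda: {"total_amount": 0.0, "order_count": 0})
    for row in rows:
        bucket = totals[(row["store"], row["month"])]
        bucket["total_amount"] += row["amount"]
        bucket["order_count"] += 1
    # The keys are unique, so sorting orders the output by store, then month.
    return [
        {"store": store, "month": month, **values}
        for (store, month), values in sorted(totals.items())
    ]

if __name__ == "__main__":
    for mart_row in aggregate_sales(TRANSACTIONS):
        print(mart_row)
```

The granular rows stay on Hadoop, while only the much smaller aggregate rows are moved to the data marts, which matches the division of labour shown on the slide.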
Hadoop as an ad-hoc analysis platform
[Figure labels: Transactional Systems, ETL, Data Warehouse, Hadoop Cluster (HDFS; MapReduce / Hive / Pig)]
• MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS); this could provide the business analytics team with a platform for innovation.
• Tools like SQOOP could be leveraged to load data from and to HDFS.
• Higher number of nodes for larger storage
• Data at lowest grain
TCS Point of View # 4
POV : The Big Data Technology and Product landscape is quite vast and varied right now. There are hundreds of products and offerings. Consolidation of products and offerings will be natural over the next 2-5 years.
Rationale : The basic Hadoop and Map/Reduce technologies which are at the heart of all Big Data Technology Platforms are available in three forms, i.e. open source, proprietary and hybrid. Open source technologies can be deployed as they are, and many companies are choosing to do this. However, they will face issues of security, privacy, robustness of management etc. Niche players are relatively new and will get consolidated in the course of time. The major technology vendors such as IBM, HP, Oracle, Teradata, Informatica etc. will complement, fill gaps in and improve their offerings.
Implication : It is difficult to predict at this time which technologies will survive, which will get acquired and consolidated, and which will simply die. Companies which are committed to the open source idea and wish to exploit this technology may invest in it directly, and build skills in this area. On the other hand, companies which are committed to vendors such as IBM or Teradata may weigh the costs versus benefits of going with pure open source, or buy into a hybrid strategy where some of the capability gaps are filled by the vendors. This needs careful evaluation.
Big Data Product and Offering Landscape
[Figure labels: NoSQL, Analytics / Visualization, CEP, Search, Appliance / Vendor, Data Integration, Hadoop Distributions, Tools, Cloud Distributions]
Pure-Play Vendors
Big Data Product Landscape
[Figure labels: Commercial, Open Source, Hybrid]
TCS Point of View # 5
POV : Unstructured Data cannot be consumed as it is, in its raw form. It must be “processed” into useful nuggets of information, i.e. converted into a consumable structured form, before it can be interpreted and acted upon.
Rationale : Unstructured information cannot be interpreted and used by end users as it is. It must be converted into a useful form. This requires filtering a lot of noise out of the data, since Big Data tends to have a lot of noise relative to useful data. Further, the information content of Big Data streams must be interpreted in the context of other, more traditional types of information before it can be deemed useful. This requires the “Fusion” of Big Data based information with more traditional structured information to derive useful insight.
Implication : Big Data is not a new opportunity or capability that stands on its own. It is better considered as augmenting already existing Data Management and Analytics capabilities in an organization. Big Data platforms are not replacements for existing traditional Data Management and Analytics platforms. They merely add to, mature and improve upon existing environments and capabilities. Information fusion, i.e. the ability to bring together structured and unstructured information in the context of specific business problems and opportunities, is what is needed to exploit Big Data.
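The processing and fusion steps argued for above can be sketched as follows. This is a toy illustration under stated assumptions: the CRM table, the posts, the topic pattern and the keyword lists are all invented for the example, and the keyword-counting "sentiment" stands in for whatever text analytics a real platform would use.

```python
import re

# Hypothetical structured data: a CRM table keyed by social handle.
CRM = {
    "@asha": {"customer_id": 101, "segment": "premium"},
    "@ravi": {"customer_id": 102, "segment": "standard"},
}

# Hypothetical unstructured data: raw posts from a social feed.
POSTS = [
    "@asha: the new phone plan is great, love it",
    "@spam_bot: win a free cruise now!!!",
    "@ravi: billing error again, terrible service",
    "@asha: lunch was nice today",
]

TOPIC = re.compile(r"\b(plan|billing|service)\b", re.IGNORECASE)
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"terrible", "error", "bad"}

def structure(post):
    """Turn one raw post into a structured record, or None if it is noise."""
    handle, _, text = post.partition(":")
    if not TOPIC.search(text):
        return None  # off-topic chatter is filtered out
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return {"handle": handle.strip(), "sentiment": score}

def fuse(posts, crm):
    """Join structured post records with CRM rows on the social handle."""
    fused = []
    for post in posts:
        record = structure(post)
        if record and record["handle"] in crm:
            fused.append({**crm[record["handle"]], **record})
    return fused

if __name__ == "__main__":
    for row in fuse(POSTS, CRM):
        print(row)
```

Only two of the four posts survive the filter, and each surviving record becomes useful only once it is joined to the customer it belongs to, which is the point the slide makes about fusion.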
An Example - Social Intelligence
Social Intelligence, i.e. the process of generating useful knowledge from the web of social media activity, is maturing. However, the social Web is too big, moving too fast and too full of irrelevant data trash.
Stages: Listening → Filtering → Fusion → Analysis → Dashboards
Vendors: Radian 6, Visible Technologies, Synthesio, Converseon, Attensity, SDL, Networked Insights, Lithium
[Figure labels: Friends, Fans, Followers, Influencers, Network, Value]
Listen & Learn – Machine Learning
Listen
Learn, Focus, Filter, Reason
News, Chatter, Events
Fuse, Connect
Analyze, Alert, Respond
Real-Time Streams
Unstructured Data (HDFS)
Real-Time Structured Database
Big SQL / NoSQL
Processing
Analytics
Integrated Customer Insights environment
Real-Time Business Insights and Alerts
Early-Problem Detection
Market Intelligence
Demand Signal Refinement
Marketing
EIF Framework
This requires Information Fusion
Enterprise Information Fusion (EIF)
Structured Information
Unstructured Information
Big Data – requires connecting the dots…
Channels: Traditional Channels, Call Center, Web, Mobile, Smartphones, Tablets, Social, Partner Portals, RFID, Monitors and Sensors
• Web: Website, Intranet, Partner Portals, SEO, SEM, Online Advertising, Web presence, Micro-sites, ecommerce
• Mobile: Mobile Applications, Mobile App Stores, Mobile Web, Mobile Messaging, Location-based services
• Social: Social Network Applications, Social Search Engine Optimization, Community management, Social Media Expansion, Social Business Initiatives, Crowd sourcing
Business functions: Marketing, Customer Service, Sales, Product Development, Public Relations, Human Resources, Finance
The new Technology Challenge – Harnessing the power of Big Insights
In order to generate useful Insights
Big Data → Big Insights
Channels: Traditional Channels, Call Center, Web, Mobile, Smartphones, Tablets, Social, Partner Portals, RFID, Monitors and Sensors
Business functions: Marketing, Customer Service, Sales, Product Development, Public Relations, Human Resources, Finance
TCS Point of View # 6
POV : The Fusion of Unstructured and Structured Information for a given business context requires business domain expertise in addition to data analysis expertise. This is a new science, i.e. Data Science.
Rationale : While Information Fusion is a general expertise, its application is usually within the confines of a specific business context. Examples of specific business contexts are Marketing, Sales, Brand Management, Customer Service, Fraud and Risk analytics etc. Within each business context, the information sources that are relevant, and the process of extracting useful insights from Big Data, are unique and distinct. This requires knowledge and understanding of the data sources and of the processes for deriving useful information from Big Data in each business context.
Implication : Data Science, and the role of the Data Scientist, is going to be a new area of growth and development. The traditional Analyst, who was equipped to manage and analyze structured data, is going to have to learn to understand and work with non-traditional Big Data sources and the tools appropriate to them. There is likely to be a tremendous demand for Data Scientists in the future. It is possible that many universities and colleges may offer courses on Data Science and the tools required to work with Big Data.
Big Data
• Social Channels: Blogs, Wikis, Forums; Social networking; Groups; User profiles; Ratings, reviews, etc.; Polls, chat, podcasting; Audio, video, photos; Events & calendar; Private messaging
• Instrumented Channels: Smart grid; Home appliances; Cars; Sensors; Monitors; Supply chain devices; Other mobile devices
• Mobile Channels: Mobile Applications
• Other Channels: Video; Audio; Other
Data Science and Advanced Analytics
Analytics is evolving to meet the needs of the market. Leaders can expect:
• Business Analytics: Business intelligence combines with advanced analytics to form a new category called business analytics
• Social Data: Social data will play a greater role in decision processes
• Analytic Applications: The emergence of applications that bundle data, knowledge, and analytics to solve business problems
• The Awareness-to-Action Imperative: Analytics will increasingly identify market signals and initiate action, through context sensitive alerts
• Analytic Centers of Excellence: The growing enterprise realization that Analytic COEs are required
• Analytic Outsourcing: McKinsey Global Institute predicts a future shortage of analysts and managers with the necessary analytical skills
• Text Analytics Maturation: Text Analytics is absorbed into business applications
• Process Enablement: The shift from analytics as a reporter of process, to analytics as an enabler of process
• The Information Lifecycle: The growing role of analytics throughout the information life cycle
Analytics Classifications
• Text Analytics: Social Analytics; Sentiment Analysis; Brand Identity; Product & Brand Affinity; Reputation Driven Online-Economy
• Predictive Analytics: Forecasting; Targeting; Fraud Detection, Anti-Fraud Analytics; Regression, Predictive, Multivariate; Propensity; Price Elasticity
• Segmentation Analytics: Customer Segmentation in real-time; Churn Analysis, Attrition; Funnel Analysis; Behavioral Segmentations
• Mobile Analytics: Digital Delivery Channels & Services; Property Effectiveness; Application Analytics; Ad Analytics; Geo-Spatial Analytics; User profile and Relevance; Identify New Opportunities
Big Data Analytics
• Descriptive (What has happened?): Describing and analyzing outcomes
  Query, Analysis, Drill-Down, Ad-Hoc Reporting; Dashboards and Scorecards; Visual Analytics
• Predictive (What will happen?): Identifying possible outcomes
  Domain Expertise; Text Analytics; Data Mining; Knowledge; Predictive Modeling; Statistical Analysis; Visual Analytics; Forecasting
• Prescriptive (What should happen?): Optimizing outcomes
  Optimization; Simulation
* Source – GCP Business Analytics
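The three levels of analytics on this slide can be contrasted on one small data set. The sketch below is illustrative only: the price and sales history is invented, and a straight-line fit stands in for whatever predictive model would be used in practice.

```python
from statistics import mean

# Hypothetical monthly history: (price, units sold).
HISTORY = [(10.0, 100.0), (12.0, 90.0), (14.0, 80.0), (16.0, 70.0)]

def describe(rows):
    """Descriptive: summarise what has happened."""
    return {
        "avg_price": mean([price for price, _ in rows]),
        "avg_units": mean([units for _, units in rows]),
    }

def fit_line(rows):
    """Predictive: least-squares fit of units = slope * price + intercept."""
    xs = [price for price, _ in rows]
    ys = [units for _, units in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

def best_price(slope, intercept, candidates):
    """Prescriptive: pick the candidate price with the highest predicted revenue."""
    return max(candidates, key=lambda p: p * (slope * p + intercept))

if __name__ == "__main__":
    slope, intercept = fit_line(HISTORY)
    print(describe(HISTORY))
    print("predicted units at price 15:", slope * 15 + intercept)
    print("recommended price:", best_price(slope, intercept, [10, 12, 14, 15, 16, 18]))
```

Each step builds on the one before it: the description summarises the history, the fitted line extrapolates from it, and the recommendation optimises over the predictions.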
Examples of Uses of Big Data
• Log Analytics & Storage
• Smart Grid / Smarter Utilities
• RFID Tracking & Analytics
• Fraud / Risk Management & Modeling
• 360° View of the Customer
• Warehouse Extension
• Email / Call Center Transcript Analysis
• Call Detail Record Analysis
• +++
Some Examples of Use Cases
• Financial Trade Monitoring
• Telco Call Data Record Management
• Website Analytics
• Fraud Detection
• Online Gaming Micro Transactions
• Digital Ad Exchange Services
• Wireless Location-based Services
[Figure labels: Data Source; High-Frequency Operations; Low-Frequency Operations]
Applications for Big Data Analytics
• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO
TCS Point of View # 7
POV : We are still in the very early days of Big Data adoption. The companies that have deployed and exploited Big Data technologies are Google, Yahoo, Amazon etc. The rest are just beginning their Big Data journey.
Rationale : Big Data Technologies have so far been used exclusively in companies that deal with Web Scale data. This technology is now slowly becoming viable for large commercial enterprises. Use cases which represent possible scenarios where Big Data can be fruitfully exploited are still being discovered and documented. Very few case studies are available which represent full scale adoption of Big Data technologies. We are still in an era of experimentation, trial and error, do and learn, proof of concept and value cycles.
Implication : Big Data adoption will increase steadily over the next few years. According to Gartner, we are still in the early “Technology Trigger” phase of Big Data. IDC and Wikibon are predicting a ten-fold growth in the Big Data market over the next five years. Most companies will do well to set aside budgets for experimentation and laboratory scale projects to explore the uses of Big Data in various business contexts, and in the process develop some skills in these new technologies and in Data Science.
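The ten-fold, five-year growth forecast quoted above can be restated as an annual rate. The worked step below assumes steady compound growth at a constant annual rate r, which is a simplification made for this sketch and not something the forecasts themselves state.

```latex
(1 + r)^{5} = 10
\quad\Longrightarrow\quad
r = 10^{1/5} - 1 \approx 0.585
```

Under that assumption, a ten-fold increase over five years corresponds to the market growing by roughly 58% per year.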
The Gartner Hype Cycle
What is the Market?
Business Drivers for Big Data
Thank You
Big data analytics will push businesses to become smarter, social, more relevant