ICAI's Book on Big data for Chartered Accountants
Big Data and Its Analytics: A Challenge or Boon for Governance?
Ravikumar Ramachandran
My Profile
CISA, CISM, CGEIT, CRISC, SSCP, CAP, CISSP-ISSAP, CFE, CIA, CRMA, PMP, CEH, ECSA, CHFI, FCMA
COBIT 5 (F), ISO 27001:2013 Lead Auditor
More than 22 years of industry experience
Last 12 years as CRO and CISO
Research and Review Committee, ISACA
E-journal editor of the Mumbai Chapter & CGEIT Coordinator
Presently with Hewlett-Packard
References
Big Data, Big Analytics - Michael Minelli, Michele Chambers, Ambiga Dhiraj
Big Data Analytics: Turning Big Data into Big Money - Frank J. Ohlhorst
Big Data Now: Current Perspectives from O'Reilly Media
Privacy and Big Data - Terence Craig & Mary E. Ludloff
Ethics of Big Data - Kord Davis with Doug Patterson
Big Data: A Revolution That Will Transform How We Live, Work, and Think - Viktor Mayer-Schönberger and Kenneth Cukier
Big Data: The Next Frontier for Innovation, Competition, and Productivity - McKinsey Global Institute, June 2011
Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph - David Loshin
Disclaimer & Author's Note
The views expressed belong to the author and not to the employer or any of the professional associations
This presentation is meant for the members of the Institute of Chartered Accountants of India
The author is sharing his own independent views; wherever references have been made to other works, due credit is given to the respective authors
Seizing the future
As for the future, your task is not to foresee it, but to enable it. -French aviator and author Antoine de Saint-Exupéry
What is Big Data?
Extremely large data sets
Too large to be managed by conventional database software tools
A relative, not an absolute, figure
Increases as technology advances
Varies by sector
What is Big Data?
Every two days now we create as much information as we did from the dawn of civilization up until 2003. That's something like five exabytes of data. -Former Google CEO Eric Schmidt
What is Big Data?
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte, followed by the informal Brontobyte and Geopbyte!!
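The ladder above steps up by a factor of 1000 at each unit, with the byte as the base unit. The minimal sketch below shows that arithmetic under the decimal (SI) convention. Some operating systems use binary multiples of 1024 (KiB, MiB and so on) instead, and that convention is not covered here. The helper names `to_bytes` and `human_readable` are illustrative only.

```python
# Decimal (SI) storage units: each step up is a factor of 1000 bytes.
# Brontobyte and geopbyte are informal names, so they are left out here.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value expressed in a decimal unit (e.g. "EB") to bytes."""
    return value * 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the number >= 1."""
    index = 0
    while num_bytes >= 1000 and index < len(UNITS) - 1:
        num_bytes /= 1000
        index += 1
    return f"{num_bytes:g} {UNITS[index]}"


# Usage: the five exabytes quoted by Eric Schmidt
print(to_bytes(5, "EB"))
print(human_readable(to_bytes(5, "EB")))  # 5 EB
```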
Human Brain (Scientific American)
Storage capacity: 2.5 petabytes (about 2.5 million gigabytes)
Capacity to hold 3 million hours of TV shows
Enough for the TV to run for more than 300 years!!
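A quick back-of-the-envelope check of these figures follows, using decimal units and an average year of 365.25 days. The variable names are illustrative. The implied storage per hour of TV is derived here and is not stated on the slide.

```python
# Back-of-the-envelope check of the Scientific American figures, in decimal units.
BRAIN_PETABYTES = 2.5          # estimate quoted on the slide
TV_HOURS = 3_000_000           # hours of TV the slide says this could hold
HOURS_PER_YEAR = 24 * 365.25   # average year, including leap days

gigabytes = BRAIN_PETABYTES * 1_000_000   # 1 PB = 10**6 GB
years_of_tv = TV_HOURS / HOURS_PER_YEAR
gb_per_tv_hour = gigabytes / TV_HOURS     # implied storage per hour of video

print(f"{gigabytes:,.0f} GB")             # 2,500,000 GB
print(f"{years_of_tv:.0f} years")         # 342 years
print(f"{gb_per_tv_hour:.2f} GB per hour")  # 0.83 GB per hour
```

The result of roughly 342 years agrees with the slide's "more than 300 years".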
Internet: The World's Largest Library
Currently estimated at yottabytes
It would take 11 trillion years to download, even with the fastest internet connection
Estimated at 5 lakh TB in 2003
In 10 years, it expanded 20 lakh times!!
Internet: The World's Largest Library
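The two lakh-based figures can be checked against the "yottabytes" estimate, as sketched below. The sketch uses the Indian numbering convention (1 lakh = 100,000) and decimal units. The implied annual growth factor is derived here and is not a figure from the slide.

```python
LAKH = 100_000   # Indian numbering: 1 lakh = 10**5
TB = 10**12      # bytes in a terabyte (decimal)
YB = 10**24      # bytes in a yottabyte (decimal)

size_2003_tb = 5 * LAKH     # 5 lakh TB in 2003
growth_factor = 20 * LAKH   # "expanded 20 lakh times" over 10 years

size_2013_bytes = size_2003_tb * TB * growth_factor
annual_growth = growth_factor ** (1 / 10)   # implied year-on-year multiplier

print(size_2013_bytes / YB)         # 1.0
print(f"{annual_growth:.2f}x per year")  # about 4.27x per year
```

Put together, the two slide figures give exactly one yottabyte.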
The Internet emphasizes the depth of our ignorance, because our knowledge can only be finite, while our ignorance must necessarily be infinite. -Sir Karl Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (2002)
IDC's Digital Universe Study
Between 2009 and 2020, digital data will grow 44-fold to 35 zettabytes per year
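"N-fold growth over Y years" can be turned into a compound annual growth rate and a doubling time. The sketch below does this under the simplifying assumption of smooth exponential growth. The function names are illustrative.

```python
import math


def annual_growth_rate(fold: float, years: float) -> float:
    """Compound annual growth rate implied by `fold`-times growth over `years`."""
    return fold ** (1 / years) - 1


def doubling_time(fold: float, years: float) -> float:
    """Years taken to double, assuming smooth exponential growth."""
    return years * math.log(2) / math.log(fold)


# IDC: 44-fold growth between 2009 and 2020 (11 years)
print(f"{annual_growth_rate(44, 11):.0%} per year")    # 41% per year
print(f"{doubling_time(44, 11):.1f} years to double")  # 2.0 years to double
```

The same helpers can be applied to IDC's later figures. `doubling_time(9, 5)` gives about 1.6 years, and `doubling_time(300, 15)` gives about 1.8 years, so all three estimates are broadly in line with data "doubling about every two years".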
IDC's Prediction
Volume of Digital Content:
2012: 2.7 billion terabytes (48% more than 2011)
2015: 8 billion terabytes
Digital content doubles every 18 months
Economist
Humans created 150 exabytes of information in the year 2005
In 2011-more than 1200 exabytes!!
Gartner's Prediction
More than 90% of the world's data has been created in the last two years
About 80% of enterprise data will be unstructured
The arrival of Analytics
Big Data-Big Opportunity
NASA, the National Oceanic and Atmospheric Administration (NOAA)
Pharmaceutical companies, energy companies
Big Data & Today's Business
Dimensions of Big Data
Volume : Whole data sets as well as samples
Variety : Structured and unstructured
Structured : Any data that fits into a predefined data field
Unstructured : Audio, video, images, geospatial data, click streams and log files
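The contrast can be made concrete with a short sketch. A structured record has its fields fixed in advance. A raw log line, which the slide lists as unstructured (it is often called semi-structured), has to be given structure at read time, for example with a regular expression. The `Invoice` class, the sample log line and the regex for the Common Log Format are illustrative assumptions, not material from the source.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    """Structured data: every value has a predefined field and type."""
    invoice_no: str
    customer_id: str
    amount: float


# Hypothetical raw web-server log line in the Common Log Format
LOG_LINE = '203.0.113.7 - - [10/Oct/2013:13:55:36 +0530] "GET /products/42 HTTP/1.1" 200 2326'

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_log(line: str) -> Optional[dict]:
    """Impose structure on a log line at read time; return None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


invoice = Invoice("INV-001", "C-17", 15000.0)
print(invoice.amount)
print(parse_log(LOG_LINE))
```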
Dimensions of Big Data
Velocity : The speed at which the data is created, accumulated, ingested and processed
Real-time decision making
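Velocity means acting on data as it arrives rather than after it has been stored. One common pattern is a sliding-window counter, sketched below. The card-usage rule (more than 3 uses in 60 seconds) is purely a hypothetical example. The sketch also assumes that events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events in the last `window_seconds`, updated as each event arrives.

    Assumes events arrive in timestamp order and window_seconds > 0.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.timestamps: deque = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events fall inside the window."""
        self.timestamps.append(timestamp)
        # Evict events that are older than the window
        while self.timestamps[0] <= timestamp - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical rule: flag a card used more than 3 times within 60 seconds
counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 20, 30, 200]:
    count = counter.add(t)
    if count > 3:
        print(f"t={t}s: {count} uses in 60 s, flag for review")
```

Each decision uses only the events currently in the window, so memory stays bounded however fast the stream grows.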
Big Data Synergies
Traditional Business Intelligence
Data Mining
Statistical applications
Predictive analysis
Data Modeling
Getting the Big of Big Data
Transformation Capabilities
Big Data is too big an opportunity
Best Integration
Storage Technologies
Open Source
Hadoop: its suitability
Limitations: prerequisites and hardware requirements
Business Takeaway
Businesses cannot wait for complete, structured data before taking decisions
They need to take decisions on unstructured data too
However, not all unstructured data is useful
Business houses ignoring unstructured data are doomed
Factors enabling Big Data
Internet and digitization of opinions & behaviour
Mobile computing
Social Networking
Moore's Law & Cloud
Key factors driving Big Data-1
Increasing data volumes being captured and stored
2011 IDC Digital Universe Study: In 2011, the amount of information created and replicated will surpass 1.8 zettabytes, growing by a factor of 9 in just 5 years
The scale of this growth surpasses traditional technologies and configuration setups
Key factors driving Big Data-2
Rapid acceleration of data growth
2012 IDC Digital Universe Study: From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes
From now until 2020, the digital universe will double about every two years
Key factors driving Big Data-3
Increased data volumes pushed into the network
According to Cisco's annual Visual Networking Index Forecast, annual global IP traffic will reach 1.3 zettabytes by 2016
Driven by the increasing number of smartphones, tablets and other internet-connected devices
Increased bandwidth and wider Wi-Fi availability
Key factors driving Big Data-4
Growing variation in types of data assets for analysis
Data scientists take advantage of unstructured datasets, not only structured ones
Acquired from a wide variety of sources
Format can be that of text, images, audio and video content
Existing structured data management needs to be enhanced to accommodate the above
Key factors driving Big Data-5
Alternate and unsynchronized methods for facilitating data delivery
Structured environments give clear methods of data delivery and exchange
File transfers through tape and disk storage systems
Unstructured data coming from Twitter and government websites
Pressure for rapid acquisition, absorption and analysis
Key factors driving Big Data-6
Rising demand for real-time integration of analytical results
Increasing number of consumers for analytical results
Businesses require real-time insight into consumer behaviour
Data Explosion
Data doubles every two years
Malthusian Theory of Population
Thomas Robert Malthus, author of An Essay on the Principle of Population (1798)
Food production increases in arithmetic progression (A.P.) every 25 years
Population increases in geometric progression (G.P.) every 25 years
Restraint on reproduction
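The gap between the two progressions is what drives Malthus's argument. The sketch below compares them with both series starting at 1 unit and one period standing for 25 years. It reproduces Malthus's own illustration of population outrunning subsistence by 256 to 9 after two centuries. The function names are illustrative.

```python
def arithmetic_progression(start: float, step: float, periods: int) -> list:
    """Values over time when a fixed amount is added each period (Malthus: food supply)."""
    return [start + step * n for n in range(periods + 1)]


def geometric_progression(start: float, ratio: float, periods: int) -> list:
    """Values over time when the value is multiplied by a fixed ratio (Malthus: population)."""
    return [start * ratio ** n for n in range(periods + 1)]


# Each period is 25 years; both series start at 1 unit, so 8 periods = 200 years.
food = arithmetic_progression(1, 1, 8)
population = geometric_progression(1, 2, 8)
print(f"After 200 years, population : food = {population[-1]} : {food[-1]}")  # 256 : 9
```

The same function models the data analogy on the next slide. If data doubles every two years, `geometric_progression(1, 2, 10)[-1]` gives 1024 after ten two-year periods (20 years).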
Malthusian Theory of Data Explosion (Imaginary)
Population increases in geometric progression (G.P.) every 25 years
Data doubles every 2 years (approximately 1024 times in 20 years)
Do not use mobile devices
Restraint on internet
Do not visit social networking sites
Reproduction is allowed
But no DATA Reproduction!!
All economists to become Data Scientists
Evolution of Big Data
Farnam Jahanian, Assistant Director for Computer and Information Science and Engineering at the National Science Foundation (NSF), describes data as a transformative new currency for science, engineering, education and commerce
Evolution of Big Data
Big Data is characterized not only by the enormous volume of data but also by the diversity and heterogeneity of the data and the velocity of its generation
Implications of Big Data (Farnam Jahanian)
Creation of new products and services
Accelerate the pace of discovery in every science and engineering discipline
Help solve the nation's challenges, from medicine to cyber security
Data Explosion & Knowledge Management
Data multiplies every two years
Proprietary knowledge gets diluted
Chart: IP & Inventions at 2%, then at 1%
Going Forward
Chief Innovation Officer (CIO)!!
Chief Discovery Officer (CDO)!!
Balance Sheet
Financial Management
Management Accounting
Strategic Financial Management
Financial Risk Management, and so on
Exciting new disciplines will follow.
Big Data Technology
Hadoop
Open-source software framework for processing huge datasets on a distributed system
Development was inspired by Google's MapReduce and the Google File System
Allows you to query both structured and unstructured data
Hadoop
Store any kind of data in its native format
Stores petabytes of data inexpensively
Assurance of availability
Runs on a cluster of servers each having its own CPU and disk storage
Components of Hadoop
Hadoop Distributed File System (HDFS): storage system for the Hadoop cluster
HDFS breaks the data into pieces
Distributes among the servers in the cluster
Each server stores a small segment of the data set
Each piece of data is replicated on more than one server
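The splitting and replication steps can be sketched as follows. The sketch assumes the defaults of recent Hadoop releases, a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). Replicas are placed with simple round-robin. Real HDFS uses rack-aware placement, so this only illustrates the idea. The node names and the 300 MB file are hypothetical.

```python
def split_into_blocks(file_size_mb: int, block_size_mb: int = 128) -> list:
    """Sizes of the blocks a file is cut into; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, servers: list, replication: int = 3) -> dict:
    """Put each block on `replication` distinct servers using simple round-robin.

    Real HDFS placement is rack-aware; round-robin only illustrates the idea.
    """
    if replication > len(servers):
        raise ValueError("replication factor cannot exceed the number of servers")
    placement = {}
    for block_id in range(num_blocks):
        first = block_id % len(servers)
        placement[block_id] = [servers[(first + k) % len(servers)] for k in range(replication)]
    return placement


blocks = split_into_blocks(300)   # a hypothetical 300 MB file
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(blocks)   # [128, 128, 44]
print(layout)
```

Each block lives on three different nodes, so losing any single server still leaves two copies of every block.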
Components of Hadoop
MapReduce
Each server does its part of the analytical job
Reports the results for collation into a comprehensive answer
MapReduce is the agent that distributes the work and collects the results
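The classic way to illustrate this division of labour is a word count. The sketch below simulates the map, shuffle and reduce phases in a single Python process. It is not Hadoop's Java API. Each string in `chunks` stands in for the data block held by one server.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Mapper: runs where `chunk` is stored and emits a (word, 1) pair per word."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between the phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: collates the partial results for one key into a final answer."""
    return key, sum(values)


def map_reduce(chunks: List[str]) -> Dict[str, int]:
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(map_reduce(["big data big", "Data analytics"]))
```

In a real cluster the mappers run in parallel on different servers, and only the small intermediate pairs travel over the network.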
Hadoop
HDFS continually monitors the data stored in the cluster
In case of hardware or software failure, it retrieves the data from a known good replica
MapReduce monitors the progress of each server
When a server slows down or fails to return an answer,
Hadoop
MapReduce automatically starts another instance of the task on a server holding a copy of the data
HDFS and MapReduce work together to deliver fast and reliable processing
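The recovery behaviour can be sketched as a retry over the servers that already hold a replica of the block, so the code still moves to the data. This is a simplification. Hadoop's speculative execution launches a duplicate alongside a slow task and keeps whichever finishes first, whereas this sketch retries sequentially after a failure. The `count_records` task and the node names are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def run_with_failover(block_id: int, replica_servers: List[str],
                      task: Callable[[str, int], str], failed: Set[str]) -> Tuple[str, str]:
    """Run `task` on a server that already holds the block, moving to the next replica on failure."""
    for server in replica_servers:
        if server in failed:
            continue  # skip servers already known to be down or too slow
        try:
            return server, task(server, block_id)
        except RuntimeError:
            failed.add(server)  # remember the failure and try the next copy
    raise RuntimeError(f"no healthy replica left for block {block_id}")


def count_records(server: str, block_id: int) -> str:
    """Hypothetical task: node1 has a disk failure, the other nodes succeed."""
    if server == "node1":
        raise RuntimeError("disk failure")
    return f"block {block_id} processed on {server}"


failed_servers: Set[str] = set()
print(run_with_failover(0, ["node1", "node2", "node3"], count_records, failed_servers))
```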
Hadoop Users
As of early 2013, Facebook was recognized as having the largest Hadoop cluster in the world
Other prominent users
Yahoo
IBM
New Approach of Data processing
Data needs to be stored in a system in which hardware is infinitely scalable
Storage and network cannot be a bottleneck
Data must be processed into business intelligence (BI) where it resides
Move the code to the data, not the other way around
Data stays in one place; never move it around
Challenges in Protection of Big Data
Big Data: risk of permanent loss
Data from monitoring devices
Surveillance cameras
High frequency and real time
Uniqueness: little scope for deduplication
Large files: need huge CPU processing power
No good backup solution available
Challenges in Protection of Big Data
Not handled well by RDBMS
NoSQL: a new evolution in DBMS
HIPAA & PCI compliance challenges
Very risky in the medical industry
SQL/NoSQL
SQL Databases
Predefined schema
Standard definition and interface language (SQL)
Tight consistency
Well defined semantics
SQL/NoSQL
NoSQL Database
No predefined schema
Per-product definition and interface language
Getting an answer quickly is more important than getting a correct answer
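The "predefined schema" versus "no predefined schema" distinction can be seen directly in code. The sketch below uses Python's built-in `sqlite3` module for the SQL side. The NoSQL side is simulated with a plain list of dictionaries, which stands in for a document store and is not any particular NoSQL product. The customer data is hypothetical. The consistency trade-offs listed above are not modelled.

```python
import sqlite3

# SQL: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Asha", "Mumbai"))


def try_insert_extra_field() -> str:
    """Attempt to store a field the schema does not define."""
    try:
        conn.execute("INSERT INTO customers (id, name, twitter_handle) VALUES (?, ?, ?)",
                     (2, "Ravi", "@ravi"))
        return "accepted"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


# NoSQL-style document collection (plain Python stand-in):
# each document carries its own fields, so no schema change is needed.
documents = [
    {"id": 1, "name": "Asha", "city": "Mumbai"},
    {"id": 2, "name": "Ravi", "twitter_handle": "@ravi", "interests": ["cricket"]},
]

print(try_insert_extra_field())
print(documents[1])
```

The SQL table has to be altered before it can hold the new field. The document collection simply accepts it, and the burden of interpreting varied documents shifts to the application.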
Challenges in Protection of Big Data
CIA triad: focus on access control
Balance with performance
High levels of encryption
Complex security technology
Additional security layers
Liability
Way Forward
Destroy data that is not legally required to be retained (e.g., logs)
Classify data
Protection measures
Control access on Need to Know
Secure the Data at rest
Keep the cryptographic keys on a separate hardened server
Ensure that security does not impede performance
Pick the right encryption scheme
Flexible security solution with changing requirements
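Several of these measures fit together in one small sketch: classifying data, granting access on a need-to-know basis, and keeping the key on a separate server. The classification labels, roles and `KeyServer` class are all hypothetical. `KeyServer` only stands in for a separate hardened key server. In place of encryption, the sketch uses keyed HMAC-SHA256 pseudonymisation from the standard library. This is a one-way tokenising technique, not reversible encryption. Encrypting data at rest would need a vetted cipher such as AES from an established cryptography library, which is not shown.

```python
import hashlib
import hmac
import secrets

# Hypothetical classification of each field
CLASSIFICATION = {"city": "internal", "name": "confidential", "pan": "restricted"}

# Hypothetical need-to-know policy: classifications each role may see in clear
ROLE_CLEARANCE = {
    "analyst": {"internal"},
    "auditor": {"internal", "confidential"},
    "compliance_officer": {"internal", "confidential", "restricted"},
}


class KeyServer:
    """Stand-in for a separate, hardened key server; here it just holds a random key in memory."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def get_key(self) -> bytes:
        return self._key


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256 token: stable for joins, one-way, and not recomputable without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def view_record(record: dict, role: str, key_server: KeyServer) -> dict:
    """Show cleared fields in clear and replace every other field with a token."""
    allowed = ROLE_CLEARANCE.get(role, set())   # unknown roles see nothing in clear
    key = key_server.get_key()
    view = {}
    for name, value in record.items():
        label = CLASSIFICATION.get(name, "restricted")  # unclassified data treated as most sensitive
        view[name] = value if label in allowed else pseudonymise(value, key)
    return view


keys = KeyServer()
customer = {"name": "Asha Rao", "pan": "ABCDE1234F", "city": "Mumbai"}
print(view_record(customer, "analyst", keys))
```

Analysts can still count or join on the tokenised PAN, because the same input always yields the same token. Only a role cleared for restricted data sees the real value.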
Big Data & IP
Inventions, literary and artistic works
Symbols, images and designs
What to protect
Prioritize protection
Labeling and locking
Security awareness
Holistic approach
Governance Measures
Strategic Alignment
Identify Business priorities
Define problems to be solved
Time frame
Measurable and achievable outcomes
Strategic alignment
Demonstration of Value: Whether these technologies add value to real business problems
Operationalization : How to migrate the big data projects into the production environment in a controlled and managed way
Governance Measures
Management Sponsorship
Management support for fact-based decision making
Identify champions for consumption of analytics
Ensure benefits realization from various reports and statistical models
Integration of Big Data Analytics
Standard processes for soliciting input from business users
Clear evaluation criteria for acceptability and adoption
Massive data scalability
Data reuse
Oversight and Governance
Mainstreaming accepted technologies
Governance Measures
Analytical Human Capital
Mobilize resources for analytics
Hire the right talent and retain them
Increasing demand for analysts skilled in mathematics, business and technology
Key Governance Role
Ensure business effectively uses analytics to make better decisions
Ensure investment is made in right type of analytics
Ensure investment happens in right type of people, process & technology
Data Governance
Alert : Identify data issues that might have negative business impact
Triage : Prioritize those issues in relation to corresponding business value drivers
Remediate : Data owners to take proper actions when alerted to the existence of those issues
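The Alert, Triage and Remediate cycle can be sketched as three small functions. Here the "issue" is simply a missing or blank required field. The datasets, owners and the 1 to 5 business-value score are all hypothetical. A real programme would plug in its own data-quality rules and value drivers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataIssue:
    dataset: str
    description: str
    business_value: int  # hypothetical score: 1 (low impact) to 5 (high impact)
    owner: str


def alert(rows: List[dict], dataset: str, owner: str,
          required: List[str], business_value: int) -> List[DataIssue]:
    """Alert: flag required fields that are missing or blank."""
    issues = []
    for field_name in required:
        missing = sum(1 for row in rows if row.get(field_name) in (None, ""))
        if missing:
            issues.append(DataIssue(dataset, f"{missing} row(s) missing '{field_name}'",
                                    business_value, owner))
    return issues


def triage(issues: List[DataIssue]) -> List[DataIssue]:
    """Triage: deal with the issues that matter most to the business first."""
    return sorted(issues, key=lambda issue: issue.business_value, reverse=True)


def remediate(issues: List[DataIssue]) -> Dict[str, List[str]]:
    """Remediate: route each issue to its data owner's work queue, in priority order."""
    queues: Dict[str, List[str]] = {}
    for issue in issues:
        queues.setdefault(issue.owner, []).append(f"{issue.dataset}: {issue.description}")
    return queues


invoices = [{"invoice_no": "INV-1", "amount": 1200}, {"invoice_no": "INV-2", "amount": None}]
customers = [{"id": 1, "email": ""}, {"id": 2, "email": "b@example.com"}]
issues = (alert(customers, "customers", "marketing", ["email"], business_value=2)
          + alert(invoices, "invoices", "finance", ["invoice_no", "amount"], business_value=5))
print(remediate(triage(issues)))
```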
McKinsey study
Approximately 1,40,000 to 1,90,000 unfilled positions for data analytics experts in the U.S. by 2018
Shortage of 1.5 million managers and analysts who have the ability to understand and make decisions using Big Data
Rise of Data Scientist
New designation
The Data Scientist
Yesterday's Skills
Business + Mathematics = Consulting profession
Usage of heuristics and persuasive arguments in the boardroom
Yesterday's Skills
Business + Technology = IT Profession
Automates algorithmic tasks, improving productivity and efficiency
Yesterday's Skills
Mathematics + Technology = Software Development
Address a wide range of business problems
Tomorrow's Skills
Business + Mathematics + Technology + Behavioral Science = Decision Science
Tomorrow's Skills (Big Data, Big Analytics, Michael Minelli et al.)
Privacy Landscape-Businesses
Increased need to leverage personal information for competitive advantage
Huge investment in data sources and data analytics
Privacy Landscape-Criminals
Rise in Identity theft
Sophisticated technology to exploit data security vulnerabilities
Privacy Landscape-Consumers
Increased awareness and concern about
Collection
Use
Disclosure of personal information
Privacy Landscape-Legislators
Responding to consumer concern by restricting the use of personal information (PI)
Significant impact on, and restrictions for, businesses
Seven Global Privacy Principles
Notice : Inform individuals of the purpose for which information is collected
Choice : Offer individuals the opportunity to choose, or to opt out
Consent : Only disclose information to third parties consistent with the above principles
Security : Take responsibility for the confidentiality, integrity and availability (CIA) of PI
Seven Global Privacy Principles-Contd
Data Integrity : Assure the reliability of PI
Access : Provide individuals with access to the PI held about them
Accountability : A firm must be accountable for following these principles, through a compliance mechanism
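Several of these principles can be enforced as a simple gate in front of any third-party disclosure, as sketched below. A customer record carries the purposes they were notified of (Notice) and the purposes they opted out of (Choice). Only permitted records are shared (Consent), and every disclosure is logged (Accountability). The `Customer` fields, the purposes and the recipient name are hypothetical. The other principles are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Customer:
    email: str
    notified_purposes: Set[str] = field(default_factory=set)  # Notice: purposes disclosed to the customer
    opted_out: Set[str] = field(default_factory=set)          # Choice: purposes the customer declined


disclosure_log: List[str] = []  # Accountability: record of every disclosure made


def may_disclose(customer: Customer, purpose: str) -> bool:
    """Notice + choice: the purpose was notified and the customer did not opt out."""
    return purpose in customer.notified_purposes and purpose not in customer.opted_out


def disclose_to_third_party(customers: List[Customer], purpose: str, recipient: str) -> List[str]:
    """Consent: share only permitted records, and log each disclosure."""
    shared = [c.email for c in customers if may_disclose(c, purpose)]
    for email in shared:
        disclosure_log.append(f"{email} -> {recipient} for {purpose}")
    return shared


customers = [
    Customer("a@example.com", {"marketing"}),
    Customer("b@example.com", {"marketing"}, {"marketing"}),  # opted out
    Customer("c@example.com", {"billing"}),                   # never notified of marketing use
]
print(disclose_to_third_party(customers, "marketing", "partner-co"))
print(disclosure_log)
```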
Other Regulations
HIPAA
GLB
FTC
Different approach
Privacy may be wrong focus
Data privacy is the thing you do to keep from getting sued; data ethics is the thing you do to make your relationship with your customers positive. -James Stogdill, O'Reilly Radar
James Powell, CTO, Thomson Reuters, O'Reilly Strata Conference, 2011
Conclusion
Availability of Big Data
Low Cost Hardware
New Information Management and Analytic software
Enormous opportunity
Efficiency, productivity, profitability
Concluding Remarks
There are known knowns, there are known unknowns, but there are also unknown unknowns-Former U.S. Secretary of Defense, Donald Rumsfeld
Concluding Remarks..
I love that quote… When I think about these three things in our daily life, they fall into these three outcomes for me… The known unknowns more fall into the category of analysis, throwing… The thing I love is the last part: if you could figure this thing out, we could have saved Afghanistan from big problems. -Google's Avinash Kaushik, in his Strata 2012 presentation A Big Data Imperative: Driving Big Action
Thanks for your precious time!
Ravikumar