
Smarter Big Data Strategies


www.infosys.com

Companies from across sectors are experiencing exponential growth in data, thanks to new content generated by social interactions, rich media and a variety of devices. This vast amount of digital data is being created through emails, instant messaging, surveys, videos, images, RFID tags, web text, blogs, geo-location devices, collaboration platforms such as Twitter and Facebook, and many other sources. Combined with in-house legacy data, it is a potential goldmine of opportunity for organizations of all types.

Insights

- Girish Khanzode


Scientific and creative analysis of this large, complex data in real time can generate deeper insights, offering 360-degree perspectives on customer sentiment and behavior. Companies can respond dynamically to market trends, improve operational efficiency and gain a significant competitive advantage. Smarter analytics, machine learning and intelligent algorithms can help discover new patterns and replace intuition-based management decisions with fact-driven ones. With proper data analysis in place, many companies are now able to answer questions that were never asked before.

Clearly, successful Big Data management can radically transform organizations. A retailer using Big Data can improve its operating margin by more than half; the US healthcare sector can save more than $300 billion every year; consumer goods and service companies can create fine-grained customer segments in real time to improve the precision of targeted promotions and advertising; healthcare companies can discover new treatments faster; and investors can predict stock market events with higher accuracy.


Challenges of Big Data Initiatives

Although Big Data holds great promise, it has its perils too, and companies trying to leverage it can face significant challenges. Studies indicate that more than 80% of Fortune 500 organizations will fail to take advantage of Big Data by 2015. Such failure represents a serious risk and can disrupt a company’s business. Conversely, smarter execution can propel an organization onto a new growth trajectory by attracting more customers, improving sales margins, introducing new products and services much faster, and raising the satisfaction and loyalty of existing customers.

The size, speed, complexity and diversity of Big Data can push traditional data management technologies to their limits and, in most cases, cause them to fail. The need to manage data in context and in real time compounds this challenge. The collection, storage, processing, analysis and visualization of this data can overwhelm existing IT infrastructure. Timeliness, privacy and a shortage of relevant skillsets are further impediments to implementation.

Companies could look at the following 10 practical strategies to successfully leverage Big Data:


1

Top Business Needs as Primary Drivers

Because data storage costs are falling continuously, companies tend to store excessive data for future use. They should avoid this practice: given how rapidly data volumes grow, the costs of collecting, storing and analyzing that data can quickly rise significantly.

There is a risk that readily available or easily acquired data becomes the driver of Big Data strategy. Instead, the strategy should be driven by the data’s potential to add value, for instance by solving major pain points or yielding a healthy return on investment.

The first step should be to identify a set of key questions targeted at the areas the company wants to grow. For example, an online business might focus on fine-grained customer segmentation, cross-selling, strengthening multi-channel reach or improving its recommendation intelligence, whereas a manufacturing unit might look at improving product designs. These are the types of questions that should drive Big Data implementation strategies. The questions should then be mapped to a clear set of business requirements, which are critical for identifying timelines and resource needs (such as skillsets). The implementation process should be iterative, so that it keeps pace with continuously evolving key questions, enables the collection of the right data and helps deliver the intended insights.

3

Optimizing Storage Needs

In theory, more data means better analysis; in the real world, however, there are limits, because costs such as data storage, manipulation and computational power rise with volume.

Big Data, helped by the tendency of data to proliferate quickly, can force traditional data platforms to scale beyond the levels they were designed for. Beyond the petabyte scale, current warehouse infrastructures become uneconomical.

More data does not automatically mean higher accuracy; in some cases, it may even introduce noise that obscures weaker patterns. It also increases the risk of false discoveries, which produce insights that will not yield positive results.

Growth in processing power and falling memory prices are making it possible to build data on the fly and process it in memory. This strategy reduces the need for very large storage capacity. When storing Big Data, it is also important to remember that replication systems can introduce security vulnerabilities and that RAID at petabyte scale can lead to data loss.
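Processing data in memory as it arrives can be sketched as a streaming aggregation that keeps only summary statistics, so raw events never need to be persisted. This is a minimal illustration under assumed inputs, not a production stream processor; sensor_stream and aggregate_in_memory are hypothetical names, and the readings are made up for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[tuple[str, float]]:
    """Hypothetical stand-in for an incoming feed of (sensor_id, reading) events."""
    yield from [("s1", 20.5), ("s2", 18.0), ("s1", 21.5), ("s2", 19.0), ("s1", 22.0)]


def aggregate_in_memory(events: Iterable[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Keep a running count, sum, min and max per key; raw events are discarded after use."""
    stats: dict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for key, value in events:
        s = stats[key]
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    # Derive the mean from the running totals once the stream is consumed.
    return {k: {**v, "mean": v["sum"] / v["count"]} for k, v in stats.items()}


if __name__ == "__main__":
    for sensor, summary in aggregate_in_memory(sensor_stream()).items():
        print(sensor, summary)
```

Memory use grows with the number of distinct keys rather than the number of events, which is what allows this style of processing to reduce storage needs.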

For efficient processing, data should be split and stored in different segments based on its value, its sensitivity and the costs involved. The most valuable data should be housed in the corporate data warehouse, less valuable data on cheaper commodity storage such as the Cloud, and the rest within analytical tools. When results are needed, all of this data can be pulled together dynamically and analyzed on the fly. Metadata needs special attention, since it is growing at twice the rate of other data. Companies, especially those in the early phases of their Big Data drive, should set up clear rules and guidelines specifying which data should be retained or archived, and for how long.
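The tiering and retention guidance can be expressed as explicit, testable rules. The sketch below is an illustration only: the 1 to 5 value score, the choice to keep sensitive data in the warehouse, and every retention period are hypothetical assumptions, not recommendations from this paper.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WAREHOUSE = "corporate data warehouse"
    COMMODITY = "commodity / cloud storage"
    ANALYTICS = "analytical tools only"


class Action(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass
class Dataset:
    name: str
    business_value: int  # hypothetical score from 1 (low) to 5 (high)
    sensitive: bool
    age_days: int


def choose_tier(ds: Dataset) -> Tier:
    # Assumed rule: high-value or sensitive data stays in the tightly controlled warehouse.
    if ds.business_value >= 4 or ds.sensitive:
        return Tier.WAREHOUSE
    if ds.business_value >= 2:
        return Tier.COMMODITY
    return Tier.ANALYTICS


# Hypothetical policy per tier: (keep online for N days, keep in archive until N days).
RETENTION = {
    Tier.WAREHOUSE: (365, 7 * 365),
    Tier.COMMODITY: (180, 3 * 365),
    Tier.ANALYTICS: (30, 90),
}


def retention_action(ds: Dataset) -> Action:
    retain_days, archive_days = RETENTION[choose_tier(ds)]
    if ds.age_days <= retain_days:
        return Action.RETAIN
    if ds.age_days <= archive_days:
        return Action.ARCHIVE
    return Action.DELETE


if __name__ == "__main__":
    for ds in (Dataset("orders", 5, False, 100), Dataset("clickstream", 2, False, 400)):
        print(ds.name, choose_tier(ds).value, retention_action(ds).value)
```

Writing the policy down as code (or as configuration read by code) makes it reviewable and lets the same rules be applied consistently across every data store.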

2

Criticality of the Right Skillset

Big Data projects require different skillsets from traditional IT. Companies need data scientists, managers and engineers with expertise in multiple domains such as computing, business operations, machine learning, statistics, analytics, advanced mathematics and visualization tools.

Data scientists should be able to formulate models and perform data mining, spot patterns and associations, and create the logic that turns data into business decisions.

Data managers should be conversant with business operations and capable of asking the right questions to generate business insights, mapping results to business strategy and creating recommendations.

Data engineers should be able to design, develop and maintain applications in Big Data environments, program visualization tools and dashboards, and maintain the infrastructure needed to perform analytics.

Equally important are creativity and the ability to use data to drive business growth.

Since Big Data is a recent phenomenon, there is a shortage of trained and qualified data professionals, and given the huge demand, this shortage will persist for the next few years. Lack of suitable talent will prove to be a major hindrance to implementing Big Data strategies. Supplementing hiring with appropriate training and redeployment of existing staff can mitigate part of this risk.


4

Scaling the Infrastructure

Because of its variety and volume, acquiring Big Data requires infrastructure that supports flexible data structures and very high transaction volumes, can process queries in a distributed environment, and delivers predictable latency once a query is fired.

Network performance is critical, and the number of communication paths increases significantly with the number of nodes in a cluster. Transferring larger datasets requires higher network bandwidth and WAN optimization technologies.
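The growth in communication paths is quadratic. Assuming, as a simplification, that any node may need to exchange data with any other node (as in an all-to-all shuffle), a cluster of n nodes has n(n-1)/2 distinct node pairs. The helper name below is hypothetical.

```python
def point_to_point_paths(nodes: int) -> int:
    """Distinct node pairs when every node may talk to every other node: n(n-1)/2."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Prints 45, 4950 and 499500 paths respectively.
        print(n, point_to_point_paths(n))
```

Growing a cluster tenfold multiplies the potential paths roughly a hundredfold, which is why network design becomes a first-order concern as clusters scale.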

A multinode cluster using HDFS can generate high levels of network traffic, since Hadoop spreads data across the member servers of the cluster. Direct-attached storage (DAS) creates islands of information that analytics applications can process, but it impairs data and resource sharing with other servers. While SANs offer better throughput and scalability, local storage is cheaper and performs better overall.

Storage appliances designed for Hadoop and Big Data analytics are another option.
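Part of the HDFS network load comes from replication: HDFS writes each block through a pipeline of DataNodes, and the default replication factor (dfs.replication) is 3. The back-of-the-envelope sketch below assumes that the first replica is written locally when the client runs on a DataNode and that every additional replica crosses the network once; it deliberately ignores checksums, acknowledgements, rebalancing and job shuffle traffic. The function name is hypothetical.

```python
def hdfs_write_traffic_bytes(data_bytes: int, replication: int = 3,
                             client_on_datanode: bool = True) -> int:
    """Rough network bytes moved by the HDFS write pipeline for `data_bytes` of new data."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # The first replica stays local only if the writing client runs on a DataNode;
    # each further replica is forwarded over the network by the pipeline.
    copies_over_network = replication - 1 if client_on_datanode else replication
    return data_bytes * copies_over_network


if __name__ == "__main__":
    one_tb = 10**12
    print(hdfs_write_traffic_bytes(one_tb) / one_tb, "TB on the wire per TB written")
```

Under these assumptions, loading 1 TB from a client on a DataNode moves about 2 TB across the network, and about 3 TB when the client sits outside the cluster.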

A decision on a Big Data storage solution must take into account space requirements, data growth, how frequently analytics are run and the type of data processed. These factors, together with security, the allocated budget and processing time, should drive Big Data investments.
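Space requirements and data growth can be combined into a simple capacity projection. This sketch assumes steady compound monthly growth, a replication factor of 3 (the HDFS default) and a hypothetical 25% of raw capacity kept free as headroom; real planning would also account for compression, temporary job output and archival tiers.

```python
def projected_raw_capacity_tb(current_tb: float, monthly_growth: float, months: int,
                              replication: int = 3, headroom: float = 0.25) -> float:
    """Raw capacity (TB) needed after `months` of compound growth.

    replication=3 mirrors the HDFS default; headroom is the fraction of raw
    capacity deliberately left free (a hypothetical planning value).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    logical_tb = current_tb * (1 + monthly_growth) ** months
    return logical_tb * replication / (1 - headroom)


if __name__ == "__main__":
    for months in (0, 12, 24):
        print(months, round(projected_raw_capacity_tb(100, 0.05, months), 1))
```

Even a modest 5% monthly growth rate nearly doubles the logical data volume within a year, and replication plus headroom multiply the raw capacity required on top of that.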

5

Ensuring Data Quality

A significant amount of garbage can creep in while Big Data is being collected. Poor-quality data can lead to faulty analysis, especially when looking for outliers. With massive amounts of data generated by machines and sensors, the potential for data pollution rises sharply, driven by factors such as transmission errors, incorrect device calibration, inaccurate measurement methods and poor device performance under peak loads. Stringent quality control and inspection mechanisms, along with good data governance, are critical to reducing data ‘obesity’ and deriving correct insights.

Data typically becomes less valuable to the business as it ages, so time-based data retention policies can play a significant role in preserving the quality of the data used in analysis. In addition, hygiene techniques such as quality maintenance, profiling, standardization, consistency checks and integration, along with rules-driven testing, should be part of the Big Data strategy.
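Rules-driven testing of sensor data can be sketched as a list of small, independent validation rules applied to every incoming reading. This is an illustrative sketch, not a specific data-quality product: the Reading fields, the -40 to 85 range and the 15-minute staleness limit are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class Reading:
    sensor_id: str
    value: Optional[float]
    timestamp: datetime


# A rule returns an error message, or None when the reading passes.
Rule = Callable[[Reading, datetime], Optional[str]]


def not_missing(r: Reading, now: datetime) -> Optional[str]:
    return "missing value" if r.value is None else None


def in_range(low: float, high: float) -> Rule:
    def check(r: Reading, now: datetime) -> Optional[str]:
        if r.value is not None and not (low <= r.value <= high):
            return f"value {r.value} outside [{low}, {high}]"
        return None
    return check


def not_stale(max_age: timedelta) -> Rule:
    def check(r: Reading, now: datetime) -> Optional[str]:
        return "stale reading" if now - r.timestamp > max_age else None
    return check


def validate(readings: list[Reading], rules: list[Rule], now: datetime):
    """Split readings into clean ones and rejected ones with their rule violations."""
    clean: list[Reading] = []
    rejected: list[tuple[Reading, list[str]]] = []
    for r in readings:
        results = [rule(r, now) for rule in rules]
        errors = [e for e in results if e is not None]
        if errors:
            rejected.append((r, errors))
        else:
            clean.append(r)
    return clean, rejected


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rules = [not_missing, in_range(-40.0, 85.0), not_stale(timedelta(minutes=15))]
    sample = [Reading("t1", 21.0, now), Reading("t2", 500.0, now)]
    print(validate(sample, rules, now))
```

Keeping each rule separate makes it easy to add checks for new failure modes, such as calibration drift, without rewriting the pipeline, and the rejected list doubles as an audit trail.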

6

Maximizing User Adoption

Ultimately, the success of Big Data initiatives will be measured by how widely business users adopt analytic applications. This will depend on users’ ability to easily create datasets that fit their needs and to feed these to the analytical tools developed as part of the Big Data initiative, without help from corporate IT, to build insights in real time.

Growth and maturity in Cloud services and appliances, coupled with the arrival of newer analytics tools, are allowing users to focus more on business value than on the underlying technologies. Qualities such as system performance, scalability, availability, user experience and manageability will be critical to the adoption of Big Data applications within the organization. In addition, given the growing traction of bring-your-own-device (BYOD), making these applications accessible from multiple device types will significantly improve user adoption.


7

Importance of Data Access

Big Data’s value creation potential depends on users’ ability to seamlessly access data for analysis. Typically, data such as customer records resides across multiple departments, geographies or silos, which creates obstacles to sharing and aggregating it. This becomes a problem when companies want to integrate external data acquired from third parties with their own corporate data pool to create insights. The lack of a centralized, customer-focused view can hinder the organization’s ability to exploit Big Data. An effective enterprise data access strategy must include interoperable data models, transactional data architectures, interoperability standards, analytical architecture, security and compliance.
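A centralized customer view starts with consolidating records from different silos under a shared key. The sketch below assumes, for simplicity, that a normalized email address reliably identifies a customer and that the first non-empty value for each field wins; real entity resolution usually needs fuzzier matching and explicit survivorship rules. The source names and fields are hypothetical.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    return email.strip().lower()


def unified_customer_view(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge customer records from several departmental sources, keyed on normalized email."""
    view: dict[str, dict] = defaultdict(lambda: {"sources": []})
    for source_name, records in sources.items():
        for record in records:
            key = normalize_email(record["email"])
            merged = view[key]
            merged.setdefault("email", key)
            merged["sources"].append(source_name)  # lineage: where this customer was seen
            for field, value in record.items():
                if field != "email" and value is not None:
                    merged.setdefault(field, value)  # first non-empty value wins
    return dict(view)


if __name__ == "__main__":
    silos = {
        "sales": [{"email": "Ann@Example.com", "name": "Ann Lee", "phone": None}],
        "support": [{"email": "ann@example.com ", "phone": "555-0100", "open_tickets": 2}],
    }
    print(unified_customer_view(silos))
```

Recording which sources contributed to each merged record keeps the consolidated view traceable back to the original silos.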

9

Building Appropriate Technology Ecosystem

The management of Big Data typically involves predictive analysis, natural language processing, image analysis or advanced statistical techniques such as discrete choice modeling and mathematical optimizations. This requires technologies that are quite different from the traditional ones.

Big Data solutions should process data in a way that avoids costly movement of large data volumes, while also handling very high data flow rates and a large variety of formats.

Apache Hadoop is used to deliver analytics solutions in distributed, massively parallel environments running on clusters of commodity hardware. It can filter and capture high-velocity incoming streams while keeping the data on the original storage clusters, and it provides fault tolerance and scalability. The Hadoop Distributed File System (HDFS) is commonly deployed for distributed storage of Big Data.

NoSQL databases trade integrity guarantees for high scalability and are well suited to dynamic data structures involving heterogeneous data. These database systems can capture all data without first categorizing and parsing it, which is useful for collecting and storing data such as social media content.
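Schema-less capture can be illustrated with a toy document collection that accepts records of any shape. This sketch only shows the "store first, interpret later" idea; it does not model the scalability, distribution or consistency trade-offs of real NoSQL databases, and the DocumentCollection class is hypothetical.

```python
import json
from typing import Any


class DocumentCollection:
    """Toy schema-less store: accepts any JSON-serializable dict without a predefined schema."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> int:
        # Round-trip through JSON to store an independent copy of the document as-is.
        self._docs.append(json.loads(json.dumps(doc)))
        return len(self._docs) - 1

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        # Documents lacking a queried field simply do not match; no schema is enforced.
        return [d for d in self._docs
                if all(k in d and d[k] == v for k, v in criteria.items())]


if __name__ == "__main__":
    posts = DocumentCollection()
    posts.insert({"type": "tweet", "user": "@a", "text": "#bigdata", "hashtags": ["bigdata"]})
    posts.insert({"type": "checkin", "user": "@a", "geo": {"lat": 12.97, "lon": 77.59}})
    print(posts.find(user="@a"))
```

The flip side of this flexibility is that structure must be interpreted at query time, which is one reason such stores are often paired with SQL systems.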

NoSQL solutions generally need to be combined with SQL solutions to meet enterprise manageability and security requirements. Custom MapReduce programs are required for parallel execution on the distributed data nodes. A tool like Apache Giraph is better suited to specialized needs such as social graph analysis, because it can extract insight from complex social relationships for customer marketing and retention campaigns.
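The MapReduce model can be sketched in plain Python as three phases: map emits key/value pairs, shuffle groups the values by key, and reduce combines each group. This single-process sketch counts hashtags and is only an illustration of the programming model, not Hadoop's Java API; on a real cluster the framework runs map tasks in parallel on the nodes holding each input split and performs the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit (hashtag, 1) for every hashtag in a post."""
    for token in record.split():
        if token.startswith("#") and len(token) > 1:
            yield token.lower(), 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    # Here the map calls run one after another; a cluster would run them in parallel.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())


if __name__ == "__main__":
    print(run_job(["Loving #BigData and #Hadoop", "#bigdata strategy matters"]))
```

Because map and reduce are pure functions over independent keys, the framework can distribute them freely, which is what makes the model fit large, distributed datasets.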

However, deriving insights with these new technologies requires significant programming effort, as well as the skills to interpret the storage logic and perform the analysis. Specialized needs can create new challenges, such as the lack of support for complex query patterns in NoSQL databases. Further complications can arise from the distributed nature of processing and the demand for real-time, context-aware results.

The Big Data strategy must pay careful attention to all these aspects when selecting Big Data products and solutions, along with other important factors such as interoperability and standards.

8

Faster Response to Market Conditions

Successful Big Data processing depends on the speed of data acquisition and analysis. Big Data systems should adapt quickly to changing market realities on the ground and not be constrained by traditional application development cycles that can run for many months or longer.

Big Data comes in several forms, such as device/sensor and scientific information, bar codes, vehicle telematics, surgery videos, stock market trades, x-rays, telephone conversations, contracts, advertisements, spreadsheets, audit trails and so on.

As the type or source of data changes, it should be easy to adapt the implementation to the new data, and such changes should be delivered in short cycles of two to three months. Overall, the philosophy behind analytic solution implementations should be driven by the fact that Big Data will continue to evolve in every respect, and Big Data applications must be able to respond in the shortest possible time to reap the rewards and keep the analysis relevant.
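One way to make new data sources cheap to absorb is a pluggable parser registry: the ingestion pipeline stays fixed, and supporting a new source type means registering one new function. This is a design sketch under assumptions, not a prescribed architecture; the register and ingest names and the source-type labels are hypothetical.

```python
import csv
import io
import json
from typing import Callable

Parser = Callable[[str], list[dict]]
PARSERS: dict[str, Parser] = {}


def register(source_type: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser for a new source type without touching the pipeline."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[source_type] = fn
        return fn
    return wrap


@register("json_lines")
def parse_json_lines(payload: str) -> list[dict]:
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


@register("csv")
def parse_csv(payload: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(payload)))


def ingest(source_type: str, payload: str) -> list[dict]:
    """Single entry point for the pipeline; dispatches on the registered source type."""
    try:
        parser = PARSERS[source_type]
    except KeyError:
        raise ValueError(f"no parser registered for source type {source_type!r}") from None
    return parser(payload)


if __name__ == "__main__":
    print(ingest("csv", "id,reading\n1,20.5\n"))
```

Adding, say, a telematics feed would then be a matter of writing and registering one more parser, which fits the short delivery cycles the text calls for.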


10

Avoiding Security and Privacy Pitfalls

Big Data is breaking traditional barriers to data flow, with large amounts of data being digitized and traveling across boundaries. This can create issues of data portability, security, privacy, compliance, intellectual property and liability. As more data is stored in the external Cloud, an inexpensive alternative, concerns around security and privacy are growing.

Since Big Data involves processing customer information, organizations should ensure the confidentiality of personally identifiable and sensitive data. Data protection policies and tools such as data masking must be used to protect sensitive personal and corporate data and avoid costly consequences such as loss of customer and stakeholder trust, brand erosion, liabilities and fines. Data privacy laws differ across countries, and Big Data processing efforts should ensure that these regulations are adhered to.
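Data masking can be sketched with two common techniques: partial masking for display, and keyed pseudonymization (HMAC-SHA256 from Python's standard library) so that analysts can still join and count records without seeing raw identifiers. The field names, the 16-character key length and the hard-coded secret are assumptions for illustration only; a real deployment would keep the key in a secrets manager. Note that pseudonymized data may still be treated as personal data under some privacy laws.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; never hard-code real keys


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed hash: the same input maps to the same token, enabling joins."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def mask_digits(number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*', keeping separators."""
    total = sum(ch.isdigit() for ch in number)
    seen = 0
    out = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - visible else "*")
        else:
            out.append(ch)
    return "".join(out)


def mask_record(record: dict) -> dict:
    """Return a masked copy; the original record is left untouched."""
    masked = dict(record)
    masked["customer_key"] = pseudonymize(record["email"])
    masked["email"] = mask_email(record["email"])
    masked["phone"] = mask_digits(record["phone"])
    return masked


if __name__ == "__main__":
    print(mask_record({"email": "ann.lee@example.com", "phone": "555-010-1234"}))
```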

Big Data analytics is becoming so advanced that it can sometimes generate insights about customers that they themselves are not aware of. Companies must be careful when issuing personalized recommendations based on analytics of the vast amount of individual data they hold, because in some cases these can make customers uncomfortable.

Organizations should make sensitive data accessible on a “need to know” basis and ensure adequate data security. Companies should deploy tools and technologies such as multifactor authentication, VPNs, intranet firewalls, biometric systems and threat monitoring suites to protect valuable data assets. Recent studies indicate that security breaches cost companies $204 per compromised customer record. Since data can proliferate quickly, combine easily with other data and be used by multiple people, it is necessary to institute policies addressing intellectual property issues and liabilities to safeguard the organization.
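A "need to know" policy can be enforced at field level by giving each role an explicit allow-list, and the $204 per-record figure quoted above gives a quick sense of exposure: 100,000 compromised records would cost about $20.4 million. In this sketch the roles and field lists are hypothetical, unknown roles receive no data by default, and the cost function is plain multiplication using the per-record figure from the text.

```python
# Hypothetical mapping of roles to the fields each role needs to know.
ROLE_FIELDS: dict[str, set[str]] = {
    "marketing_analyst": {"customer_key", "segment", "region"},
    "support_agent": {"customer_key", "name", "phone"},
}


def need_to_know_view(record: dict, role: str) -> dict:
    """Return only the fields the role is entitled to; unknown roles get nothing (deny by default)."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}


def breach_cost_estimate(records_exposed: int, cost_per_record: float = 204.0) -> float:
    """Back-of-the-envelope breach exposure using the per-record cost cited in the text."""
    return records_exposed * cost_per_record


if __name__ == "__main__":
    customer = {"customer_key": "abc", "name": "Ann", "phone": "555", "segment": "premium"}
    print(need_to_know_view(customer, "marketing_analyst"))
    print(f"${breach_cost_estimate(100_000):,.0f}")  # $20,400,000
```

Narrowing each role's view also narrows what a single compromised account can leak, which directly reduces the number of records in the cost estimate.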

About the Author

Girish Khanzode Products & Platforms Innovator for Futuristic Technologies, Infosys

Girish is a veteran of enterprise software product design and development with more than 20 years of professional experience. He has built and led large product engineering teams to deliver highly complex products in multiple domains, covering the entire product life cycle. Currently, he is engaged in innovating and building next-generation products and platforms in emerging technology areas such as Enterprise Data Security and Privacy, Collaboration Technologies, Digital Workplace, Social Analytics, Smart Cities, Big Data and the Internet of Things. Girish holds an M.Tech. degree in Computer Engineering and a bachelor’s degree in Electrical Engineering.


© 2012 Infosys Limited, Bangalore, India. Infosys believes the information in this publication is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in this document.

About Infosys

Infosys partners with global enterprises to drive their innovation-led growth. That's why Forbes ranked Infosys 19th among the top 100 most innovative companies. As a leading provider of next-generation consulting, technology and outsourcing solutions, Infosys helps clients in more than 30 countries realize their goals. Visit www.infosys.com and see how Infosys (NASDAQ: INFY), with its 150,000+ people, is Building Tomorrow's Enterprise® today.

For more information, contact [email protected] www.infosys.com