
Data Science: How the 'Marriage' of Technology and Math Can Produce Actionable Results


By blending the elegant science of pattern recognition with the discipline of mathematics, organizations can solve some of the world’s most analytically complex business problems, such as geospatial analysis.

Executive Summary

Although data science has gone mainstream, many onlookers see it as a fancy word for plain old statistics. But, in fact, it’s a way for companies to apply analytics through the use of a technology platform to operate and strategize business more intelligently.

Data science is about deriving insights by viewing data through a mathematical lens. It’s not about math or about technology. It’s about finding patterns in the data that can be incredibly useful in a wide variety of business situations.

Big data analytics has transcended buzzword status because of its ability to unlock patterns and insights that reside in a variety of data formats and structures. But as we know all too well, the sheer volume of data – not really “big” anymore, but now truly “colossal” – makes it difficult if not impossible to derive understanding from information without sophisticated computing power and savvy algorithms. Regardless of the industry, be it financial services, retailing, manufacturing, etc., or even internal disciplines such as operations, product development, marketing, etc., data science can be brought to bear on numerous use cases, and actionable insights can be discovered one by one.

This white paper examines one application of data science: geospatial analysis. We explore how it works, in part from a technical standpoint, because this approach demonstrates well how the “marriage” of technology and math can meet related business objectives.

The application of data science in a geospatial context should illuminate how pattern recognition, when combined with mathematics, can be applied in various use cases, across industries, in ways that arm leaders with the insight and foresight needed for better-informed business decisions.

Cognizant Reports | Cognizant 20-20 Insights | June 2016


Geospatial Analysis: Getting Well ‘Grounded’

At its essence, geospatial analysis is all about determining the relative position of things on the Earth’s surface, and then deriving understanding from this knowledge. It’s easy to see how many different types of organizations can use geospatial analysis technology. For example, knowing where particular populations live and work can help banks tailor products and refine market emphasis. The policing and security industries can not only determine rates of crime in particular areas, but also predict future incidents and rates. Marketers can identify patterns of product needs and usage, and likes and dislikes, and better manage campaigns and offerings as a result. Dynamic insurance pricing is becoming more mainstream with an understanding of areas of risk and safety, health and illnesses, longevity and mortality.

Customer behavioral analysis, based on geolocation, can inform a variety of business decisions. Geospatial analysis – just one example of data science in action – is an essential ingredient in understanding customers, including their needs and wants, patterns of product consumption, etc.

Over the last few decades, numerous technologies have emerged to help make geospatial analysis possible. These include mobile phone triangulation (enabled by cell towers), a technology prominent before the emergence of global positioning systems (GPS). GPS, of course, was made possible by the use of satellites, and can locate an object or person (via a GPS receiver, or smartphone) within as little as three meters on the Earth’s surface. There are also automatic identification systems (AIS), used by ships at sea, as well as tracking via radio frequency identification (RFID) and Wi-Fi. As these technologies launched and became mainstream, they also became less expensive, allowing the identification of geographic locations of objects (“spatial data”) to become pervasive in our daily lives.

Figure 1 demonstrates, via a pair of maps, how customer position can be represented by geospatial data, derived simply from the devices people carry around with them. With the use of wearable technologies, smartphones and tablets, and the ubiquity of Wi-Fi and cellular interconnectivity, data gathered by geospatial targeting is growing at an exponential rate.

Solving the Challenge of Hugeness

Geospatial analysis is immensely promising, but as with everything there are challenges. The practical use of geospatial data is typically hampered by the sparse nature of data collection strategies, the sheer volume of the data and various technical issues associated with the use of the data. The enormous volume of data can easily overwhelm analysis; in fact, the huge growth of data volume has become the number one challenge for geospatial analysis.

In response, data scientists are using modern mathematical techniques to improve analytical performance, which can be sluggish when immense data sets are accessed and processed. This is particularly important when doing geospatial analysis because any piece of ground – city, county, state, country, continent – contains almost countless points of information.

Figure 1: Charting Customer Locations via Geospatial Analysis (Source: Google Maps)

One such technique is to split the data into smaller pieces, so it becomes easier to design and apply algorithms to various components. This process is known as “tessellation,” which cuts up all the various location data points into smaller regions – in this context, these smaller regions are called “tiles” – so that the algorithm can be practically applied to smaller chunks of data. This enables analysts to treat each smaller group with a single algorithm, and compare each one with others more easily.

Taking a Closer Look

The idea of tessellation used in geospatial analysis is intriguing, not just because it’s fascinating science, but also because of the practical problem it solves – how to gain understanding from immense sets of data. This requires a deep dive into the process, exploring the math and technology behind this analysis. The process can get very practical, very fast.

As noted, tessellation takes a geographic area to be analyzed – which can be envisioned as a flat plane – and divides it into geometric shapes (tiles), with no overlaps or gaps, to ensure nothing is missed. Data points mapped to the same tile are treated as one cluster, which forms the basic unit for algorithms or visualization techniques.
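The tiling described above can be sketched in a few lines of Python. This is a minimal illustration, assuming square tiles measured in degrees of latitude and longitude; the function names, the tile size and the sample coordinates are hypothetical and are not taken from any product discussed in this paper.

```python
import math
from collections import defaultdict

def tile_of(lat, lon, tile_deg=1.0):
    """Return the (row, col) index of the square tile that contains a point."""
    # floor() assigns every point to exactly one tile, so tiles neither overlap nor leave gaps.
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))

def tessellate(points, tile_deg=1.0):
    """Group (lat, lon) points into clusters keyed by tile index."""
    tiles = defaultdict(list)
    for lat, lon in points:
        tiles[tile_of(lat, lon, tile_deg)].append((lat, lon))
    return dict(tiles)

# Illustrative coordinates only (approximate city centers).
points = [
    (37.33, -121.89),  # San Jose
    (37.77, -122.42),  # San Francisco
    (41.26, -95.93),   # Omaha
    (30.27, -97.74),   # Austin
]

clusters = tessellate(points, tile_deg=5.0)
for tile, members in sorted(clusters.items()):
    print(tile, len(members))
```

With 5-degree tiles, the two Bay Area points fall into the same tile and form one cluster, while Omaha and Austin each land in a tile of their own. An algorithm can then be run per tile rather than over the whole data set.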

Tessellations, in fact, were used by Sumerians 6,000 years ago to create wall decorations formed by patterns of clay tiles. In the 17th century German mathematician Johannes Kepler made an early documented study of tessellations, and in the 19th century a mathematical study of the entire process was undertaken.1 Geospatial analysis today uses tessellation techniques not only to split a large area into several smaller ones, in the traditional manner, but also to split it into smaller areas that share a similar shape but vary in size. Thus, analysts can see and assess the incredibly dense data points of a city, along with the fewer data points existing within a much larger section of the U.S. – the Great Plains, for example.
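One common way to obtain tiles of similar shape but variable size is a quadtree, which keeps splitting a square into four quadrants wherever the data is dense. The paper does not name a specific method, so the quadtree here is an assumption chosen for illustration, and the function name, the `capacity` parameter and the sample points are hypothetical.

```python
def quadtree(points, x0, y0, x1, y1, capacity=2, depth=0, max_depth=8):
    """Return leaf tiles (x0, y0, x1, y1, points) covering the half-open box [x0, x1) x [y0, y1).

    A tile is split into four equal quadrants while it holds more than `capacity` points.
    """
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    if len(inside) <= capacity or depth >= max_depth:
        return [(x0, y0, x1, y1, inside)]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
    leaves = []
    for qx0, qy0, qx1, qy1 in quadrants:
        leaves += quadtree(inside, qx0, qy0, qx1, qy1, capacity, depth + 1, max_depth)
    return leaves

# A dense "city" near the origin and one isolated point far away (made-up planar units).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (12, 12)]
leaves = quadtree(points, 0, 0, 16, 16, capacity=2)

widths = [tile[2] - tile[0] for tile in leaves]
print(len(leaves), min(widths), max(widths))
```

In this sketch the dense corner ends up covered by tiles 2 units wide, while the sparse remainder is covered by tiles up to 8 units wide, which mirrors the city versus Great Plains contrast in the text.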

But while tessellation can allow for more manageable chunks of geography to be analyzed, it’s only via the marriage of math and technology that it can be done efficiently and quickly. Many of today’s systems that can handle geospatial data – specifically, massively parallel processing (MPP) databases that are partitioned across multiple servers or nodes – have tessellation features built in, to better analyze spatial data. In addition, these systems apply “spatially aware indexes” to filter the data.

The approach is easier to picture with an example from database services provider Teradata. The company uses spatially aware indexes and “joining” techniques to filter data input sets, and spatial objects – minimum bounding rectangles (MBRs) in particular – to refine the filtered result. Teradata then provides a set of functions that map the data into a two-dimensional grid. This is the tessellation.
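To show how a minimum bounding rectangle can cut down the work of a spatial query, here is a small generic sketch in Python. It is not Teradata code: it uses the MBR as a cheap first-pass filter and an exact point-in-polygon test as the refinement, which is the common filter-and-refine pattern and may differ in order from the vendor's implementation. All function names and sample data are hypothetical.

```python
def mbr(polygon):
    """Minimum bounding rectangle of a polygon as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def in_mbr(point, box):
    """Cheap test: four comparisons per point."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def in_polygon(point, polygon):
    """Exact test by ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_then_refine(points, polygon):
    """Return (candidates that passed the MBR filter, points truly inside the polygon)."""
    box = mbr(polygon)
    candidates = [p for p in points if in_mbr(p, box)]
    return candidates, [p for p in candidates if in_polygon(p, polygon)]

triangle = [(0, 0), (4, 0), (0, 4)]            # a made-up sales territory
points = [(1, 1), (3, 3), (5, 5), (1, 2)]      # made-up customer positions
candidates, matches = filter_then_refine(points, triangle)
print(candidates, matches)
```

The point (5, 5) is discarded by the rectangle test alone, and only the remaining candidates pay for the more expensive exact test, which then rejects (3, 3).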

A spatial index should be based on patterns and demographics. Consider our Teradata example, where two spatially aware methods are used: tessellate and geodetic buffering functions.

• Tessellate: This is primarily intended to be used for “join processing.” This process takes two or more pieces of information – say, all the company’s SKUs, plus particular customers’ product needs and desires – and joins them together to meet an actionable objective – an appropriate sales pitch, for example, or a customer satisfaction report. Even with the technological advances of recent years, the join operation remains a time-consuming operation in relational query processing.

Tessellate_Index is used for index maintenance, while Tessellate_Search is used to query the index that was built using Tessellate_Index. Say you are trying to find customer locations based on latitude and longitude, derived from their mobile devices, and you have about 2 million customers spread across some 175 countries. Tessellate can deliver a performance boost of 99.9% compared with traditional queries that do not use this technique.

• Geodetic buffering functions: Say you are looking to identify customers close to a particular store, something that is particularly useful when applying targeted marketing via personalized coupons, sales information, special deals, etc. Geodetic buffering can, for example, find the boundary for each store. Tessellate_Index and Tessellate_Search functions can then be used to identify and restrict the customers within the boundaries, and run an analysis about only those customers.
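The idea behind a geodetic buffer can be sketched generically in Python. This is not the Teradata function set: it assumes a spherical Earth with a mean radius of 6,371 km, computes a bounding box around a store for a given radius, and then keeps only the customers whose great-circle (haversine) distance is within that radius. The names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical Earth is assumed

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def buffer_bbox(lat, lon, radius_m):
    """Bounding box (lat_min, lon_min, lat_max, lon_max) of a circular buffer.

    Assumes the buffer does not reach a pole or cross the 180th meridian.
    """
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(delta)
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

def customers_in_buffer(store, customers, radius_m=1000):
    """Names of customers within radius_m of the store: bounding box first, exact distance second."""
    lat_min, lon_min, lat_max, lon_max = buffer_bbox(store[0], store[1], radius_m)
    hits = []
    for name, (lat, lon) in customers.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            if haversine_m(store[0], store[1], lat, lon) <= radius_m:
                hits.append(name)
    return sorted(hits)

store = (37.3382, -121.8863)            # illustrative store location in San Jose
customers = {
    "ana": (37.3400, -121.8863),        # roughly 200 m north
    "ben": (37.3382, -121.8700),        # roughly 1.4 km east
    "cy":  (37.3440, -121.8863),        # roughly 650 m north
    "dee": (41.2565, -95.9345),         # Omaha
}
print(customers_in_buffer(store, customers))
```

Only the customers inside the 1,000-meter buffer are returned, so any follow-on analysis runs on that subset alone.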

Consider geospatial analysis in a retailing situation. Figure 2 represents stores as yellow stars and customers as small circles. A 1,000-meter circle is drawn around each store, depicting the customers that are within a comfortable walking or driving range of that retail location.

Distance calculations from a particular store can take into consideration customers within the same cell for that store, as well as customers in other cells, to determine how to convey the availability of special deals, interesting menu items, products that are on sale, coupons, etc. Now consider how this can be expanded further into larger geographical areas – say, to the entire continental U.S., with many thousands of stores and tens of millions of customers. A query for a store in San Jose will take into consideration customers only in that area, and not the customers in Omaha or Austin. This has the potential to dramatically improve retail offers and opportunities that are particularly tailored to certain geographic locales.

Figure 2: Geospatial Analysis of Retail Customers: An Illustrative View (a grid of Cells 1 through 8 showing stores and customers, with a 1,000 m radius drawn around each store)


Remember again that queries using spatial indexes are much faster than queries that do not use spatially aware indexes. This is an essential attribute that enables rapid response and quick alterations of marketing and product-supply decisions.
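Why a cell-based index speeds up such queries can be seen in a small Python sketch. It assumes customer and store positions have already been projected to planar coordinates in meters and that each cell is 1,000 m on a side; the index layout, function names and data are hypothetical and stand in for whatever a production database does internally.

```python
import math
from collections import defaultdict

CELL_M = 1000  # assumed cell edge length in meters

def cell_of(x, y):
    """Index of the grid cell containing a planar point."""
    return (math.floor(x / CELL_M), math.floor(y / CELL_M))

def build_index(customers):
    """Map each cell to the names of the customers located in it."""
    index = defaultdict(list)
    for name, (x, y) in customers.items():
        index[cell_of(x, y)].append(name)
    return index

def nearby(store_xy, customers, index, radius_m=1000):
    """Return (customers within radius_m, number of distance calculations performed)."""
    sx, sy = store_xy
    cx, cy = cell_of(sx, sy)
    reach = math.ceil(radius_m / CELL_M)  # how many cells away a match can possibly be
    hits, checked = [], 0
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for name in index.get((i, j), []):
                checked += 1
                x, y = customers[name]
                if math.hypot(x - sx, y - sy) <= radius_m:
                    hits.append(name)
    return sorted(hits), checked

customers = {
    "a": (500, 500),
    "b": (1500, 700),
    "c": (2900, 400),
    "d": (900_000, 900_000),   # a customer in a distant city
    "e": (1400, 1400),
}
index = build_index(customers)
hits, checked = nearby((600, 600), customers, index)
print(hits, checked, "of", len(customers))
```

Here only three of the five customers are ever measured against the store, because the search looks only at the store's own cell and its immediate neighbors. At the scale of tens of millions of customers, skipping every distant cell is where the bulk of the saving comes from.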

Geospatial Analysis: A Cross-Industry Perspective

In the marriage of technology and math, there’s no need to restrict geospatial analysis to retailing. Consider the financial services company that wants to make sure its customers’ credit card accounts are secure. If analysis shows that a particular credit card was used in London in the morning, and again in Beijing in the afternoon, red flags may certainly be raised as to the possibility of a lost or stolen card. That is, unless the system knows that the customer is a frequent traveler who indeed has visited and worked in those locations in the past. The same can be done with, say, a mobile wallet that is tracked and pinpointed via smartphone transactions.
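The credit card scenario can be expressed as a simple rule in Python. This is only a sketch under stated assumptions: the 1,000 km/h speed ceiling, the transaction record layout and the travel-history exception are all hypothetical choices made to mirror the description above, not a documented fraud model.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumed ceiling, roughly the speed of a commercial jet

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the spherical law of cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    cos_angle = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlmb)
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))

def implied_speed_kmh(tx_a, tx_b):
    """Speed a cardholder would need to travel to make both transactions in person."""
    hours = abs((tx_b["time"] - tx_a["time"]).total_seconds()) / 3600
    km = great_circle_km(tx_a["lat"], tx_a["lon"], tx_b["lat"], tx_b["lon"])
    return km / hours if hours > 0 else math.inf

def is_suspicious(tx_a, tx_b, travel_history=frozenset()):
    """Flag the pair unless both cities appear in the customer's known travel history."""
    if tx_a["city"] in travel_history and tx_b["city"] in travel_history:
        return False  # the exception described in the text: a known frequent traveler
    return implied_speed_kmh(tx_a, tx_b) > MAX_PLAUSIBLE_KMH

london = {"city": "London", "lat": 51.5074, "lon": -0.1278,
          "time": datetime(2016, 6, 1, 9, 0, tzinfo=timezone.utc)}
beijing = {"city": "Beijing", "lat": 39.9042, "lon": 116.4074,
           "time": datetime(2016, 6, 1, 14, 0, tzinfo=timezone.utc)}

print(is_suspicious(london, beijing))
print(is_suspicious(london, beijing, travel_history={"London", "Beijing"}))
```

The two cities are more than 8,000 km apart, so covering that distance in five hours implies a speed well above the assumed ceiling and the pair is flagged. Supplying a travel history that contains both cities suppresses the flag, as the paragraph above describes.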

Suppose an insurance company is able to map an approaching hurricane and its potential landfall, which threatens the company’s customers in a particular city or state. Here, appropriate decisions can be made about potential insurance payouts based on storm location, direction, at-risk customers, etc.

What follows are real-world examples of geospatial analysis that we have built and delivered to clients in various industries:

• Technology: For a global company developing geospatial information systems (GIS) products, Web-based GIS was developed for sales territory management. This map-based solution helped the teams planning sales efforts to identify prospective customers as well as the competition also vying for those customers. As a result, a new campaign was enhanced, and sales were improved.

• Telecom: A large European telecom provider needed a solution to better plan its cell-tower networks. We developed and deployed a solution using products from Environmental Systems Research Institute (Esri), deploying its GIS software and geodatabase management applications to make sense of future cell tower locations.

• Utilities: A large oil and gas content-management company needed to estimate the volume and value of asset deployment in various locations. We created business intelligence models that better informed product development.

• Nongovernmental organizations: A global child welfare NGO needed to manage its funds better, to improve its programs, projects and entire communities, and to monitor change in the process. Here, we developed and deployed both Web-based and mobile GIS technology to produce the desired outcomes.

• Retail: A large European retailer faced the challenge of optimal new-store setup; in particular, it sought a fact-based approach to better understand where its best customers might reside. We developed a geospatial analytics process using the Huff Model,2 to determine sales potential per location based on such factors as customer disposable income, distance from the store, overall population and other variables.
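The Huff Model mentioned in the retail example lends itself to a short worked sketch. In its standard form, the probability that consumers at origin i patronize store j is P_ij = (A_j^alpha / d_ij^beta) / sum over k of (A_k^alpha / d_ik^beta), where A is a store attractiveness measure and d is distance. The Python below assumes floor area as the attractiveness measure and uses made-up exponents and figures; none of these values come from the client engagement.

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Probability that consumers at one origin patronize each store.

    attractiveness: store -> attractiveness measure (floor area is assumed here)
    distances:      store -> distance from the origin to that store
    """
    utilities = {s: attractiveness[s] ** alpha / distances[s] ** beta for s in attractiveness}
    total = sum(utilities.values())
    return {s: u / total for s, u in utilities.items()}

def sales_potential(probabilities, population, spend_per_person):
    """Split the origin's total spend across stores according to the probabilities."""
    total_spend = population * spend_per_person
    return {s: p * total_spend for s, p in probabilities.items()}

# Hypothetical origin with two candidate stores.
attractiveness = {"store_a": 1000, "store_b": 2000}   # e.g. square meters of floor area
distances = {"store_a": 1.0, "store_b": 2.0}          # e.g. kilometers from the origin

probs = huff_probabilities(attractiveness, distances)
potential = sales_potential(probs, population=300, spend_per_person=10)
print(probs)
print(potential)
```

In this example the smaller but closer store attracts two thirds of the origin's customers, because the distance penalty (beta = 2) outweighs the larger store's size advantage. Repeating the calculation for every origin location and summing per store gives a sales potential per candidate site.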

Looking Forward: The Road Ahead

These examples are powerful illustrations of data science’s potential impact across industries. The marriage between technology and mathematics is delivering actionable insights, with analyses deployed to address specific, practical use cases that companies face every day.

It’s not so much about math and technology per se, but about the application of mathematical features within a technological world to solve business challenges. With an increasing understanding of the importance of data science, and the importance of data scientists, business is headed in the right direction.

For starters, an organization should understand and embrace the following to benefit from applying data science:

• Intent: Organization heads and C-level executives must define clearly the data science component of organizational culture. A roadmap and strategy, applied top-down in an organization, helps gather the management support needed to bring about the change.

• Adaptability: When applying new technology, culture is one of the most important things to consider. This does not mean throwing away the existing technology stack; rather, organizations must understand the importance of blending old with new. “Weave” architecture helps organizations grow successfully with the coexistence of traditional and big data technologies. Neither by itself is sufficient to address all the needs of an organization. A perfect blend of both is what defines the road to success.

• Embrace new organizational structures: Not to understate the importance of the CTO, but the chief data officer (CDO) has an important role to play here. With a mix of technologies and culture, an organization needs to protect internal and external information. The CDO does exactly that, and also helps promote the culture of fact-based decisions – facts backed by data.

• Agility: Gone are the days when a one-size-fits-all solution that went unchanged for decades could succeed. Organizations need to be flexible to adapt to new changes, processes, technologies and data usage policies.

Footnotes

1 Fyodorov’s work marked the unofficial beginning of the mathematical study of tessellations. Other prominent contributors include Shubnikov and Belov (1964), and Heinrich Heesch and Otto Kienzle (1963), https://en.wikipedia.org/wiki/Tessellation.

2 The Huff Model is a spatial interaction model that calculates gravity-based probabilities of consumers at each origin location patronizing each store in the store dataset. From these probabilities, sales potential can be calculated for each origin location based on disposable income, population or other variables. The probability values at each origin location can optionally be used to generate probability surfaces and market areas for each store in the study area, http://arcscripts.esri.com/details.asp?dbid=15999.

About the Author

Udit Garg is a Senior Architect within Cognizant’s Analytics & Information Management business unit. As an expert on database technologies and a database architect, Udit has produced customized solutions based on corporate needs and roadmaps. He is a Teradata Certified Master and has over 10 years of experience in database technologies and architecture focused on roadmap definition, as well as solution and information blueprints. Udit has defined a personalization blueprint for digitalization exercises, created a system architecture for an enterprise warehouse and led a team that developed a performance optimization/capacity management solution in a recent engagement. He can be reached at [email protected].

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 100 development and delivery centers worldwide and approximately 233,000 employees as of March 31, 2016, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2016, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

Codex 1889