Embedded Analytics in IBM DB2 Universal Database for Information on Demand
Information Integration Software Solutions
White paper
August 2003



Finding the needle of competitive advantage in a haystack of data

One of the major business insights of our time is that information can be more valuable than capital assets in helping your firm stay competitive and profitable. Today’s technology can deliver more bits of information about your company or competitors than existed in all the world’s books 500 years ago. Yet paradoxically, the more information is at your fingertips, the harder it may be to derive useful answers from it, at least in the time frame that our fast-paced environment demands.

For example, you may wonder which of your shopper populations is the most profitable—and what promotional offer would be the most effective in getting that group to buy more. Or, which of your stores is running short of lawn chairs and barbecues soonest—and can any of your suppliers ship fresh inventory before summer ends? You suspect that the answers are hidden somewhere in the thousands, or even millions, of rows of transactional and purchasing data your company generates every month.

But the conventional wisdom is that only data-mining specialists have the skills and the tools to dig the answers out, by extracting data from the database serving your enterprise applications, loading it into a data warehouse and spending days or weeks combing through it. When this process is repeated, the data sets extracted at different times can develop incompatibilities that deliver conflicting answers, leading to corporate chaos. Moreover, the specialists’ skills are costly. When the data mining process becomes too unwieldy or expensive, companies often decide to skip it, leaving important insights undiscovered.

Data mining doesn’t have to be an expensive, time-consuming process that only specialists can carry out. IBM has taken the high-powered data mining functions that used to require specialized, stand-alone analytic tools, and embedded those functions directly in the data warehouse, putting the power of real-time analysis at users’ fingertips in a cost-effective way. With DB2® information management software from IBM, the power of data mining can be infused into enterprise applications and used easily and intuitively, without any statistical skills. And because the data stays in the applications that generated it, everyone is working off the same set of answers and can therefore coordinate their efforts more smoothly.

Contents

Finding the needle of competitive advantage in a haystack of data
Why efficient data mining matters
IBM embeds analytic power into the enterprise data warehouse
Benefits for every corporate department and level
From data to information and knowledge
Data mining delivers value every step of the way
Model, score and visualize: Three steps to basic data mining
Data mining should bring its insights back home
Closing the business intelligence loop with IBM’s data mining tools
Keeping data where it belongs: in the database
Opt for standard or custom analysis
SQL and PMML promote easy integration
Partner cooperation pays off for data mining users

This paper gives an overview of data mining, including what it is, how it works and the difference between “data,” “information,” “knowledge” and “business intelligence.”

It then discusses how IBM’s data mining approach can offer your business keener analytical insights that can help strengthen competitive advantage and, ultimately, your profits.

Why efficient data mining matters

Data analysis has been around for more than 60 years. The cliché image involved white-coated scientists feeding punch cards into a wall-sized supercomputer covered with buttons and twinkling lights, then waiting hours or days for an answer. Now, in the “Information Economy,” everybody wants that kind of analytical insight, but cheaply and quickly. They know that successfully selling goods or services depends on personalization and timing— that is, figuring out the right messages to deliver to the right people at the most opportune time, and coming up with the right answer while the customer is still within range.

Several trends have combined to increase the demand for data mining. For one thing, businesses are recognizing the success that stems from having a single, company-wide mission, such as having diners wait no longer than three minutes for a fresh hamburger, or achieving 100 percent customer satisfaction in processing home loans. But this requires all departments of the company, including sales, manufacturing, procurement, marketing and bookkeeping, to contribute towards this goal. Therefore, corporate leaders need data from all these departments, and not just in raw form, but collected and reconciled into a single view of the truth that everyone can work from.

Contents (continued)

Data mining can deliver value to any industry
Data mining: Key to helping your business turn on a dime
Server-based data mining offers functionality, safety and speed


Businesses are also finding that they must make decisions fast if they want to stay competitive in fulfilling their customers’ demands, especially since e-commerce has created rising expectations for instant demand fulfillment online. Information on demand is a key prerequisite, now that time is as important a strategic weapon as capital, productivity, quality or even innovation. A business that operates on an “on demand” basis is integrated end-to-end; it can sense and respond to changes (anything from an e-mail virus to a tornado to new fashion fads) in real time because it’s resilient, responsive to varying demands and focused on its core mission. But it can’t deliver this kind of fast response without accurate answers delivered quickly.

Technological progress has also supported the rise of data mining. On the hardware and networking fronts, it continually delivers faster, higher-volume data transfer and more convenient, less expensive storage of large quantities of data. On the software front, many vendors offer suites of connected applications to automate manufacturing, supply-chain planning, financials, HR, sales and customer service. Yet this technology can also be a handicap when it produces large quantities of uncorrelated facts that offer limited insight of their own accord. And many existing data mining products require data to be moved away from the enterprise applications that generated it and into a separate data warehouse tended by experts.

IBM embeds analytic power into the enterprise data warehouse

What companies need is a data mining solution with broad access, one that leverages the power of the enterprise’s relational database instead of creating stand-alone data “silos,” and can deliver information on demand to line-of-business (LOB) employees who aren’t necessarily analytical specialists.

IBM, a leader in information management for decades, has recently infused data mining functionality directly into its flagship DB2 family of information management software to deliver an unsurpassed combination of functionality, speed and ease of use. In particular, IBM DB2 Universal Database™ Data Warehouse Edition has been created to offer a complete data mart infrastructure product that includes integrated online analytical processing (OLAP) and other business intelligence functions. It includes key data mining functions, namely modeling, scoring and visualization, at no extra charge. (These capabilities also continue to be sold separately, as does the stand-alone tool, DB2 Intelligent Miner™ for Data.) This paper’s discussion of the latest developments in data mining from IBM is based on the functions that are part of DB2 Data Warehouse Edition.

Benefits for every corporate department and level

Just about every department can benefit from data mining. The most obvious candidates are the marketing and sales departments, for outbound customer contacts, and customer service, for inbound contacts. But a purchasing manager can also use data mining technology embedded in the database of an ERP application to find the most cost-effective suppliers, ranking them by price, quality, delivery time and other factors. The head of factory-floor production can sift through quality reports from data mining-enabled applications to figure out which stages of the assembly line need to be retooled. Human resources can quickly spot who’s using too much sick leave or not enough vacation time.

There’s a suitable flavor of data mining for every level of the organizational pyramid, too. An LOB professional or a department head in a small organization can get quick answers to focused questions, such as predicting which customers are most likely to have problems paying their bills, so those accounts can be watched more closely. A vice president can get a much more inclusive view of the company’s overall account payment patterns this month. And the CFO can boil down gigabytes of purchasing and payment data into a high-level map of profitability in the last 12 months and predictions of profitability for the year to come.


Although we will discuss the technological underpinnings of data mining later in more depth, we should note here that data mining software tools come in two basic types: server-based (that is, functionality embedded in the database or other enterprise application) and client-based (that is, specialized “workbench” tools that are typically used by statisticians to perform complex queries). Until recently, server-based data mining tools were also accessible only to specialists, as they generally required custom programming to perform even the smallest queries. However, the new, simplified SQL (Structured Query Language) functions that IBM has built into DB2 Data Warehouse Edition allow simple data mining applications to be built right on top of the database, to provide easy answers to small, focused questions. This new flexibility is how IBM is leading the way in opening up data mining for many more line-of-business uses.

From data to information and knowledge

To begin our tour of the data mining world, let’s define our basic terms. In the classic definition, popularized by Claremont Graduate School professor Peter Drucker (often called “the father of modern management”), data, the basic building block of human knowledge, consists of separate, uncorrelated raw facts. Information is data endowed with relevance and purpose. That is, relationships are made among the original facts, which gives them a meaningful context. Knowledge is created when human minds incorporate and act on information. Business Intelligence (BI) is a specialized subset of knowledge: useful insights derived from business data that are helpful to the company. Keep in mind that raw data by itself delivers no benefits, and indeed may be a distraction. And information does not deliver business value until people know and act on it, at which point it becomes BI.


One important way to collect BI is through data mining— that is, searching through large quantities of data to identify patterns and establish relationships. The data mining analysis can be conducted according to either a predefined scheme (totaling and comparing sales figures month-to-month or year-to-year) or an open-ended search for patterns (for example, common factors in the medical histories of flu patients or most frequent combinations of items within the shopping carts of supermarket patrons). Some of these questions can be answered quickly with data mining functions embedded in the database, while others require specialized data mining workbench tools.

Some of data mining’s benefits to the enterprise include:
• Increasing profitability, by pinpointing the most profitable customers or products and the best ways to target those segments (can be accomplished with the parallel scoring capabilities embedded in DB2).
• Decreasing losses, for example, by finding more cost-effective suppliers (cheaper, faster or higher-quality), or detecting fraud, abuse or waste (embedded functions).
• Improving customer satisfaction, by anticipating customers’ needs and filling them more quickly and effectively (workbench mining).
• Strengthening competitive advantage, by identifying trends sooner, whether it’s the hottest new colors for cars (embedded server-based data mining) or the experimental compound most effective at treating cancer (workbench mining, due to the very large data sets used). Companies can then respond sooner and can help improve product quality.


Note that the kinds of questions answered by data mining can all be phrased in everyday wording any LOB manager will understand without technical training, even though very sophisticated analytic algorithms may be required to describe them mathematically. That is the great benefit of data mining: harnessing the power of advanced data-crunching methods such as regression analysis or brute-force matching to address high-level business concerns.

Data mining delivers value every step of the way

Now let’s take a closer look at ways that data mining can deliver business value within an organization.

First of all, the advanced analytic tools used in data mining can segment your customers and markets more quickly and in more detail. They can tell you what kind of customer, as defined by demographics and buying patterns, is likely to buy a particular product or behave in a given situation (for example, how long it will take them to pay off a credit card balance).

Once customers have been segmented, data mining can help you personalize their treatment. For example, Web site visitors reveal their interests according to the pages they visit, and can be served appropriate pop-up ads. Bank customers can be offered different credit card interest rates or mortgage loans according to their financial history. HMO patrons may benefit from reminder calls or e-mails to finish all their prescription medication. In an age of mass merchandising and consumerism, the personal touch is often what makes the difference in the buying decision. This can be delivered at the point of sale, by mail, over the phone or online.


One particular type of personalization is loyalty marketing— that is, giving incentives to your best customers to stay with your company, such as “buy nine coffees, get the tenth free.” It used to be assumed that only small merchants could get to know their customers well enough, on a face-to-face basis, to make loyalty pay off. But data mining can sort through millions of transactions in order to draw pictures of each customer’s buying habits and come up with appealing incentives (such as coupons generated by using a supermarket club card). Moreover, it can use the results of the first run of incentives in a feedback loop to make future incentives more accurately targeted and therefore more profitable.

Another result of getting to know customers better is quicker recognition of when their consumption patterns change. This can often indicate abuse or theft. For example, if a customer who normally uses an ATM card to take out $40 every Friday and Monday suddenly takes out the $500 maximum several days running, that can create a red flag for potential theft. The company has the power to freeze the card and alert the customer. Likewise, an HMO patient who tries to have a narcotics prescription filled several times at different pharmacies can be spotted before he or she starts reselling the drugs on the street.
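The ATM example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the rule of three consecutive limit-sized withdrawals and the "three times the usual amount" threshold are hypothetical choices made for this sketch, not part of any IBM product.

```python
from statistics import median

def flag_unusual_withdrawals(history, recent, daily_limit=500, run_length=3):
    """Return True when recent withdrawals look unlike the customer's usual pattern.

    history: past withdrawal amounts that describe normal behavior
    recent:  the latest withdrawal amounts, oldest first
    """
    typical = median(history)
    # Count how many of the most recent withdrawals in a row hit the daily limit.
    streak = 0
    for amount in reversed(recent):
        if amount >= daily_limit:
            streak += 1
        else:
            break
    # Flag only when the streak is long enough and the limit is far above the usual amount.
    return streak >= run_length and daily_limit > 3 * typical

usual = [40, 40, 40, 40, 40, 40]                              # $40 every Friday and Monday
print(flag_unusual_withdrawals(usual, [40, 500, 500, 500]))   # True
print(flag_unusual_withdrawals(usual, [40, 40, 500]))         # False
```

A single large withdrawal does not raise the flag here; only a sustained change in the pattern does, which mirrors the "several days running" condition in the example.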

More broadly, data mining can also help companies find correlations that weren’t obvious to the naked eye. These insights can help companies create new products, improve their merchandising or cut costs. For example, data mining analysis of emergency room admissions may reveal that a significant number of patients are complaining of relapse after failing to take all the medication prescribed for an earlier problem. To reduce the high costs of ER use, the hospital can set up e-mail or phone reminders urging patients diagnosed with, say, pneumonia to finish taking all their pills even after they start feeling better.


Where data mining can find patterns, it can also find exceptions. It can look for store branches that are especially profitable—or unprofitable—compared to others in similar locations, triggering a look into why that store is succeeding or failing. It can quickly spot when supplier deliveries or invoice payments are starting to run late, and predict how broadly that will affect the company’s entire supply chain.
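One simple way to picture exception finding is to compare each store with its peers and flag the ones that sit far from the average. The sketch below uses a z-score for that purpose; the store names, profit figures and the 1.5 threshold are invented for illustration, and a production tool would likely use more robust statistics.

```python
from statistics import mean, stdev

def find_exceptions(profit_by_store, threshold=1.5):
    """Flag stores whose profit is unusually far from the average of their peers."""
    values = list(profit_by_store.values())
    average, spread = mean(values), stdev(values)
    if spread == 0:
        return {}  # every store performs identically, so nothing stands out
    exceptions = {}
    for store, profit in profit_by_store.items():
        z_score = (profit - average) / spread
        if abs(z_score) >= threshold:
            exceptions[store] = "above peers" if z_score > 0 else "below peers"
    return exceptions

# Hypothetical monthly profit (in thousands) for stores in similar locations.
profits = {"A": 100, "B": 105, "C": 98, "D": 102, "E": 160, "F": 101, "G": 40, "H": 99}
print(find_exceptions(profits))  # {'E': 'above peers', 'G': 'below peers'}
```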

Model, score and visualize: Three steps to basic data mining

The three main steps of data mining are modeling, scoring and visualization. The first and key step of data mining is to model the data, that is, sort a statistically valid sample of it according to any of a wide variety of algorithms in order to make underlying patterns or relationships easier to see. (A sample is used because the entire pool of data— say, a month’s worth of purchase data from a nationwide department store chain—would be unmanageably large, totaling terabytes.) The modeling component within DB2 Data Warehouse Edition supports a wide range of statistical functions and algorithms, including:

• Demographic and neural clustering
• Tree and neural classification
• RBF and neural prediction
• Polynomial regression

However, the LOB user doesn’t need to be familiar with these advanced statistical terms. DB2 Data Warehouse Edition understands that statistical functions are the building blocks of more business-oriented operations, such as associations discovery (looking for connections between presumably unrelated sets of data), demographic clustering (breaking down customer behavior by age, gender, income, neighborhood, etc.), and tree classification (breaking results into groups according to various factors, and then into sub-groups, sub-sub-groups, and so forth, so that the results look like a family tree).
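The "family tree" shape described above can be illustrated with a small recursive breakdown. Note the assumption: this sketch only splits records into groups and sub-groups by fields the caller names, whereas a real tree classification algorithm chooses the splitting factors itself from the data. The field names and customer records are hypothetical.

```python
from collections import defaultdict

def breakdown(records, keys):
    """Split records into groups by keys[0], then sub-groups by keys[1], and so on."""
    if not keys:
        return len(records)  # leaf of the tree: how many records ended up here
    groups = defaultdict(list)
    for record in records:
        groups[record[keys[0]]].append(record)
    return {value: breakdown(members, keys[1:]) for value, members in groups.items()}

customers = [
    {"age_band": "18-34", "region": "north"},
    {"age_band": "18-34", "region": "south"},
    {"age_band": "18-34", "region": "north"},
    {"age_band": "35-54", "region": "north"},
]
print(breakdown(customers, ["age_band", "region"]))
# {'18-34': {'north': 2, 'south': 1}, '35-54': {'north': 1}}
```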


The output of the modeling process is (not surprisingly) a “model”— that is, a collection of rules or relationships discovered by the mining process. A typical model is generally described by one or more complex mathematical algorithms. But data analysis with DB2 also sorts data into intuitively understandable patterns, including:

• Classifying by type
• Segmenting groups of related records
• Association, or making correlations between one event or parameter (for example, car theft) and another (the make of the car stolen)
• Clustering, or looking for connections between data sets without a known prior connection (e.g. “Brie cheese is bought in connection with French bread 38% of the time”)
• Forecasting, or using discovered patterns to make reasonable predictions about the future or similar sets of data.

Results of the modeling process within DB2 Data Warehouse Edition could include such easily understandable concepts as the most (or least) profitable product lines, the accounts with biggest balances due or the products most likely to be found together in a market basket. (For example, a grocery store may find that disposable aluminum baking pans are most often bought in the same shopping trip with cake mix, and therefore the pans would sell better on the cake mix aisle than shelved “logically” with other aluminum foil products.)
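A statement such as "pans are most often bought with cake mix" comes down to counting how often items share a basket. The following sketch computes, for every pair of items, the share of baskets containing the first item that also contain the second (often called the confidence of a rule). The baskets are made-up sample data, and the function is an illustration rather than the algorithm DB2 uses.

```python
from collections import Counter
from itertools import combinations

def pair_confidence(baskets):
    """For each ordered pair (a, b), return the share of baskets with a that also hold b."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore duplicates within one shopping trip
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        rules[(a, b)] = together / item_counts[a]
        rules[(b, a)] = together / item_counts[b]
    return rules

baskets = [
    ["cake mix", "baking pan", "eggs"],
    ["cake mix", "baking pan"],
    ["cake mix", "milk"],
    ["foil", "milk"],
]
rules = pair_confidence(baskets)
print(rules[("baking pan", "cake mix")])  # 1.0: every basket with a pan also has cake mix
```

In this sample the pans never appear alongside foil, which is the kind of finding that would support moving them to the cake mix aisle.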


Even better, DB2 Data Warehouse Edition can refresh its own models automatically as the underlying data changes. Suppose that you own a clothing factory that makes football jerseys and, predictably, the most popular team and number belong to this year’s star kicker. Your projected production levels are based on the previous week’s sales data, which generally hover around 50,000 shirts sold nationwide. But suddenly the star is traded, so his number and team colors change. If you follow sports news, you might realize this means you have to shift your product lines. But even if you don’t, DB2 Data Warehouse Edition will notice when sales of the old football jersey number suddenly drop and demand for the new one increases. It will automatically update its existing model based on the changing data feed, and alert you to consider changing your underlying assumptions.
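The jersey scenario can be pictured as a model that is rebuilt from each new batch of sales and reports when its conclusion changes. The paper does not describe how DB2 performs the refresh, so the class below is purely a hypothetical sketch; the class name, jersey labels and sales figures are invented.

```python
from collections import Counter

class TopSellerModel:
    """Remembers the best-selling item and reports when a refresh changes it."""

    def __init__(self):
        self.top_item = None

    def refresh(self, weekly_sales):
        """Rebuild from the latest sales (item -> units); return an alert if the top seller changed."""
        new_top = Counter(weekly_sales).most_common(1)[0][0]
        alert = None
        if self.top_item is not None and new_top != self.top_item:
            alert = f"Top seller changed from {self.top_item} to {new_top}"
        self.top_item = new_top
        return alert

model = TopSellerModel()
print(model.refresh({"old number": 50000, "new number": 2000}))   # None: first build, nothing to compare
print(model.refresh({"old number": 9000, "new number": 41000}))   # alert: the star was traded
```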

Once the data has been sifted to create a model, visualization follows, to confirm that the model is making useful distinctions. Visualization, or drawing pictures of the data in two or three dimensions, can help the human viewer find correlations that might not have been visible in a list of words or figures. Graphing, invented by Rene Descartes in the 1630s, is one of the oldest visualization methods as well as the most common, but it is not always appropriate for the type of data being evaluated. In recent years, highly intuitive, user-friendly tools have been developed using shapes, movement or color to help distinguish important data points (for example, planes that have been waiting the longest for take-off, or pockets of high tissue density that could indicate cancer). The Java™ technology-based visualization capacity of DB2 information management software uses multiple paradigms including pie charts, “bulls-eyes” and bar charts to visualize the model derived from the data, including how new data samples measure up against the model.


Visualization is useful not just to make a point but as a valuable feedback mechanism. For example, a theory that car buyers with annual incomes over $80,000 are more likely to buy car model X than model Y can be shown to be invalid because the wealthier car buyers are still clustering in the same corner of the demographic quadrant as the shoppers making less than $80,000. In that case, the marketing executives trying to figure out how to distinguish the buyers of model X from the buyers of Y would have to alter the model (perhaps theorize that X appeals more to people who value fuel efficiency over power) and see if that query succeeds in segmenting the buyers. In the football jersey example above, simply looking at a list of declining sales from week to week might have alerted you that demand was changing. But with a more complex task such as market segmentation, merely scanning a list of car buyers broken down by income would not have demonstrated that something was wrong with your original model. With DB2 Data Warehouse Edition, the visualizer function is easy enough to be used by LOB analysts who have not had in-depth statistical training, even for sophisticated tuning and refreshing of analytical models.

Once the model has been tested and tuned for accuracy, new data can be scored against the model— that is, judged by the rules discovered in the modeling process. For example, once a bank has used modeling to determine which three types of loans are most profitable for homeowners with children living in XXXXX ZIP code, it can now use the data from a new customer’s application to see whether she fits that model and should therefore be offered one of those three loans.
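Scoring, in its simplest form, means checking a new record against rules that modeling has already produced. The sketch below represents a model as a list of segments, each with the conditions that define it and the offers attached to it. The data structure, field names and loan names are assumptions made for illustration, and the ZIP code is left as the same placeholder used in the text.

```python
def score_applicant(applicant, model):
    """Return the offers of the first segment whose rules the applicant satisfies."""
    for segment in model:
        rules = segment["rules"]
        if all(applicant.get(field) == expected for field, expected in rules.items()):
            return segment["offers"]
    return []  # the applicant fits no known segment

# Hypothetical model produced by an earlier modeling step.
loan_model = [
    {
        "rules": {"homeowner": True, "has_children": True, "zip_code": "XXXXX"},
        "offers": ["loan A", "loan B", "loan C"],
    },
]

new_customer = {"homeowner": True, "has_children": True, "zip_code": "XXXXX"}
print(score_applicant(new_customer, loan_model))  # ['loan A', 'loan B', 'loan C']
```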


Data mining should bring its insights back home

The classic approach to preparing data for mining is to copy a small sample of the data set under consideration into a separate analytical environment. The data must first go through the ETL or “extract, transform, load” process. That is, all the relevant data supplies (for example, inventory levels and point-of-sale transaction records for each month) are extracted from their various sources. Then they must be cleansed (checked for errors); transformed into a single, standardized format; and loaded into a specialized data mining workspace. This is easily the most time-consuming part of the data mining process.
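The extract, cleanse, transform and load steps just described can be sketched end to end with Python's standard library. Everything specific here is an assumption for illustration: the sample sales text, the column names, the cleansing rule (drop rows whose unit count is not a whole number) and the use of an in-memory SQLite database as the stand-in mining workspace.

```python
import csv
import io
import sqlite3

RAW_SALES = """store,month,units
north,2003-06,120
north,2003-07,n/a
 South ,2003-06,95
"""

def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Cleanse and transform: drop bad rows, standardize names and types."""
    clean = []
    for row in rows:
        units = row["units"].strip()
        if not units.isdigit():
            continue  # cleansing: skip rows whose unit count is not a whole number
        clean.append((row["store"].strip().lower(), row["month"].strip(), int(units)))
    return clean

def load(rows, connection):
    """Load: write the standardized rows into the separate workspace."""
    connection.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, month TEXT, units INTEGER)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()

workspace = sqlite3.connect(":memory:")
load(transform(extract(RAW_SALES)), workspace)
print(workspace.execute("SELECT store, SUM(units) FROM sales GROUP BY store ORDER BY store").fetchall())
# [('north', 120), ('south', 95)]
```

Even in this toy version, most of the code deals with cleaning and reshaping rather than analysis, which is consistent with the point that ETL dominates the effort.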

Only then can the data mining specialist start the model-visualize-score data mining process and cut this data into cubes or slices on which to perform analytical operations such as association, clustering, forecasting and so forth. The results are formatted into a report that the analyst delivers to his or her internal customers.

The separate data mining environment approach works well in some cases, including very small source data sets or when speed and cost are not factors. However, this can be time-consuming, expensive and dependent on specialized expertise. This means that LOB staff, who generally have the best idea of what they want to know and how to interpret any ambiguities in the data, aren’t able to do their own analysis. Most importantly, this approach only sends data one way. It doesn’t automatically insert the business intelligence back into the source applications (such as ERP or CRM suites). Any feedback must be done manually, which takes a long time and leaves room for error to creep in.


Closing the business intelligence loop with IBM’s data mining tools

By contrast, the data mining-enabled DB2 Data Warehouse Edition closes the loop and drives the results of data mining back into the operational applications. This way, you can make more informed decisions based on timely intelligence and improve your business processes at key internal or external touchpoints.

The main difference is that the IBM approach is server-based— that is, it performs its data mining operations on the data that rests in the DB2 enterprise database the company is already using. This means the analysis is embedded into specific applications, such as call centers or real-time fraud detection, and therefore the results are delivered much closer to the point of customer or supplier contact, in a more accurate and easy-to-use form.

The IBM approach doesn’t disturb the underlying data at all. Instead, the models that have been developed in the data warehouse with the modeling capacity of DB2 Data Warehouse Edition become objects in the database in their own right. The process of updating a model is simply to update the database using SQL commands—no need for complex special coding. Any new business rules that are applied to the database (for example, "all credit card customers with more than three late payment charges in a 12-month period will have their interest rates raised to 19 percent") will automatically be applied to the data mining models and the scoring process.
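To make the late-payment rule concrete, here is how such a rule might be expressed as a single SQL update issued from Python. This is a sketch under clear assumptions: it runs against an in-memory SQLite database rather than DB2, the table and column names are invented, and it shows only the rule itself, not how DB2 propagates rules to mining models and scoring.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, interest_rate REAL);
CREATE TABLE late_charges (customer_id INTEGER, charged_on TEXT);
INSERT INTO customers VALUES (1, 14.9), (2, 14.9);
INSERT INTO late_charges VALUES
    (1, '2003-01-10'), (1, '2003-03-10'), (1, '2003-05-10'), (1, '2003-07-10'),
    (2, '2003-02-10');
""")

def apply_late_payment_rule(connection, as_of, new_rate=19.0, max_charges=3):
    """Raise the rate for customers with more than max_charges late charges in the 12 months before as_of."""
    connection.execute(
        """
        UPDATE customers
        SET interest_rate = ?
        WHERE id IN (
            SELECT customer_id
            FROM late_charges
            WHERE charged_on > date(?, '-12 months')
            GROUP BY customer_id
            HAVING COUNT(*) > ?
        )
        """,
        (new_rate, as_of, max_charges),
    )
    connection.commit()

apply_late_payment_rule(db, "2003-08-01")
print(db.execute("SELECT id, interest_rate FROM customers ORDER BY id").fetchall())
# [(1, 19.0), (2, 14.9)]
```

Because the rule lives in ordinary SQL, changing the threshold or the rate is a matter of changing parameters rather than rewriting specialized code.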

This means your mining models stay in step with your business processes, reflecting your company’s current needs and priorities at all times. Access control, versioning and backup of the data mining environment are also automatic, because these tasks are part of the overall database maintenance activity.


By contrast, other vendors’ data mining solutions don’t offer this type of fast, automatic updating of models and their underlying business rules. And most don’t use SQL for their data mining commands, which means users have to learn complex proprietary commands.

Keeping data where it belongs: in the database

There are many business benefits to IBM’s embedded, server-based approach. First, the data stays within the original database, without costly, time-consuming transfer to and from an external data mining environment. Eliminating transfers also cuts down on data loss and data integrity errors. Also, this approach is more secure. Data models, which are valuable corporate assets, stay on well-guarded central servers, not on individual analysts’ computers, where they could be lost, damaged or shared with unauthorized users.

Moreover, keeping the database and the data mining process centralized means that every query is based on the same source of knowledge, rather than fragmenting into multiple and increasingly incompatible versions. This reduces errors, conflicts, gaps, finger-pointing and time spent compiling and cross-checking separate information sources. And it promotes the feedback of business insights from the data mining process back into the source database and the enterprise applications connected to it. This leverages the power of your enterprise-class database and improves its ROI.

However, the IBM approach does not rigidly restrict mining operations to the enterprise database. Although modeling usually takes place in the DB2 data warehouse, scoring can take place either in the warehouse or in an enterprise application system. For example, bank customers can be evaluated “on the fly” when they dial a service center, giving customer service representatives near-instant feedback on which products or services to offer.


Opt for standard or custom analysis

IBM offers a choice of two approaches for extending the querying functionality within DB2 Data Warehouse Edition, depending on your needs. First, it supports the use of standard third-party analytical and modeling tools on top of the data warehouse, the way that most competing database vendors operate. However, this approach generally requires complex statistical skills; non-statisticians need not apply. Moreover, end users are restricted to the interfaces and query types built into the third-party tools. And these tools are hard to integrate into a production database, as opposed to a data warehouse set aside for data mining. (As an alternative, you can use IBM DB2 Intelligent Miner for Data as the analytical tool; it is pre-integrated to work smoothly with DB2 Data Warehouse Edition.)

Second, IBM offers a more flexible, sophisticated approach for generating specific types of analysis, using the standard SQL extensions called SQL-MM. This approach requires more up-front expertise from application programmers, but the net result is custom data mining analysis with easy-to-use front ends. Application programmers can create virtually any data mining system they need using this application programming interface (API).

IBM recently added a shortcut to its user-friendly interface design: Easy Mining Procedures, a set of one-line SQL calls that allow easy embedding of data mining functions into existing business applications. The Easy Mining Procedures can be used in any client tool that supports free-form SQL, including SAS applications, Microsoft Excel spreadsheets and Business Objects reports. LOB users with little time or patience for the learning curve will appreciate the ability to start data mining through desktop applications with which they’re already familiar (once the customer report templates have been created by IT staff).


Meanwhile, database administrators will appreciate the ease of administration; the SQL calls for Easy Mining Procedures become part of the underlying DB2 database once they are installed. There is no need to set up additional tools or client/server interfaces. The DBA can manage the configuration with standard tools.

SQL and PMML promote easy integration

Many other benefits of the DB2 Data Warehouse Edition are “under the hood,” that is, derived from its internal configuration. Because it is written in the common language of SQL, an application programmer can easily integrate its modeling and scoring functions with other business programs such as ERP, CRM or online transaction processing (OLTP) applications. For example, a database administrator can create an SQL “VIEW” on top of a DB2 Data Warehouse Edition scoring function, thereby making real-time scoring results available to any database application accessing that view. The business applications don’t need to actually code the data mining functions at all.
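The view-over-a-scoring-function pattern can be sketched as follows. This is an illustration only, using Python's standard-library sqlite3 module in place of DB2. The risk_score function, its formula, and the customers table are hypothetical stand-ins for a real mined model and real data; the actual DB2 scoring functions have their own names and signatures, which are not shown here.

```python
import sqlite3


def risk_score(late_payments, months_as_customer):
    """Hypothetical scoring rule standing in for a mined model.
    Returns a value clamped to the range 0.0 to 1.0."""
    raw = 0.1 + 0.2 * late_payments - 0.005 * months_as_customer
    return round(max(0.0, min(1.0, raw)), 2)


conn = sqlite3.connect(":memory:")

# Register the Python function as a SQL scalar function with two arguments.
conn.create_function("risk_score", 2, risk_score)

conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, "
    "late_payments INTEGER, "
    "months_as_customer INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(101, 0, 60), (102, 4, 12), (103, 1, 24)],
)

# The view hides the scoring call from every application that queries it.
conn.execute(
    "CREATE VIEW scored_customers AS "
    "SELECT customer_id, "
    "risk_score(late_payments, months_as_customer) AS score "
    "FROM customers"
)

# A business application only needs plain SQL against the view.
rows = conn.execute(
    "SELECT customer_id, score FROM scored_customers ORDER BY customer_id"
).fetchall()
print(rows)
```

The application query at the end contains no mining logic at all. If the scoring function behind the view is replaced, the application keeps working unchanged, which is the integration benefit described above.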

And because the models are stored in PMML, the vendor-neutral industry standard for denoting data mining values and operations through XML-based tags, they can be transferred to any PMML-compliant application, whether DB2 or a wide range of PMML-compliant add-on tools from partner vendors. Open standards give IBM DB2 users more flexibility in how they use and customize their data mining applications.
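As a rough illustration of why an XML-based model format is portable, the sketch below reads a small, simplified PMML-style regression model with Python's standard library and uses it to score one record. The fragment is hand-written for this example and has not been validated against the PMML 2.1 definition; the model name, field names and coefficients are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written PMML-style fragment (illustrative only).
PMML_DOC = """
<PMML version="2.1">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="late_payments" optype="continuous"/>
    <DataField name="balance_thousands" optype="continuous"/>
  </DataDictionary>
  <RegressionModel modelName="risk_model" functionName="regression">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="late_payments" coefficient="2.0"/>
      <NumericPredictor name="balance_thousands" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients from the fragment."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model: intercept + sum(coefficient * field value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_linear_model(PMML_DOC)
result = score({"late_payments": 2, "balance_thousands": 8}, intercept, coefficients)
print(result)
```

Because the model is plain tagged text, any tool that understands the same tags can load it and produce the same score, without access to the system that built the model.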

The open nature of PMML is also what makes the scoring functions of DB2 Data Warehouse Edition usable outside of the data warehouse. The scoring functions are built as user-defined extensions to DB2; because they are built according to the PMML standard, you can score records with data mining models from other DB2 Data Warehouse Edition users, or import models from SAS Enterprise Miner and third-party applications. This removes the need for hand-crafted proprietary scripts for integration. All the new data mining components within DB2 Data Warehouse Edition are compatible with PMML V2.1, the latest version, and will continue to be PMML-compliant as new versions of the language are released.

As we have seen, the thoughtful design of the new data mining components within DB2 Data Warehouse Edition results in easier installation and administration, simpler maintenance of mining metadata and results, easier integration and a choice of simple or customized query types. They also deliver better performance, since the parallel architecture of DB2 improves the speed and performance of data preparation and data mining tasks such as scoring large sets of data.

Partner cooperation pays off for data mining users

IBM recognizes that data mining also includes a wide range of specialized functions not included in the DB2 Universal Database or Intelligent Miner product families, such as productivity “scorecards,” heavy-duty analytical tools for statisticians, or industry-specific tools (e.g. for banking). However, rather than spread itself too thin trying to develop data mining products for every conceivable need, IBM works with partners already in those fields. In this partnership strategy, IBM delivers core data mining functionality stemming from the database while IBM Business Partners such as SAS, a leading creator of analytical applications, provide specialized add-ons. An intelligent division of labor through partnerships can deliver greater value for the end user because the partner vendors can focus their expertise on developing cutting-edge business intelligence tools, without fear of being overtaken by the database vendor.

Compared to competing relational databases from Oracle and Microsoft, the embedded data mining functions of DB2 have superior depth of function and scalability. The DB2 Intelligent Miner tools also offer more sophisticated algorithms, easier access through SQL and SQL-MM extensions, and the ability to score records in parallel. These add up to faster, more powerful—but also more user-friendly—data analysis compared to other vendors’ offerings.


Data mining can deliver value to any industry

Data mining can help just about any industry derive more business intelligence from its enterprise data. Hundreds of satisfied IBM customers can illustrate how they’re making data mining pay off. The following list of typical data mining applications only scratches the surface.

• Retailing

The retail industry, which spans everything from food to clothing, appliances, hobby supplies and jewelry, finds data mining highly valuable because every transaction is based on consumer preferences, which are highly susceptible to outside influences. Store location, merchandise layout, packaging designs, coupons and other incentives, even what a clerk suggests at the checkout counter—all can make the difference between profit and loss.

Data mining can help with all of these decisions. It can analyze customers’ ZIP codes to figure out the best location for a new branch. It can figure out whether beach towels would sell better in the linen department next to bath towels, or up front as a summer impulse buy. It can draw a picture of the demographics and buying habits of a store’s customers, which can help the company focus on the most high-profit customer segments.

The speed of IBM’s real-time data mining capabilities becomes particularly handy at the point of sale. For example, someone buying a turkey can be offered an instant “clipless coupon” for canned cranberry sauce, sweet potatoes or other complementary products.
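The white paper does not say which technique drives the coupon offer, but one common way to find complementary products is association analysis over past shopping baskets. The sketch below, which assumes that technique and uses made-up baskets, computes the two standard measures for the rule "turkey implies cranberry sauce": support (how often the pair occurs across all baskets) and confidence (how often cranberry sauce appears among the baskets that contain turkey).

```python
def rule_support(baskets, antecedent, consequent):
    """Fraction of all baskets that contain both items."""
    if not baskets:
        return 0.0
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    return both / len(baskets)


def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing the antecedent that also
    contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


# Hypothetical transaction data: each set is one shopper's basket.
baskets = [
    {"turkey", "cranberry sauce", "sweet potatoes"},
    {"turkey", "cranberry sauce"},
    {"turkey", "stuffing"},
    {"bread", "milk"},
    {"turkey", "cranberry sauce", "stuffing"},
]

support = rule_support(baskets, "turkey", "cranberry sauce")
confidence = rule_confidence(baskets, "turkey", "cranberry sauce")
print(support, confidence)
```

In this toy data, three of the four turkey baskets also contain cranberry sauce, so a point-of-sale system could reasonably offer the coupon to the shopper whose turkey basket lacks it.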


• Banking and consumer finance

Real-time analysis is also helpful in the fields of banking and consumer finance, where many customers now prefer to do business over the phone or Web, rather than sitting in a loan officer’s waiting room. In just seconds, a customer service representative using IBM DB2 data mining and customized application software can learn the most profitable loan interest rate to offer the particular customer on the line, or other financial products that may interest that customer. Real-time analysis can also speed up the detection of fraud or bad loan approvals.

• Investment and portfolio management

This field was one of the earliest to adopt data mining and analysis, in order to maintain the most profitable mix of stocks, bonds, commodities and other instruments in the face of market changes. Now data mining plays a bigger role than ever, as the industry copes with more dramatic market changes, more complex investment models (such as dollar-cost averaging) and a greater need to personalize financial portfolios for individual investors’ needs. IBM has been a leader in providing data mining solutions that can handle extremely large data sets and deliver answers fast, in a user-friendly way.

• Pharmaceuticals

Drug development depends on thousands of experiments to reveal whether various compounds are effective and free of side effects. Data mining can dramatically speed up the statistical analysis of these results, helping companies reduce time and resources spent on the less likely candidates. It can also provide answers on the ROI of taking a promising formula through the long, expensive drug approval and marketing process.

• Scientific research

Other types of scientific discovery also depend heavily on statistical analysis. IBM’s data mining algorithms can sift through massive amounts of data to uncover previously unrealized associations, patterns and trends. Predictive algorithms can score data by factors such as risk percentages or behavioral propensity.


• Healthcare

As baby boomers age, demanding more medical care, and as healthcare increasingly becomes a private rather than public responsibility, hospitals and health insurance plans are under increasing pressure to deliver both profits and better patient outcomes. Data mining can help on both fronts. It can more effectively track and spot trends in patient needs, such as the medication reminders mentioned earlier, or search for common factors among victims of a disease outbreak. It can also seek out the most cost-effective treatments, or cut costs by detecting fraud and abuse.

• Law enforcement/investigation

Speaking of detection, the nation’s law enforcement agencies are starting to realize the value of having a “single view of the truth,” such as with potential terrorist watch lists or patterns of crime scene details that could indicate the path of a serial killer. Data mining can help build a suspect profile or identify the store locations that are most vulnerable to robbery. It can search telephone call records to see if two accused conspirators were communicating in the lead-up to a terrorist attack or help spot behavior patterns that could predict workplace shootings.

Data mining: Key to helping your business turn on a dime

As we’ve seen, data mining is a way to distill valuable insights from masses of uncorrelated facts. For industry after industry, the new data mining capacities within IBM DB2 Data Warehouse Edition provide information on demand that can help you increase revenues; focus on your most profitable market segments; decrease losses by targeting inefficiency, fraud and loss; spot and react to trends more quickly and accurately; and increase customer satisfaction.

Moreover, information on demand is a prerequisite for the world of e-business on demand, where speedy decision-making and action are vital. An on demand business is integrated end-to-end; it can sense and respond to changes in real time because it’s resilient, responsive to varying demands and focused on its core mission. But it can’t deliver this kind of fast response without knowledge and insights shared with the entire organization, knowledge that IBM’s data mining tools are designed to provide.


Server-based data mining offers functionality, safety and speed

The server-based approach of data mining with DB2 Data Warehouse Edition eliminates the expensive and time-consuming process of taking the data out of your enterprise database and sending it to a separate data mining environment for analysis. Instead, it keeps the data centralized, so that all data mining queries are operating off a single view of your enterprise operations. This reduces the chance of discrepancies, confusion and wasted time reconciling conflicting information views. Plus, there’s less risk of loss, damage or theft of the data and mining models that are vital corporate assets.

The data mining components of DB2 Data Warehouse Edition have another advantage: they’re part of the larger family of DB2 information management solutions. These include:

• DB2 Universal Database
• IBM DB2 OLAP Server, an integrated set of powerful OLAP and reporting tools
• IBM DB2 Cube Views, a set of OLAP functions designed to work within DB2 (a white paper on this product, “IBM DB2 Cube Views enables information on demand,” has just been released)
• IBM DB2 Warehouse Manager
• The IBM DB2 Information Integrator family of tools, including DB2 DataJoiner and DB2 Relational Connect
• The IBM DB2 Content Manager family of tools
• A wide range of industry-specific BI solutions for banking, government, healthcare, retail, telecommunications and more.

IBM also provides a wide range of storage management software products within its Tivoli® software portfolio, as well as consulting and IT services to deliver the promise of e-business on demand.

With its broad base of information management and related technologies, IBM has distinguished itself as an enabler of real-time analytics. By leveraging IBM solutions, businesses can gain an unsurpassed framework for collecting, storing, managing and getting the maximum benefit from information, the lifeblood of any enterprise.


GC18-7766-00

© Copyright IBM Corporation 2003

IBM Corporation
Silicon Valley Laboratory
555 Bailey Avenue
San Jose, CA 95141
U.S.A.

Printed in the United States of America
08-03
All Rights Reserved

DB2, DB2 Extenders, DB2 Universal Database, IBM, the IBM logo, Intelligent Miner and Tivoli are trademarks of International Business Machines Corporation in the United States, other countries or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.

Other company, product or service names may be trademarks or service marks of others.

The information in this white paper is provided by IBM on an as-is basis without any warranty, guarantee or assurance of any kind. IBM also does not provide any warranty, guarantee or assurance that the information in this white paper is free from any errors or omissions.

IBM undertakes no responsibility to update any information contained in this white paper. This publication contains Internet addresses of other companies. IBM is not responsible for the content on these Web sites.

Printed in the United States on recycled paper containing 10% recovered post-consumer fiber.