PROJECT ISM

  • 8/7/2019 PROJECT ISM

    1/17

    DATA MINING: Applications and Trends

    Data mining has attracted a great deal of attention in the information industry

    and in society as a whole in recent years, due to the availability of huge amounts

    of data and the imminent need for turning such data into useful information and

    knowledge. Today as more data are gathered, with the amount of data doubling

    every three years, Data Mining is becoming an increasingly important tool to

    transform these data into information. It is commonly used in a wide range of

    profiling practices, such as marketing, surveillance, fraud detection and

    scientific discovery.

    INTRODUCTION

    Fig.1 Data Mining: Discovering hidden value in your data warehouse

    Data mining is the exploration and analysis of large data sets in order to
    discover meaningful patterns and rules. The key idea is to find effective ways
    to combine the computer's power to process data with the human eye's ability to
    detect patterns. Data mining techniques are designed for, and work best with,
    large data sets.

    Data mining is the process of extracting interesting (nontrivial, implicit,
    previously unknown and potentially useful) patterns or knowledge from huge
    amounts of data. It is the set of activities used to find new, hidden,
    unexpected or unusual patterns in data. Using the information contained
    within a data warehouse, data mining can often provide answers to questions
    about an organization that a decision maker has not previously thought to
    ask. For example:

    Which products should be promoted to a particular customer?

    What is the probability that a certain customer will respond to a planned

    promotion?

    Which securities will be most profitable to buy or sell during the next trading

    session?

    What is the likelihood that a certain customer will default, or pay back on

    schedule?

    What is the appropriate medical diagnosis for this patient?

    These types of questions can be answered surprisingly easily if the information

    hidden among the data in your databases can be located and utilized.

    The importance of collecting data that reflect your business or scientific
    activities in order to achieve competitive advantage is now widely recognized.
    Powerful systems for collecting data and managing it in large databases are
    in place in most large and mid-range companies. However, the bottleneck in
    turning this data into success is the difficulty of extracting knowledge
    about the system you study from the collected data. Human analysts without
    special tools can no longer make sense of the enormous volumes of data that
    must be processed in order to make informed business decisions.

    Data mining automates the process of finding relationships and patterns in
    raw data and delivers results that can be either utilized in an automated
    decision support system or assessed by a human analyst.

    1. HISTORIC DEVELOPMENT (EVOLUTION)

    Data mining techniques are the result of a long process of research and product

    development. This evolution began when business data was first stored on

    computers, continued with improvements in data access, and more recently,

    generated technologies that allow users to navigate through their data in real

    time. Data mining takes this evolutionary process beyond retrospective data

    access and navigation to prospective and proactive information delivery. From

    the user's point of view, the following four steps were revolutionary because

    they allowed new business questions to be answered accurately and quickly.

    [Fig 2: Evolutionary Stages of Data Mining. Stages shown: Data Collection (1960s), Data Access (1980s), Data Warehousing & Decision Support (1990s), Data Mining (Emerging Today)]

    Evolutionary Stages of Data Mining:

    Data Collection (1960s):
    Business question: "What was my total revenue in the last five years?"
    Enabling technologies: Computers, tapes, disks.
    Product providers: IBM, CDC.
    Characteristics: Retrospective, static data delivery.

    Data Access (1980s):
    Business question: "What were unit sales in New England last March?"
    Enabling technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC.
    Product providers: Oracle, Sybase, Informix, IBM, Microsoft.
    Characteristics: Retrospective, dynamic data delivery at record level.

    Data Warehousing & Decision Support (1990s):
    Business question: "What were unit sales in New England last March? Drill down to Boston."
    Enabling technologies: On-line analytic processing (OLAP), multidimensional databases, and data warehouses.
    Product providers: Pilot, Comshare, Arbor, Cognos, MicroStrategy.
    Characteristics: Retrospective, dynamic data delivery at multiple levels.

    Data Mining (Emerging Today):
    Business question: "What's likely to happen to Boston unit sales next month? Why?"
    Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases.
    Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry).
    Characteristics: Prospective, proactive information delivery.

    The core components of data mining technology have been under development
    for decades, in research areas such as statistics, artificial intelligence,
    and machine learning. Today, the maturity of these techniques, coupled with
    high-performance relational database engines and broad data integration
    efforts, makes these technologies practical for current data warehouse
    environments.

    1.1 THE PRESENT AND THE FUTURE

    The field of data mining has been growing by leaps and bounds, and many
    industry analysts and research firms project a bright future for data mining
    and the related area of CRM (customer relationship management). Growth in
    the CRM analytic application market approached 54.1% per year through 2003.
    In addition, data mining projects had grown by more than 300% by the year
    2002. By 2003, over 90% of consumer-based industries with an e-commerce
    orientation had utilized some kind of data mining model. The field of data
    mining is very broad, and many methods and technologies have become dominant
    in it.

    1.2 THE SCOPE OF DATA MINING

    Data mining derives its name from the similarities between searching for

    valuable business information in a large database and mining a mountain for

    a vein of valuable ore. Both processes require either sifting through an

    immense amount of material, or intelligently probing it to find exactly where

    the value resides. Given databases of sufficient size and quality, data mining

    technology can generate new business opportunities by providing these

    capabilities:

    Automated prediction of trends and behaviors: Data mining

    automates the process of finding predictive information in large

    databases. Questions that traditionally required extensive hands-on

    analysis can now be answered directly from the data quickly. A

    typical example of a predictive problem is targeted marketing. Data

    mining uses data on past promotional mailings to identify the targets

    most likely to maximize return on investment in future mailings. Other

    predictive problems include forecasting bankruptcy and other forms of

    default, and identifying segments of a population likely to respond

    similarly to given events.

    Automated discovery of previously unknown patterns. Data mining

    tools sweep through databases and identify previously hidden patterns in

    one step. An example of pattern discovery is the analysis of retail sales

    data to identify seemingly unrelated products that are often purchased

    together. Other pattern discovery problems include detecting fraudulent

    credit card transactions and identifying anomalous data that could

    represent data entry keying errors.
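    The retail example above (finding products that are often purchased together) can be sketched as a simple pair count. This is only an illustration under stated assumptions: the function name frequent_pairs, the min_support parameter and the sample baskets are invented here and do not come from any particular data mining product.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count how often each pair of products appears in the same basket.

    Returns a dict mapping (product_a, product_b) to the fraction of baskets
    containing both, keeping only pairs at or above min_support.
    """
    pair_counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical sales transactions, invented for illustration only
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["milk", "diapers", "beer"],
    ["beer", "diapers"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

    Counting every pair is affordable only for small baskets; tools built for massive databases prune the search instead of enumerating all combinations.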

    Data mining techniques can yield the benefits of automation on existing

    software and hardware platforms, and can be implemented on new systems

    as existing platforms are upgraded and new products developed. When data

    mining tools are implemented on high performance parallel processing

    systems, they can analyze massive databases in minutes. Faster processing

    means that users can automatically experiment with more models to

    understand complex data. High speed makes it practical for users to analyze

    huge quantities of data. Larger databases, in turn, yield improved predictions.

    1.3 TECHNIQUES OF DATA MINING

    The most commonly used techniques in data mining are:

    Artificial neural networks: Non-linear predictive models that learn

    through training and resemble biological neural networks in structure.

    Decision trees: Tree-shaped structures that represent sets of decisions.

    These decisions generate rules for the classification of a dataset. Specific

    decision tree methods include Classification and Regression Trees

    (CART) and Chi Square Automatic Interaction Detection (CHAID).

    Genetic algorithms: Optimization techniques that use processes such as
    genetic combination, mutation, and natural selection in a design based
    on the concepts of evolution.

    Nearest neighbor method: A technique that classifies each record in a
    dataset based on a combination of the classes of the k record(s) most
    similar to it in a historical dataset (where k ≥ 1). Sometimes called the
    k-nearest neighbor technique.

    Rule induction: The extraction of useful if-then rules from data based

    on statistical significance.
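    Of the techniques listed above, the nearest neighbor method is the easiest to show in a few lines. The sketch below assumes numeric features, Euclidean distance and a simple majority vote; the function name knn_classify and the customer records are hypothetical.

```python
from collections import Counter
import math


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled records.

    `history` is a list of (features, label) pairs; distance is Euclidean.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical records: (age, income in thousands) -> responded?
history = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((48, 85), "yes"),
]
label = knn_classify(history, (47, 82), k=3)
print(label)
```

    In practice the features would usually be rescaled first, since a field with a large numeric range would otherwise dominate the distance.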

    Many of these technologies have been in use for more than a decade in

    specialized analysis tools that work with relatively small volumes of data.

    These capabilities are now evolving to integrate directly with industry-

    standard data warehouse and OLAP platforms.

    1.4 THE TEN STEPS OF DATA MINING

    Here is a process for extracting hidden knowledge from your data warehouse,

    your customer information file, or any other company database.

    1. Identify The Objective -- Before you begin, be clear on what you hope

    to accomplish with your analysis. Know in advance the business goal of the

    data mining. Establish whether or not the goal is measurable. Some possible

    goals are to

    Find sales relationships between specific products or services

    Identify specific purchasing patterns over time

    Identify potential types of customers

    Find product sales trends.

    2. Select The Data -- Once you have defined your goal, your next step is
    to select the data to meet this goal. This may be a subset of your data
    warehouse or a data mart that contains specific product information. It may
    be your customer information file. Narrow the scope of the data to be mined
    as much as possible. Here are some key issues.

    Are the data adequate to describe the phenomena the data mining
    analysis is attempting to model?

    Can you enhance internal customer records with external lifestyle and
    demographic data?

    Are the data stable: will the mined attributes be the same after the analysis?

    If you are merging databases, can you find a common field for linking them?

    How current and relevant are the data to the business goal?

    3. Prepare The Data -- Once you've assembled the data, you must decide
    which attributes to convert into usable formats. Consider the input of
    domain experts (the creators and users of the data).

    Establish strategies for handling missing data, extraneous noise,
    and outliers.

    Identify redundant variables in the dataset and decide which

    fields to exclude.

    Decide on a log or square transformation, if necessary.

    Visually inspect the dataset to get a feel for the database.

    Determine the distribution frequencies of the data

    You can postpone some of these decisions until you select a data-mining

    tool. For example, if you need a neural network or polynomial network you

    may have to transform some of your fields.
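    The preparation decisions in step 3 can be sketched for a single numeric field as follows. The choices made here (median imputation for missing values, a z-score rule for flagging outliers, a log transformation) are assumptions picked for illustration, not the only reasonable strategies, and the name prepare_column is hypothetical.

```python
import math
import statistics


def prepare_column(values, outlier_z=3.0):
    """Fill missing values, flag outliers, and log-transform one numeric field.

    Missing entries are given as None; values are assumed to be non-negative.
    Returns (transformed, outlier_indices).
    """
    known = [v for v in values if v is not None]
    median = statistics.median(known)
    # Strategy for missing data: impute the median of the known values
    filled = [median if v is None else v for v in values]

    mean = statistics.mean(filled)
    stdev = statistics.pstdev(filled)
    # Strategy for outliers: flag values far from the mean, do not drop them
    outliers = [
        i for i, v in enumerate(filled)
        if stdev > 0 and abs(v - mean) / stdev > outlier_z
    ]
    # Log transformation compresses a long right tail; log1p also accepts zero
    transformed = [math.log1p(v) for v in filled]
    return transformed, outliers


# Hypothetical income field (in thousands) with one missing entry
incomes = [30, 32, None, 35, 31, 400]
transformed, outliers = prepare_column(incomes, outlier_z=2.0)
print(outliers)
```

    Flagging rather than deleting outliers leaves the final decision to the domain experts mentioned in step 3.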

    4. Audit The Data -- Evaluate the structure of your data in order to
    determine the appropriate tools.

    What is the ratio of categorical/binary attributes in the database?

    What is the nature and structure of the database?

    What is the overall condition of the dataset?

    What is the distribution of the dataset?

    Balance the objective assessment of the structure of your data against your

    users' need to understand the findings. Neural nets, for example, don't explain

    their results.
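    Two of the audit questions in step 4 (the share of categorical/binary attributes and the distribution of their values) can be answered with a short script. This sketch assumes the dataset is a list of dictionaries with identical keys; the names audit and is_numeric and the sample records are hypothetical.

```python
from collections import Counter


def is_numeric(value):
    """Treat ints and floats as numeric; booleans count as binary, not numeric."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def audit(records):
    """Summarise the structure of a dataset given as a list of dicts.

    Returns the share of categorical/binary attributes and the value
    frequencies of each such attribute.
    """
    fields = list(records[0].keys())
    categorical = [
        f for f in fields
        if not all(is_numeric(r[f]) for r in records)
    ]
    frequencies = {f: Counter(r[f] for r in records) for f in categorical}
    return len(categorical) / len(fields), frequencies


# Hypothetical customer records, invented for illustration only
records = [
    {"age": 34, "region": "north", "owns_home": True, "income": 52.0},
    {"age": 51, "region": "south", "owns_home": False, "income": 61.5},
    {"age": 29, "region": "north", "owns_home": True, "income": 38.0},
]
ratio, frequencies = audit(records)
print(ratio, dict(frequencies["region"]))
```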

    [Figure: Steps of Data Mining, a flow diagram of the ten steps described in this section, from 1. Identify the objective to 10. Integrate the solution]

    5. Select The Tools -- Two concerns drive the selection of the
    appropriate data-mining tool: your business objectives and your data
    structure. Both should guide you to the same tool. Consider these questions
    when evaluating a set of potential tools.

    Is the data set heavily categorical?

    What platforms do your candidate tools support?

    Are the candidate tools ODBC-compliant?

    What data format can the tools import?

    No single tool is likely to provide the answer to your data-mining project.

    Some tools integrate several technologies into a suite of statistical analysis

    programs, a neural network, and a symbolic classifier.

    6. Format The Solution -- In conjunction with your data audit, your
    business objective and the selection of your tool determine the format of your
    solution. The key questions are:

    What is the optimum format of the solution: decision tree, rules, C
    code, or SQL syntax?

    What are the available format options?

    What is the goal of the solution?

    What do the end-users need: graphs, reports, or code?

    7. Construct The Model -- At this point the data mining process begins.
    Usually the first step is to use a random number seed to split the data into
    a training set and a test set, and then to construct and evaluate a model.
    The generation of classification rules, decision trees, clustering
    sub-groups, scores, code, weights and evaluation data/error rates takes
    place at this stage.

    Resolve these issues:

    Are error rates at acceptable levels? Can you improve them?

    What extraneous attributes did you find? Can you purge them?

    Is additional data or a different methodology necessary?

    Will you have to train and test a new data set?
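    The first move in step 7 (a seeded split into training and test sets, followed by an error rate on the test set) can be sketched as follows. The model here is a deliberately trivial stand-in that always predicts the most common training label; it is an assumption made to keep the example self-contained, and all function names and data are hypothetical.

```python
import random
from collections import Counter


def split_data(records, test_fraction=0.25, seed=42):
    """Shuffle with a fixed seed and split into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(model, test_set):
    """Fraction of test records whose label the model predicts wrongly."""
    wrong = sum(1 for features, label in test_set if model(features) != label)
    return wrong / len(test_set)


def train_majority_baseline(training_set):
    """Stand-in model: always predict the most common training label."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority


# Hypothetical labelled records: 15 "yes" and 5 "no"
records = [((i,), "yes" if i % 4 else "no") for i in range(20)]
train, test = split_data(records)
model = train_majority_baseline(train)
rate = error_rate(model, test)
print(len(train), len(test), rate)
```

    Fixing the seed makes the split reproducible, so that error rates from different models or attribute selections can be compared on the same test set.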

    8. Validate The Findings -- Share and discuss the results of the
    analysis with the business client or domain expert. Ensure that the findings
    are correct and appropriate to the business objectives.

    Do the findings make sense?

    Do you have to return to any prior steps to improve results?

    Can you use other data mining tools to replicate the findings?

    9. Deliver The Findings -- Provide a final report to the business unit
    or client. The report should document the entire data mining process,
    including data preparation, tools used, test results, source code, and rules.
    Some of the issues are:

    Will additional data improve the analysis?

    What strategic insight did you discover and how is it applicable?

    What proposals can result from the data mining analysis?

    Do the findings meet the business objective?

    10. Integrate The Solution -- Share the findings with all interested
    end-users in the appropriate business units. You might wind up incorporating
    the results of the analysis into the company's business procedures. Some of
    the data mining solutions may involve

    SQL syntax for distribution to end-users

    C code incorporated into a production system

    Rules integrated into a decision support system.

    Although data mining tools automate database analysis, they can lead to faulty
    findings and erroneous conclusions if you're not careful. Bear in mind that data
    mining is a business process with a specific goal: to extract a competitive
    insight from historical records in a database.

    2. DATA MINING IMPACT ON EMPLOYEES AND INDUSTRY

    For Financial data analysis

    Most banks and financial institutions offer a wide variety of banking services
    (such as checking, savings, and business and individual customer transactions),
    credit (such as business, mortgage, and automobile loans), and investment
    services (such as mutual funds). Some also offer insurance services and stock
    services. Financial data collected in the banking and financial industry is often
    relatively complete, reliable and of high quality, which facilitates systematic
    data analysis and data mining. For example, data mining can help in fraud
    detection by identifying a group of people who stage accidents to collect
    insurance money.

    For Retail Industry

    The retail industry collects huge amounts of data on sales, customer shopping
    history, goods transportation, consumption, service records and so on.
    The quantity of data collected continues to expand rapidly, especially due to
    the increasing ease, availability and popularity of business conducted on the
    web, or e-commerce. The retail industry therefore provides a rich source for
    data mining. Retail data mining can help identify customer behavior, discover
    customer shopping patterns and trends, improve the quality of customer service,
    achieve better customer retention and satisfaction, enhance goods consumption
    ratios, design more effective goods transportation and distribution policies,
    and reduce the cost of business.

    For Telecommunication Industry

    The telecommunication industry has quickly evolved from offering local and
    long distance telephone services to providing many other comprehensive
    communication services including voice, fax, pager, cellular phone, images,
    e-mail, computer and web data transmission and other data traffic. The
    integration of telecommunication, computer networks, the Internet and numerous
    other means of communication and computing is underway. Moreover, with the
    deregulation of the telecommunication industry in many countries and the
    development of new computer and communication technologies, the
    telecommunication market is rapidly expanding and highly competitive. This
    creates a great demand for data mining in order to help understand the business
    involved, identify telecommunication patterns, catch fraudulent activities,
    make better use of resources, and improve the quality of services.

    Text Mining and Web Mining

    Text mining is the process of searching large volumes of documents for
    certain keywords or key phrases. By searching literally thousands of documents,
    various relationships between the documents can be established. Applied to the
    free-text comments in a customer survey, for example, text mining can derive
    patterns that help identify a common set of customer perceptions not captured
    by the other survey questions. An extension of text mining is web mining. Web
    mining is an exciting new field that integrates data and text mining within a
    website. It enhances the website with intelligent behavior, such as suggesting
    related links or recommending new products to the consumer. Web mining is
    especially exciting because it enables tasks that were previously difficult to
    implement. Web mining tools can be configured to monitor and gather data from a
    wide variety of locations and can analyze the data across one or multiple
    sites. Search engines, for example, work on the principle of data mining.
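    The keyword search at the heart of text mining, as described above, can be illustrated with a small inverted index. This is a minimal sketch, assuming plain English text and whole-word matching; the function names build_index and search and the sample comments are hypothetical, and the search requires every word of the phrase to occur in a document without checking that the words are adjacent.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index, phrase):
    """Return ids of documents containing every word of the key phrase."""
    words = WORD.findall(phrase.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical survey comments, invented for illustration only
comments = {
    1: "Delivery was late and the box was damaged",
    2: "Great price, fast delivery",
    3: "Late delivery again, very disappointed",
}
index = build_index(comments)
print(search(index, "late delivery"))
```

    Documents that share many keywords can then be grouped, which is one simple way the relationships between documents mentioned above can be established.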

    Healthcare

    The past decade has seen an explosive growth in biomedical research, ranging
    from the development of new pharmaceuticals and cancer therapies to the
    identification and study of the human genome by discovering large-scale
    sequencing patterns and gene functions. Recent research in DNA analysis has
    led to the discovery of genetic causes for many diseases and disabilities, as
    well as approaches for disease diagnosis, prevention and treatment.

    3. EXAMINATION OF CURRENT TRENDS

    The diversity of the available data types and data mining approaches poses many
    challenging research issues in data mining. The design of standard data
    mining languages, the development of effective and efficient data mining
    methods and systems, the construction of interactive and integrated data
    mining environments, and the application of data mining to solve large
    application problems are important tasks for data mining researchers and
    data mining system and application developers. Here we discuss some of
    the trends in data mining that reflect the pursuit of these challenges:

    Application Exploration:

    Earlier, data mining was mainly used to help businesses gain a competitive
    edge. As data mining becomes more popular, it is gaining wide acceptance in
    other fields as well, such as biomedicine, the stock market, fraud detection,
    telecommunication and many more, and many new applications are being
    explored. In addition, data mining for business continues to expand as
    e-commerce and marketing become mainstream elements of the retail industry.
    As generic data mining systems may have limitations in dealing with
    application-specific problems, we may see a trend toward the development of
    more application-specific data mining systems.

    Scalable data mining methods

    Current data mining methods are capable of handling only particular types of
    data and limited amounts of data. As data is expanding at a massive rate, there
    is a need to develop new data mining methods which are scalable and can
    handle different types of data and large volumes of data. Data mining
    methods should also be more interactive and user friendly. One important
    direction towards improving the overall efficiency of the mining process while
    increasing user interaction is constraint-based mining. This provides users
    with more control by allowing the specification and use of constraints to
    guide data mining systems in their search for interesting patterns.
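    The idea of constraint-based mining can be illustrated with a toy itemset search in which the user supplies the constraints. This is a brute-force sketch for small data only, not a scalable method; the function name mine_with_constraints, its three constraint parameters and the sample baskets are assumptions made for illustration.

```python
from itertools import combinations


def mine_with_constraints(baskets, min_support, must_include=None, max_size=2):
    """Enumerate itemsets up to max_size, keeping those that meet the constraints.

    Constraints (all supplied by the user):
      min_support  - minimum fraction of baskets containing the itemset
      must_include - if given, only itemsets containing this item are kept
      max_size     - largest itemset size to consider
    """
    basket_sets = [set(b) for b in baskets]
    items = sorted(set().union(*basket_sets))
    results = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # Check the cheap item constraint before counting support
            if must_include is not None and must_include not in itemset:
                continue
            hits = sum(1 for b in basket_sets if b.issuperset(itemset))
            support = hits / len(basket_sets)
            if support >= min_support:
                results[itemset] = support
    return results


# Hypothetical sales transactions, invented for illustration only
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["milk", "diapers", "beer"],
    ["beer", "diapers"],
]
patterns = mine_with_constraints(baskets, min_support=0.5, must_include="milk")
print(patterns)
```

    Because the item constraint is checked before support is counted, itemsets the user has ruled out cost almost nothing, which is the efficiency gain that constraints are meant to provide.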

    Combination of data mining with database systems, data warehouse systems,
    and web database systems

    Database systems, data warehouse systems, and the WWW are loaded with huge
    amounts of data and have thus become the major information processing
    systems. It is important to make sure that data mining serves as an essential
    data analysis component that can be easily included in such an
    information-processing environment. The desired architecture for a data
    mining system is tight coupling with database and data warehouse systems.
    Transaction management, query processing, online analytical processing and
    online analytical mining should be integrated into one unified framework.

    Standardization of data mining language:

    Today a few data mining systems are commercially available, such as
    Microsoft's SQL Server 2005, IBM Intelligent Miner, SAS Enterprise Miner,
    SGI MineSet, Clementine, DBMiner and many more. A standard data mining
    language or other standardization efforts would support the orderly
    development of data mining solutions and improved interoperability among
    multiple data mining systems and functions.

    Visual data mining

    It is rightly said that a picture is worth a thousand words. If the results of
    data mining can be shown in visual form, their worth is further enhanced.
    Visual data mining is an effective way to discover knowledge from huge amounts
    of data. The systematic study and development of visual data mining techniques
    will promote the use of data mining for data analysis.

    New methods for mining complex types of data

    Complex types of data such as geospatial, multimedia, time series, sequence
    and text data pose an important research area in the field of data mining.
    There is still a huge gap between the needs of these applications and the
    available technology.

    Web mining

    The World Wide Web is a huge, globally distributed collection of news,
    advertisements, consumer records, financial, education, government,
    e-commerce and many other services. The WWW also contains a huge and dynamic
    collection of hyperlinked information, providing a huge source for data
    mining. For the same reasons, the Web also poses great challenges for
    efficient resource and knowledge discovery.

    Biological data mining:

    Although biological data mining can be considered under application
    exploration, the unique combination of complexity, richness, size, and
    importance of biological data warrants special attention in data mining.
    Mining DNA and protein sequences and mining high-dimensional microarray data
    are some of the interesting topics for biological data mining research.

    Data mining and software engineering:

    As software programs become increasingly bulky in size and sophisticated in
    complexity, and tend to originate from the integration of multiple components
    developed by different software teams, it is an increasingly challenging task
    to ensure software robustness and reliability. The analysis of the executions
    of a buggy software program is essentially a data mining process: tracing the
    data generated during program executions may disclose important patterns and
    outliers that may lead to the eventual automated discovery of software bugs.

    Distributed data mining:

    Traditional data mining methods, designed to work at a centralized location, do
    not work well in many of the distributed computing environments present today
    (e.g., the Internet, intranets, local area networks). Advances in distributed
    data mining methods are expected.

    Real time data mining:

    Many applications involving stream data (such as e-commerce, web mining,
    stock analysis) require dynamic data mining models to be built in real time.
    Additional development is needed in this area.
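    What "built in real time" means for stream data can be shown with a model that is updated one value at a time and never stores the stream. The sketch below keeps a running mean and variance using Welford's method and uses them to flag unusual values; the class name RunningStats, the z threshold and the sample prices are assumptions made for illustration, and real stream mining models are considerably richer.

```python
class RunningStats:
    """Incrementally maintained mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the statistics in constant time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0

    def is_unusual(self, x, z=3.0):
        """True if x lies more than z standard deviations from the mean so far."""
        std = self.variance ** 0.5
        return self.n > 1 and std > 0 and abs(x - self.mean) > z * std


# Hypothetical stream of stock prices, invented for illustration only
stats = RunningStats()
for price in [10, 11, 9, 10, 10]:
    stats.update(price)
print(stats.mean, stats.variance, stats.is_unusual(25))
```

    Each update touches only three numbers, so the cost per arriving value stays constant however long the stream runs.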

    4. CONCLUSION

    Comprehensive data warehouses that integrate operational data with customer,
    supplier, and market information have resulted in an explosion of information.
    Competition requires timely and sophisticated analysis on an integrated view of
    the data. However, there is a growing gap between more powerful storage and
    retrieval systems and the users' ability to effectively analyze and act on the
    information they contain.

    Both relational and OLAP technologies have tremendous capabilities for
    navigating massive data warehouses, but brute force navigation of data is not
    enough. A new technological leap is needed to structure and prioritize
    information for specific end-user problems. Data mining tools can make this
    leap. Quantifiable business benefits have been proven through the integration of
    data mining with current information systems, and new products are on the
    horizon that will bring this integration to an even wider audience of users.

    Since data mining is a young discipline with wide and diverse applications,
    there is still a nontrivial gap between the general principles of data mining
    and domain-specific, effective data mining tools for particular applications.

    This report has examined a few application domains of data mining (such as
    finance, the retail industry and telecommunication) and trends in data mining,
    which include further efforts towards the exploration of new application areas
    and new methods for handling complex data types, algorithm scalability,
    constraint-based mining and visualization methods, the integration of data
    mining with data warehousing and database systems, the standardization of
    data mining languages, and data privacy protection and security.
