Page 1: Burton D. Morgan Entrepreneurial Competition Are you the entrepreneurial type? Do you want to start your own business and be your own boss? Do you have

Burton D. Morgan Entrepreneurial Competition

• Are you the entrepreneurial type? Do you want to start your own business and be your own boss? Do you have an idea for a new business? Then join us. We will help you:
  – formulate your ideas
  – create a business plan
  – get feedback and possibly funding from nationally known entrepreneurs and venture capitalists
  – get seed funding for your business in the form of prizes totaling at least $50,000 (possibly more)
  – get space in the Purdue Technology Incubator
• The competition is open to all Purdue students.
• Callouts on the 5th and 6th of September, 7-9 pm in Krannert Auditorium.
Register with [email protected] or call 4-7324.
More information at www.mgmt.purdue.edu/entrep

Page 2

CS 590M Fall 2001: Security Issues in Data Mining

Lecture 6: Time Series, Regression, Data Mining Process

Page 3

Regression

• Problem: Prediction of Numerical Values
  – Similar to Classification, but continuous class

• Strong Statistical base

• Data mining community primarily concerned with scale

Page 4

Regression: Problem Definition

• Data: Sequence of vectors x_i, y_i, i = 1,…,n
• Goal: Find function f such that f(x) ≈ y for
  – Training data x_i, y_i
  – x, y where y is unknown

• Note that f captures relationship between x and y, but doesn’t imply causality
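As an illustration of the problem definition above, here is a minimal sketch that assumes f is restricted to a straight line with a single input attribute and is fit by ordinary least squares. The names fit_line and predict and the toy data are hypothetical and do not come from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input attribute.

    Returns (slope, intercept) of the line f(x) = slope * x + intercept
    that minimizes the squared error on the training pairs.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to an x whose y is unknown."""
    slope, intercept = model
    return slope * x + intercept


# Toy training data x_i, y_i (made up for this sketch; generated by y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
estimate = predict(model, 5.0)  # prediction for an x that was not in the training data
print(model, estimate)
```

The fitted line only summarizes how y varies with x in the training data; as the slide notes, it says nothing about whether x causes y.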

Page 5

Regression: Issues

• Curse of dimensionality: As the number of attributes/values grows,
  – Space of possible functions f grows exponentially
  – Number of training examples needed to learn best f grows exponentially

• Solution: Constrain space of possible functions
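A rough way to see the growth described above is to count distinct input combinations. The sketch below assumes every attribute takes the same number of discrete values (an assumption made only for illustration) and compares that count with the number of parameters of a linear function, which is one possible constrained function space. The function names are hypothetical.

```python
def input_combinations(num_attributes, values_per_attribute):
    """Number of distinct input vectors when each attribute has the same number of values."""
    return values_per_attribute ** num_attributes


def linear_parameters(num_attributes):
    """A linear function needs one weight per attribute plus an intercept."""
    return num_attributes + 1


# An unconstrained f could assign a different output to every combination,
# so covering the input space takes on the order of this many examples.
for d in (2, 5, 10, 20):
    print(d, input_combinations(d, 10), linear_parameters(d))
```

With 10 values per attribute, the number of combinations is multiplied by 10 for every added attribute, while the linear model gains only one parameter.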

Page 6

Regression: Approaches

• Decision Trees
• Regression Trees (e.g., CART)
  – Decision tree with automatic selection of number of choices at each node
• Regression Splines (e.g., MARS)
  – Handles discontinuity at choice points
• Artificial Neural Networks
  – Capable of computing arbitrarily complex functions
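To make the regression tree idea concrete, here is a heavily simplified sketch: a single split on one attribute, chosen to minimize the squared error of predicting the mean on each side. This is not CART itself (no recursion, pruning, or multiple attributes); the names best_split and predict_stump and the data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on x that minimizes the total squared error.

    Returns (error, threshold, left_mean, right_mean), or None if all x are equal.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    return best


def predict_stump(stump, x):
    """Predict the mean of the side of the split that x falls on."""
    _, threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Made-up data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
stump = best_split(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.5))
```

A full regression tree would apply the same split search recursively to each side until some stopping rule is met.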

Page 7

Time Series

• Time/value data
  – Not sequential associations: value@time, not event@time
  – Generally viewed as a function with a value at any given time
• Goals:
  – Learn function
  – Identify repeated patterns of value change

Page 8

Time Series: Finding Patterns

• Given values over a time fragment, find time fragments with similar values, allowing for:
  – Shift of values
  – Scaling of values
  – Stretching of time

• Find commonly occurring patterns of values (e.g., the time fragments that would give the most similar fragments under the above conditions)
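One common way to compare fragments while ignoring shift and scaling of values is to z-normalize each fragment before measuring distance. The sketch below assumes equal-length fragments and a positive scale factor, and it does not handle stretching of time (that would need a different technique, such as dynamic time warping). The function names and data are hypothetical.

```python
import math


def z_normalize(values):
    """Subtract the mean and divide by the standard deviation.

    This removes any constant shift and any positive scale factor.
    A constant fragment is mapped to all zeros.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def normalized_distance(a, b):
    """Euclidean distance between two equal-length fragments after z-normalization."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


base = [1.0, 2.0, 3.0, 2.0, 1.0]
shifted_scaled = [10 + 3 * v for v in base]  # same shape, different offset and scale
other = [3.0, 1.0, 2.0, 1.0, 3.0]            # a different shape

print(normalized_distance(base, shifted_scaled))  # close to zero
print(normalized_distance(base, other))           # clearly larger
```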

Page 9

Time Series: Approaches

• Transformation
  – Use DFT to transform to frequency domain
  – Drop all but first few frequencies
  – Index in R* tree and search
• Window-based
  – Sliding window across sequence
  – Index key features in special data structure
  – Count entries at each index point
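The transformation approach can be sketched as follows, assuming equal-length sequences and a DFT scaled by 1/sqrt(n) so that Euclidean distance is preserved (Parseval's theorem). Keeping only the first few coefficients then gives a distance that never exceeds the true one, so filtering on it cannot miss a true match. A linear scan stands in for the R* tree here, and all names and data are hypothetical.

```python
import cmath
import math


def dft(values):
    """Discrete Fourier transform scaled by 1/sqrt(n) so that distances are preserved."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        / math.sqrt(n)
        for k in range(n)
    ]


def features(values, keep):
    """Keep only the first few frequency coefficients."""
    return dft(values)[:keep]


def feature_distance(fa, fb):
    """Euclidean distance between two lists of complex coefficients."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa, fb)))


def euclidean(a, b):
    """Euclidean distance in the time domain."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidates(query, sequences, keep, radius):
    """Linear scan standing in for the index search: keep every sequence
    whose truncated-feature distance to the query is within the radius."""
    q = features(query, keep)
    return [
        name
        for name, seq in sequences.items()
        if feature_distance(q, features(seq, keep)) <= radius
    ]


sequences = {
    "a": [1, 2, 3, 4, 3, 2, 1, 0],
    "b": [1, 2, 3, 4, 3, 2, 1, 1],
    "c": [9, 0, 9, 0, 9, 0, 9, 0],
}
query = [1, 2, 3, 4, 3, 2, 1, 0]
matches = candidates(query, sequences, keep=2, radius=1.5)
print(matches)
```

Because the truncated distance is only a lower bound, the candidates still need to be checked against the full sequences to discard false positives.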

Page 10

Data Mining Process

• Cross-Industry Standard Process for Data Mining (CRISP-DM)

• European Community funded effort to develop framework for data mining tasks

• Goals:
  – Encourage interoperable tools across entire data mining process
  – Take the mystery/high-priced expertise out of simple data mining tasks

Page 11

CRISP-DM: Overview

Page 12

CRISP-DM: Phases

• Business Understanding
  – Understanding project objectives and requirements
  – Data mining problem definition
• Data Understanding
  – Initial data collection and familiarization
  – Identify data quality issues
  – Initial, obvious results
• Data Preparation
  – Record and attribute selection
  – Data cleansing
• Modeling
  – Run the data mining tools
• Evaluation
  – Determine if results meet business objectives
  – Identify business issues that should have been addressed earlier
• Deployment
  – Put the resulting models into practice
  – Set up for repeated/continuous mining of the data

