Burton D. Morgan Entrepreneurial Competition
• Are you the entrepreneurial type? Do you want to start your own business and be your own boss? Do you have an idea for a new business? Then join us. We will help you:
– formulate your ideas
– create a business plan
– get feedback and possibly funding from nationally known entrepreneurs and venture capitalists
– get seed funding for your business in the form of prizes totaling at least $50,000 (possibly more)
– get space in the Purdue Technology Incubator
• The competition is open to all Purdue students.
• Callouts on the 5th and 6th September, 7-9 pm in Krannert Auditorium.
• Register with [email protected] or call 4-7324
• More information at www.mgmt.purdue.edu/entrep
CS 590M Fall 2001: Security Issues in Data Mining
Lecture 6: Time Series, Regression, Data Mining Process
Regression
• Problem: Prediction of Numerical Values
– Similar to Classification, but continuous class
• Strong Statistical base
• Data mining community primarily concerned with scale
Regression: Problem Definition
• Data: Sequence of vectors (xi, yi), i=1,…,n
• Goal: Find function f such that f(x) ≈ y for
– Training data xi, yi
– x, y where y is unknown
• Note that f captures relationship between x and y, but doesn’t imply causality
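As a minimal sketch of the problem definition above, the following fits f(x) = a*x + b to training pairs by ordinary least squares and then applies f to an x whose y is unknown. The linear form of f, the function names, and the data values are assumptions made for illustration; the slides do not prescribe a particular f.

```python
def fit_line(xs, ys):
    """Fit y ≈ a*x + b by ordinary least squares (one numeric attribute)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx               # slope
    b = mean_y - a * mean_x     # intercept
    return a, b


def predict(model, x):
    """Apply the learned f to a new x."""
    a, b = model
    return a * x + b


# Made-up training data (illustrative only), roughly y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

model = fit_line(train_x, train_y)
print(predict(model, 5.0))  # estimate of y for an x with unknown y
```

The fitted f only summarizes how y varies with x in the training data; as noted above, it says nothing about whether x causes y.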
Regression: Issues
• Curse of dimensionality: As the number of attributes/values grows,
– Space of possible functions f grows exponentially
– Number of training examples needed to learn best f grows exponentially
• Solution: Constrain space of possible functions
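A small arithmetic sketch of the growth described above, under simplifying assumptions: attributes are discrete, and the constrained alternative is a linear function. The helper names are hypothetical.

```python
def input_combinations(values_per_attribute, num_attributes):
    """Distinct input vectors when each attribute takes a fixed number of values."""
    return values_per_attribute ** num_attributes


def boolean_functions(num_attributes):
    """Distinct functions from d binary attributes to a binary output."""
    return 2 ** (2 ** num_attributes)


def linear_parameters(num_attributes):
    """Parameters of a linear f: one weight per attribute plus an intercept."""
    return num_attributes + 1


for d in (1, 2, 3, 5):
    print(d, input_combinations(10, d), boolean_functions(d), linear_parameters(d))
```

Restricting f to a linear form makes the number of quantities to learn grow linearly with the number of attributes, at the cost of not being able to represent every possible relationship.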
Regression: Approaches
• Decision Trees
• Regression Trees (e.g., CART)
– Decision tree with automatic selection of number of choices at each node
• Regression Splines (e.g., MARS)
– Handles discontinuity at choice points
• Artificial Neural Networks
– Capable of computing arbitrarily complex functions
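To make the regression tree idea concrete, here is a sketch of a single tree node: it picks the threshold on one numeric attribute that minimizes the summed squared error of predicting each side by its mean. This is a one-level simplification in the spirit of CART, not the full algorithm (no recursion, pruning, or multiple attributes); the function names and data are assumptions.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, x):
    """Predict the mean of whichever side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Made-up data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.5, 4.5, 20.0, 21.0, 19.0]
stump = best_split(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11))
```

A full regression tree would apply the same split search recursively to each side until a stopping rule is met.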
Time Series
• Time/value data
– Not sequential associations: value@time, not event@time
– Generally viewed as a function with a value at any given time
• Goals:
– Learn function
– Identify repeated patterns of value change
Time Series: Finding Patterns
• Given values over a time fragment, find time fragments with similar values, allowing for:
– Shift of values
– Scaling of values
– Stretching of time
• Find commonly occurring patterns of values (e.g., the time fragments that would give the most similar fragments under the above conditions)
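One possible way to compare fragments while tolerating shift, scaling, and time stretching is sketched below: resample each fragment to a common length (stretching), subtract the mean (shift), divide by the standard deviation (scaling), then take the Euclidean distance. This particular normalization is an assumption chosen for illustration, not necessarily the method the lecture had in mind; it handles only uniform stretching and positive scaling.

```python
import math


def resample(values, length):
    """Linearly interpolate a fragment to `length` points (length >= 2)."""
    if len(values) == 1:
        return [values[0]] * length
    out = []
    for i in range(length):
        pos = i * (len(values) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def normalize(values):
    """Remove shift (subtract mean) and scaling (divide by std deviation)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    std = math.sqrt(sum(c * c for c in centered) / len(values))
    if std == 0:
        return centered  # constant fragment: nothing to rescale
    return [c / std for c in centered]


def fragment_distance(a, b, length=16):
    """Distance between two fragments after undoing stretch, shift, scale."""
    na = normalize(resample(a, length))
    nb = normalize(resample(b, length))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


# Made-up fragments: the same triangular pattern in three disguises.
base = [0, 1, 2, 1, 0]
shifted_scaled = [10 + 3 * v for v in base]
stretched = [0, 0.5, 1, 1.5, 2, 1.5, 1, 0.5, 0]
different = [0, 2, 0, 2, 0]

print(fragment_distance(base, shifted_scaled))
print(fragment_distance(base, stretched))
print(fragment_distance(base, different))
```

The first two distances come out near zero because the fragments differ only by the allowed transformations, while the third is clearly larger.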
Time Series: Approaches
• Transformation
– Use DFT to transform to frequency domain
– Drop all but first few frequencies
– Index in R* tree and search
• Window-based
– Sliding window across sequence
– Index key features in special data structure
– Count entries at each index point
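The transformation approach can be sketched as follows, assuming equal-length sequences and a range query. The DFT is scaled by 1/sqrt(n) so that distances are preserved (Parseval); keeping only the first k coefficients can then only shrink the distance, so filtering on the features never dismisses a true match. A linear scan stands in for the R* tree here, and all function names are hypothetical.

```python
import cmath
import math


def dft_features(values, k):
    """First k DFT coefficients, scaled by 1/sqrt(n) to preserve distances."""
    n = len(values)
    feats = []
    for f in range(k):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, v in enumerate(values))
        feats.append(coeff / math.sqrt(n))
    return feats


def feature_distance(fa, fb):
    """Euclidean distance between two complex feature vectors."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa, fb)))


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(query, sequences, radius, k=3):
    """Filter on the k-coefficient features, then verify on the full data."""
    qf = dft_features(query, k)
    hits = []
    for idx, seq in enumerate(sequences):
        # Cheap filter: stand-in for the R* tree lookup on the features.
        if feature_distance(qf, dft_features(seq, k)) <= radius:
            # Exact check removes false alarms that passed the filter.
            if euclidean(query, seq) <= radius:
                hits.append(idx)
    return hits


# Made-up sequences of equal length.
series = [
    [1, 2, 3, 4, 5, 4, 3, 2],
    [1, 2, 3, 4, 5, 4, 3, 2.5],
    [5, 1, 5, 1, 5, 1, 5, 1],
]
print(range_search(series[0], series, radius=1.0))
```

In a real system the feature vectors would be computed once and stored in the spatial index, so that the filter step avoids touching most sequences at all.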
Data Mining Process
• Cross-Industry Standard Process for Data Mining (CRISP-DM)
• European Community funded effort to develop framework for data mining tasks
• Goals:
– Encourage interoperable tools across entire data mining process
– Take the mystery/high-priced expertise out of simple data mining tasks
CRISP-DM: Overview
CRISP-DM: Phases
• Business Understanding
– Understanding project objectives and requirements
– Data mining problem definition
• Data Understanding
– Initial data collection and familiarization
– Identify data quality issues
– Initial, obvious results
• Data Preparation
– Record and attribute selection
– Data cleansing
• Modeling
– Run the data mining tools
• Evaluation
– Determine if results meet business objectives
– Identify business issues that should have been addressed earlier
• Deployment
– Put the resulting models into practice
– Set up for repeated/continuous mining of the data