Burton D. Morgan Entrepreneurial Competition



<ul><li> Slide 1 </li>
<li> Burton D. Morgan Entrepreneurial Competition
Are you the entrepreneurial type? Do you want to start your own business and be your own boss? Do you have an idea for a new business? Then join us. We will help you:
<ul>
<li>formulate your ideas</li>
<li>create a business plan</li>
<li>get feedback and possibly funding from nationally known entrepreneurs and venture capitalists</li>
<li>get seed funding for your business in the form of prizes totaling at least $50,000 (possibly more)</li>
<li>get space in the Purdue Technology Incubator</li>
</ul>
The competition is open to all Purdue students. Callouts on the 5th and 6th of September, 7-9 pm in Krannert Auditorium. Register with paf@purdue.edu or call 4-7324. More information at www.mgmt.purdue.edu/entrep </li>
<li> Slide 2 </li>
<li> CS 590M Fall 2001: Security Issues in Data Mining
Lecture 6: Time Series, Regression, Data Mining Process </li>
<li> Slide 3 </li>
<li> Regression
Problem: Prediction of numerical values
<ul>
<li>Similar to classification, but with a continuous class</li>
<li>Strong statistical base</li>
<li>Data mining community primarily concerned with scale</li>
</ul>
</li>
<li> Slide 4 </li>
<li> Regression: Problem Definition
<ul>
<li>Data: Sequence of vectors (x<sub>i</sub>, y<sub>i</sub>), i = 1, …, n</li>
<li>Goal: Find function f such that f(x) ≈ y for
<ul>
<li>Training data (x<sub>i</sub>, y<sub>i</sub>)</li>
<li>(x, y) where y is unknown</li>
</ul>
</li>
<li>Note that f captures the relationship between x and y, but doesn't imply causality</li>
</ul>
</li>
<li> Slide 5 </li>
<li> Regression: Issues
<ul>
<li>Curse of dimensionality: As the number of attributes/values grows,
<ul>
<li>Space of possible functions f grows exponentially</li>
<li>Number of training examples needed to learn the best f grows exponentially</li>
</ul>
</li>
<li>Solution: Constrain the space of possible functions</li>
</ul>
</li>
<li> Slide 6 </li>
<li> Regression: Approaches
<ul>
<li>Decision Trees
<ul>
<li>Regression Trees (e.g., CART): Decision tree with automatic selection of number of choices at each node</li>
<li>Regression Splines (e.g., MARS): Handles discontinuity at choice points</li>
</ul>
</li>
<li>Artificial Neural Networks: Capable of computing arbitrarily complex functions</li>
</ul>
</li>
<li> Slide 7 </li>
<li>
Time Series
<ul>
<li>Time/value data
<ul>
<li>Not sequential associations: value@time, not event@time</li>
<li>Generally viewed as a function with a value at any given time</li>
</ul>
</li>
<li>Goals:
<ul>
<li>Learn the function</li>
<li>Identify repeated patterns of value change</li>
</ul>
</li>
</ul>
</li>
<li> Slide 8 </li>
<li> Time Series: Finding Patterns
<ul>
<li>Given values over a time fragment, find time fragments with similar values, allowing for:
<ul>
<li>Shift of values</li>
<li>Scaling of values</li>
<li>Stretching of time</li>
</ul>
</li>
<li>Find commonly occurring patterns of values (e.g., the time fragments that would give the most similar fragments under the above conditions)</li>
</ul>
</li>
<li> Slide 9 </li>
<li> Time Series: Approaches
<ul>
<li>Transformation
<ul>
<li>Use DFT to transform to frequency domain</li>
<li>Drop all but first few frequencies</li>
<li>Index in R* tree and search</li>
</ul>
</li>
<li>Window-based
<ul>
<li>Sliding window across sequence</li>
<li>Index key features in special data structure</li>
<li>Count entries at each index point</li>
</ul>
</li>
</ul>
</li>
<li> Slide 10 </li>
<li> Data Mining Process
<ul>
<li>Cross-Industry Standard Process for Data Mining (CRISP-DM)</li>
<li>European Community funded effort to develop a framework for data mining tasks</li>
<li>Goals:
<ul>
<li>Encourage interoperable tools across the entire data mining process</li>
<li>Take the mystery/high-priced expertise out of simple data mining tasks</li>
</ul>
</li>
</ul>
</li>
<li> Slide 11 </li>
<li> CRISP-DM: Overview </li>
<li> Slide 12 </li>
<li> CRISP-DM: Phases
<ul>
<li>Business Understanding
<ul>
<li>Understanding project objectives and requirements</li>
<li>Data mining problem definition</li>
</ul>
</li>
<li>Data Understanding
<ul>
<li>Initial data collection and familiarization</li>
<li>Identify data quality issues</li>
<li>Initial, obvious results</li>
</ul>
</li>
<li>Data Preparation
<ul>
<li>Record and attribute selection</li>
<li>Data cleansing</li>
</ul>
</li>
<li>Modeling
<ul>
<li>Run the data mining tools</li>
</ul>
</li>
<li>Evaluation
<ul>
<li>Determine if results meet business objectives</li>
<li>Identify business issues that should have been addressed earlier</li>
</ul>
</li>
<li>Deployment
<ul>
<li>Put the resulting models into practice</li>
<li>Set up for repeated/continuous mining of the data</li>
</ul>
</li>
</ul>
</li> </ul>
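
The regression slides describe finding a function f with f(x) ≈ y on the training pairs, and constraining the space of possible functions to keep the problem tractable. As a minimal sketch of that idea, the Python below restricts f to straight lines and fits one by ordinary least squares. The restriction to lines, the function names fit_line and predict, and the sample data are assumptions made for illustration; the slides themselves name regression trees, splines, and neural networks as approaches.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = slope * x + intercept to training pairs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to an x whose y is unknown."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data, for illustration only.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.1, 4.9, 7.2, 8.8]
model = fit_line(train_x, train_y)
estimate = predict(model, 5.0)  # prediction for an unseen x
```

A line has only two parameters, so few training examples are needed, at the cost of missing any non-linear relationship. As the slides note, a good fit describes the relationship between x and y without implying causality.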
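
The transformation approach for time series (DFT, keep the first few frequencies, then index and search) can be sketched as below. This is an illustration under stated assumptions, not the lecture's implementation: all series have the same length, a linear scan stands in for the R* tree, and the names dft_features, feature_distance, and similar are hypothetical. With the orthonormal DFT, the distance between truncated coefficient vectors never exceeds the true Euclidean distance, so filtering on it cannot dismiss a true match; a second check on the full series removes false alarms.

```python
import cmath
import math


def dft_features(series, k):
    """First k coefficients of the orthonormal DFT of a real-valued series."""
    n = len(series)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(-2j * math.pi * f * t / n)
                    for t, v in enumerate(series))
        for f in range(k)
    ]


def feature_distance(a, b):
    """Euclidean distance between two complex feature vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def euclidean(a, b):
    """Euclidean distance between two equal-length real series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def similar(query, candidates, k, eps):
    """Names of candidate series within eps of query.

    A linear scan replaces the R* tree index mentioned in the slides.
    """
    query_features = dft_features(query, k)
    hits = []
    for name, series in candidates.items():
        # Filter step: cheap comparison in the reduced feature space.
        if feature_distance(query_features, dft_features(series, k)) <= eps:
            # Refine step: confirm against the full series.
            if euclidean(query, series) <= eps:
                hits.append(name)
    return hits
```

The sketch does not handle the shift, scaling, or time-stretching invariances listed under "Finding Patterns"; those would need normalization or a different distance measure.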