Top (10) challenging problems in data mining

Supervised by: Dr. Ali Haroun

Prepared by: Ahmed Ramzi Rashid, Ahmed Sedeeq Baker

Master, 2017-3-11


Outline:

- Introduction
- Top 10 challenging problems in data mining
- Conclusions
- Suggestions

Introduction (1-1):

Data mining is sorting through data to identify patterns and establish relationships. Data mining parameters include:

- Association
- Sequence or path analysis
- Classification
- Clustering
- Forecasting

Introduction (1-2):

- Data is power.
- There is a huge amount of data.
- The data is very complex.
- There are many algorithms.
- There are different ways to extract the information.
- So we have the top 10 challenging problems in data mining.

- Top 10 challenging problems in data mining (DM): 1. Developing a Unifying Theory of Data Mining

Developers do not yet have a single structure that covers the different data mining algorithms.

Knowledge to be verified

Types of dataset: Numeric, Categorical, Multimedia, Text

Selection criterion: Akaike information criterion

Unified (DM) process: Clustering, Classification, Association
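As a hedged illustration of the selection criterion named on this slide, the sketch below uses the Akaike information criterion (AIC) to choose between two candidate models. It assumes least-squares fits with Gaussian errors, for which AIC reduces to n·ln(RSS/n) + 2k up to an additive constant. The two candidate models, the sample data, and the function names are assumptions made for illustration and do not come from the slides.

```python
import math


def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to a constant.

    rss: residual sum of squares, n: number of points, k: fitted parameters.
    Counting the noise variance as an extra parameter would shift every
    score by the same amount, so the ranking would not change.
    """
    return n * math.log(rss / n) + 2 * k


def fit_constant(xs, ys):
    """Model y = c. Returns (rss, number_of_parameters)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys), 1


def fit_line(xs, ys):
    """Model y = a + b*x by ordinary least squares. Returns (rss, k)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return rss, 2


def select_model(xs, ys):
    """Return the name of the candidate with the lowest AIC, plus all scores."""
    candidates = {"constant": fit_constant, "line": fit_line}
    scores = {}
    for name, fit in candidates.items():
        rss, k = fit(xs, ys)
        scores[name] = aic_gaussian(rss, len(xs), k)
    return min(scores, key=scores.get), scores


# Hypothetical data: a noisy upward trend and a noisy flat signal.
xs = list(range(10))
trend = [1.1, 2.8, 5.05, 7.1, 8.9, 11.0, 13.2, 14.9, 17.0, 19.1]
flat = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]
best_for_trend, _ = select_model(xs, trend)
best_for_flat, _ = select_model(xs, flat)
```

The extra parameter of the line is only accepted when it reduces the residual error enough to outweigh the 2k penalty, which is the trade-off AIC is meant to capture. The sketch would fail on a perfect fit (RSS of zero), since the logarithm is undefined there.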

- Top 10 challenging problems in data mining (DM): 2. Scaling Up for High Dimensional Data and High Speed Data Streams

• The problem begins when the data becomes huge and complex.

• We need to handle ultra-high dimensional classification problems (millions or billions of features).

• We also need to handle ultra-high speed data streams.
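One common way to cope with a stream that is too fast or too large to store is to keep a fixed-size uniform sample in a single pass. The sketch below uses reservoir sampling (Algorithm R) for this. The technique is brought in here only as an illustration and is not named on the slide; the function name and the example stream are assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Memory use stays at k items no matter how long the stream is, and each
    item is looked at exactly once.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream: a generator, so the items are never held in a list.
sample = reservoir_sample((x * x for x in range(100_000)), k=5,
                          rng=random.Random(0))
```

A mining algorithm can then be run on the sample instead of the full stream, trading some accuracy for bounded memory.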

- Top 10 challenging problems in data mining (DM): 3. Mining Sequence Data and Time Series Data

• In this problem we want to see how to efficiently and accurately predict the direction (trend) of these data.

• Any practical design must take care of these three main steps:

- Information
- Learner
- Predictor

(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006), pp. 597–604.
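One possible reading of the three steps (Information, Learner, Predictor) can be sketched as a tiny trend predictor for a time series. This is only an assumed interpretation: the information step keeps a recent window of observations, the learner fits a least-squares slope, and the predictor turns the slope into a direction. All function names, the window size, and the sample series are hypothetical.

```python
def collect_information(series, window):
    """Information step: keep only the most recent `window` observations."""
    return list(series[-window:])


def learn_slope(values):
    """Learner step: least-squares slope of the values against their index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a slope")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def predict_direction(slope, tolerance=1e-9):
    """Predictor step: map the learned slope to a direction label."""
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "flat"


def trend_direction(series, window):
    """Run the three steps in order on one series."""
    recent = collect_information(series, window)
    return predict_direction(learn_slope(recent))


# Hypothetical noisy readings: the values zig-zag but drift upward overall.
readings = [1, 3, 2, 4, 3, 5]
direction = trend_direction(readings, window=6)
```

Fitting a slope over a window smooths out some of the noise, but a longer window also reacts more slowly to a real change in direction.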

- Top 10 challenging problems in data mining (DM): 4. Mining Complex Knowledge from Complex Data

• Knowledge becomes complex when we mine data from multiple relations.

• In most domains, the objects of interest are not independent of each other.

• The objects are not of a single type.

Example domain: World Wide Web

- HTML has a tree structure (nested tags).
- Text has a list structure (sequence of words).
- Hyperlinks have a graph structure (linked pages).

(1) Jarosław Stepaniuk, Rough-Granular in Knowledge Discovery and Data Mining, Volume 152 of the series, pp. 99-110.
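The three structures in the World Wide Web example can be seen in a single page. The following minimal sketch uses Python's standard `html.parser` module to pull out the tag tree, the word sequence, and the outgoing links of one page. The class name, the helper function, and the sample page are hypothetical, and the sketch assumes well-formed HTML in which every opening tag has a closing tag.

```python
from html.parser import HTMLParser


class PageStructure(HTMLParser):
    """Collect the tree, list, and graph views of one HTML page."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tree = []   # (depth, tag) pairs: nested tags form a tree
        self.words = []  # words in reading order: text is a sequence
        self.links = []  # href targets: edges from this page to other pages

    def handle_starttag(self, tag, attrs):
        self.tree.append((self.depth, tag))
        self.depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        self.words.extend(data.split())


def extract_structures(html):
    """Return (tree, words, links) for one page."""
    parser = PageStructure()
    parser.feed(html)
    parser.close()
    return parser.tree, parser.words, parser.links


# Hypothetical page with one link.
page = ('<html><body><p>Data mining '
        '<a href="page2.html">next page</a></p></body></html>')
tree, words, links = extract_structures(page)

# One node of the web graph: this page and the pages it links to.
graph = {"page1.html": links}
```

A mining algorithm for the web has to work with all three views at once, which is part of what makes the data complex.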

- Top 10 challenging problems in data mining (DM): 5. Data Mining in a Network Setting

We will discuss two parts of this problem:

5.1: Community and social networks:

• Communities matter here because the mining of social networks is an important topic.

• The challenges in identifying communities are:

- Characterizing a community correctly is critical.
- The entities involved are distributed.
- A single snapshot of the data may not capture the real picture.

5.2: Mining in and for computer networks: high-speed mining of high-speed streams:

• This part studies how to provide good algorithms and how to detect an attack.

• DoS (Denial of Service) attacks: how to detect them and how to discriminate them from legitimate traffic.

(1) Qiang Yang, Hong Kong, 10 Challenging Problems in Data Mining Research, ICDM 2005, p. 8.

- Top 10 challenging problems in data mining (DM): 6. Distributed Data Mining and Mining Multi-Agent Data

• We need to correlate the data seen at the various probes (such as in a sensor network).

• The important problem is how to mine across multiple heterogeneous data sources.

• The goal is to minimize the amount of data shipped between the various sites.

• For multi-agent data, data mining can be combined with game theory.

(1) Rao, Dr. S Vidyavathi, Distributed Data Mining and Mining Multi-Agent Data, International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010, pp. 1237-1244.
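A minimal sketch of the idea of shipping less data between sites: instead of sending raw records, each site sends a small summary from which a global statistic can be rebuilt. Here the global mean and variance are combined from per-site counts, sums, and sums of squares. The site names, values, and function names are hypothetical, and the example covers only this simple statistic, not distributed mining in general.

```python
def local_summary(values):
    """Run at each site: reduce the local records to three numbers."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(summaries):
    """Run at the coordinator: merge site summaries into global statistics.

    Returns (count, mean, population_variance). Only three numbers per site
    cross the network, whatever the number of local records.
    """
    count = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / count
    variance = total_sq / count - mean ** 2
    return count, mean, variance


# Hypothetical sensor readings held at three separate sites.
sites = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [4.0, 5.0],
    "site_c": [6.0],
}
summaries = [local_summary(values) for values in sites.values()]
count, mean, variance = combine(summaries)
```

The sum-of-squares formula is easy to combine but can lose precision when values are large and close together, so a production system would likely use a more stable merge rule.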

- Top 10 challenging problems in data mining (DM): 7. Data Mining for Biological and Environmental Problems

• The world today is “resource-driven”.

• So how can we best understand, and hence utilize, our environment?

• Researchers are trying to solve problems in these areas:

- Bioinformatics
- Spatial data
- Earthquakes
- Landslides
- Biological sequences
- Cancer prediction

(1) Pooja Shrivastava & Dr. Manoj Shukla, A Brief Survey on Data Mining for Biological and Environmental Problems, International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November 2013, pp. 630-631.

- Top 10 challenging problems in data mining (DM): 8. Data Mining Process-Related Problems

Automate (DM) operations:

• Help users avoid mistakes in (DM).

• Create a methodology in (DM).

Data cleaning:

• How to perform systematic documentation of data cleaning.

Combine techniques:

• How to merge visual interactive and automatic (DM) techniques together.

(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, ICDM 2005, p. 11.

- Top 10 challenging problems in data mining (DM): 9. Security, Privacy, and Data Integrity

Knowledge integrity challenges (the challenges facing researchers):

• Develop efficient algorithms to compare the knowledge contents of the data before and after it changes.

• Do not only evaluate knowledge integrity as a whole, but also develop measures to evaluate the knowledge integrity of individual patterns.

• Develop algorithms for estimating the impact of changes to the data.

Data are being mined:

• How to mine the data while ensuring the user’s privacy.

(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006), p. 603.

- Top 10 challenging problems in data mining (DM): 10. Dealing with Non-Static, Unbalanced and Cost-Sensitive Data

Sampling:

• Sampling and model building are not optimal.

Correct the bias:

• The problem here is how to correct the bias as far as we can.

Deal with special data:

• Deal with unbalanced and cost-sensitive data.

• Obtaining these costs relies on the sampling method.

(1) Qiang Yang, 10 Challenging Problems in Data Mining Research, International Journal of Information Technology & Decision Making, Vol. 5, No. 4 (2006), pp. 603-604.
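To make "cost-sensitive" concrete, the sketch below shows one standard way to account for unequal error costs: move the decision threshold of a probabilistic classifier. It assumes correct decisions cost nothing, so predicting the positive class is cheaper in expectation whenever p · cost_fn ≥ (1 − p) · cost_fp, that is, whenever p ≥ cost_fp / (cost_fp + cost_fn). The costs, probabilities, and function names are hypothetical and are not taken from the slides.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has lower expected cost.

    cost_fp: cost of a false positive, cost_fn: cost of a false negative.
    Assumes both costs are positive and correct decisions cost nothing.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_positive, cost_fp, cost_fn):
    """Return True (positive) when that choice minimizes expected cost."""
    return prob_positive >= cost_sensitive_threshold(cost_fp, cost_fn)


def expected_cost(prob_positive, predict_positive, cost_fp, cost_fn):
    """Expected cost of one decision, given the probability of the positive class."""
    if predict_positive:
        return (1 - prob_positive) * cost_fp
    return prob_positive * cost_fn


# Hypothetical rare-event case: missing a positive is 9 times worse
# than raising a false alarm, so the threshold drops well below 0.5.
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=9.0)
flagged = classify(0.2, cost_fp=1.0, cost_fn=9.0)
```

With equal costs the threshold falls back to 0.5. In practice the probabilities themselves may be biased by how the training data was sampled, which is the bias-correction problem this slide raises.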

Conclusions:

• This presentation concisely highlights the 10 most important problems in data mining.

• The order of the list does not reflect their level of importance.

Suggestions:

• We must work hard to overcome these problems, because nowadays whoever owns the information has the power.
