Intelligent Internet Agents for Distributed Data Mining {yzhang, sowen, sprasad, raj}@cs.gsu.edu gjv@ece.gatech.edu Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman Department of Computer Science Georgia State University George Vachtsevanos School of Electrical and Computer Engineering Georgia Institute of Technology


Intelligent Internet Agents for Distributed Data Mining

{yzhang, sowen, sprasad, raj}@cs.gsu.edu gjv@ece.gatech.edu

Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman

Department of Computer Science

Georgia State University

George Vachtsevanos

School of Electrical and Computer Engineering

Georgia Institute of Technology


Outline

• Motivation

• Architecture of Intelligent Internet Agents

• Program Libraries of Intelligent Middleware

• Smart Web Search Agents

• Intelligent Soft Computing Agents

• Benefits

• Deliverables

• Conclusion


Motivation

• Distributed Web KDD: Useful information and knowledge mined in distributed Web databases

• QoS (Efficiency, Web Speed, User Time): Huge amounts of useless data flow on the Internet

• From Data Web to Information Web: Upgrade a current data-flow-oriented Internet to a future information-flow-oriented Internet

• Intelligent Web Middleware: with reusable, portable and scalable intelligent functionality

• Smart E-Business: Use intelligent Web agents to do better E-Business on the Internet


Architecture of Intelligent Internet Agents

Application Layer: E-Commerce, E-Education, other E-B

Intelligent Layer: Data Mining, Soft Computing, ES, etc

Network Layer: Backbone, gigaPoPs, other hardware


Program Libraries of Intelligent Middleware

1. Binary Association Rule Generator
2. Fuzzy Association Rule Generator
3. Neural-Net-based Data Classifier and Pattern Generator
4. Fuzzy c-means Program for Data Clustering
5. Genetic Algorithms for Data Refinement and Optimization
6. Granular Neural Nets for Linguistic Data Mining
7. XML-based Smart Web Search Sub-Programs
8. Connection Programs between Database and Middle Layer
9. Local Cache Database Manager
10. Local Cache Informationbase Manager
11. Basic GUI Programs
12. Client-Server Creation and Communication Programs
13. Distributed Operation Manager
14. Distributed Data Mining Synchronization
15. Web Customer Log Miner

... and so on.
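To illustrate what the first library in this list, the Binary Association Rule Generator, might compute, here is a minimal Python sketch that derives pairwise rules with their support and confidence from a toy list of transactions. The function name `pairwise_rules`, the thresholds, and the sample baskets are hypothetical; nothing here is taken from the project's actual libraries.

```python
from collections import Counter
from itertools import combinations

def pairwise_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))              # binary view: item present or absent
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                      # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]    # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical sample data
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pairwise_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A fuzzy variant (item 2) would presumably replace the present/absent test with membership degrees, but that is not shown here.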


Smart Web Search Agents

• Data Search Engines >> Information Search Agents

- Traditional searching on the Web is done using one of the following three:

  - Directories (Yahoo, Lycos, etc.)

  - Search Engines (AltaVista, NorthernLight, etc.)

  - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)

- All of these involve keyword searches.

- Drawback: not easily personalized, too many results (although many give relevancy factors)


- Smart Search Agents will provide

- more personalized searches

- domain-based searches

- more efficient searches


Smart Search Agents will employ

- local cache databases (containing frequently asked queries/results; possibly updated periodically, e.g., nightly)

- local cache information base (containing mined information and discovered knowledge for efficient personal use)

- domain-based agents (e.g., Job Search; Sports-NBA Stats; Bibliography-Digital Libraries)
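The local cache database idea can be sketched in a few lines of Python: answer a frequently asked query from the cache while the stored entry is fresh, and go back to the remote search only when it is missing or stale. The class name `QueryCache`, the one-day refresh interval, and the stand-in `fake_remote_search` function are all assumptions made for illustration.

```python
import time

class QueryCache:
    """Local cache of query results that expire after a refresh interval."""

    def __init__(self, fetch, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._fetch = fetch              # callable that performs the real (slow) search
        self._max_age = max_age_seconds  # assumed default: refresh about once a day
        self._clock = clock              # injectable clock, handy for testing expiry
        self._entries = {}               # normalized query -> (timestamp, results)
        self.hits = 0
        self.misses = 0

    def search(self, query):
        key = " ".join(query.lower().split())    # normalize case and whitespace
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            self.hits += 1
            return entry[1]
        self.misses += 1
        results = self._fetch(key)               # cache miss or stale entry
        self._entries[key] = (now, results)
        return results

def fake_remote_search(query):
    # Hypothetical stand-in for a call to a real search or metasearch engine
    return [f"result for {query}"]

cache = QueryCache(fake_remote_search)
first = cache.search("NBA  stats")    # miss: goes to the remote search
second = cache.search("nba stats")    # hit: same query after normalization
print(cache.hits, cache.misses)
```

A personalized agent could keep one such cache per user or per domain; that design choice is left open here.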


Some initial results:

• M. Nagarajan, Metagenie - A metasearch engine for multi-databases, M.S. thesis, GSU (July 1999)

  Domains: Jobs, Books

• S. Ahmed, EXACT-FINDER: A cache-based meta-search engine, M.S. thesis, GSU (May 2000)

  Local cache database storing personalized frequently asked queries and results, updated periodically

• R. Sunderraman, ReQueSS: Relational Querying of semi-structured data, ICDE 2000 (demo session), San Diego, CA, March 2000.

• X. Li, Querying unified sources of Web data, M.S. thesis, GSU (July 1999)

  Data wrappers for Web sources (NBA stats/box scores, DBLP Bibliography database)


Intelligent Tools for E-Business

• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems

• Learning Algorithms, Heuristic Searching

• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery

• Prediction & Time Series Analysis

• Information Retrieval, Intelligent User Interface

• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems


Enhancing E-Business Process Through Data Mining

• Quality of discovered knowledge

– Having the right data

– Having appropriate data mining tools

[Figure: Data Warehouse, Data Mining (Knowledge Discovery), Success Patterns, Failure Patterns]

• Traditional Data Mining Tools

– Simple query and reporting

– Visualization driven data exploration tools, OLAP

– Discovery process is user driven


Intelligent Data Mining Tools

• Automate the process of discovering patterns/knowledge in data

• Require hypothesis, exploration

• Derive business knowledge (patterns) from data

• Combine business knowledge of users with results of discovery algorithms

[Figure: Data Warehouse, Success Patterns, Failure Patterns]


Intelligent Information Agents

• The Data Mining Problem:

– Clustering/Classification

– Association

– Sequencing

• Viewed as an Optimization Problem

• Tools: Genetic Algorithms


Fuzzy Rule Discovery

• Rule discovery: the discovery of associations between business events, i.e., which items are purchased together

• To support flexible querying and intelligent searching, fuzzy queries are developed to uncover potentially valuable knowledge

• A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query

• Automated search for fuzzy rules is carried out by discovering fuzzy clusters or segmentation in data
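The last bullet relies on finding fuzzy clusters in data. One standard way to do that is fuzzy c-means, in which every point belongs to every cluster to some degree. The following is a minimal pure-Python sketch for one-dimensional data; the function name, the starting centers, the fuzzifier `m = 2`, and the sample values are assumptions for illustration, not the project's implementation.

```python
def fuzzy_c_means_1d(points, centers, m=2.0, iterations=50):
    """Plain fuzzy c-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Step 1: membership of each point in each cluster (each row sums to 1).
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [u / total for u in row]
            else:
                row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Step 2: each center becomes the membership-weighted mean of the points.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

# Hypothetical data: two obvious groups, around 1 and around 9
points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centers, memberships = fuzzy_c_means_1d(points, centers=[0.0, 10.0])
print([round(c, 2) for c in centers])
```

Each membership row can then be read as a linguistic grade (for example, "mostly in the low group"), which is the raw material for a fuzzy rule.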


Fuzzy Decision Making: Match Users with Dynamic Products, Services, and Pricing

[Figure: Risk-Response-Retention (R^3) Model with three axes, Loss Ratio (Risk), Response, and Persistency (Retention), each graded Low, Medium, High]

Low Risk, High Response, High Retention

-> Customer: Preferred

Pricing: according to Life-time Value

Cross-Selling: Bundle Extra Liability Insurance

Example of 3 Service Provider’s Features
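The rule on this slide (low risk, high response, and high retention give a preferred customer) can be sketched as a single fuzzy rule whose firing strength is the minimum of the three membership degrees. The linear membership shapes, the assumption that all three scores are normalized to [0, 1], and the function names are illustrative choices and are not specified on the slide.

```python
def low(x):
    """Membership in 'Low' for a score in [0, 1]: 1 at 0, falling to 0 at 0.5."""
    return max(0.0, min(1.0, (0.5 - x) / 0.5))

def high(x):
    """Membership in 'High' for a score in [0, 1]: 0 at 0.5, rising to 1 at 1."""
    return max(0.0, min(1.0, (x - 0.5) / 0.5))

def preferred_degree(risk, response, retention):
    """Firing strength of: IF risk is Low AND response is High AND retention is High
    THEN customer is Preferred. Fuzzy AND is taken as the minimum."""
    return min(low(risk), high(response), high(retention))

# Hypothetical customers as (loss ratio, response, persistency), all normalized
print(preferred_degree(0.0, 1.0, 1.0))    # ideal customer
print(preferred_degree(0.25, 1.0, 1.0))   # somewhat higher risk
print(preferred_degree(0.9, 1.0, 1.0))    # high risk
```

The resulting degree could then drive the pricing and cross-selling decisions named on the slide, for example by offering the bundle only above a chosen threshold.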


Measuring Performance of Intelligent Agents

• Accuracy: distance or variance measure of IAs’ performance from their goal, i.e., fuzzy entropy

• Speed: latency of response

• Cost: resources consumed, consequences of failures

• Benefit: payoff for goals achieved

$\mathrm{IAP} = w_1 \cdot \mathrm{Accuracy} + w_2 \cdot \mathrm{Speed} + w_3 \cdot \mathrm{Cost} + w_4 \cdot \mathrm{Benefit} + \dots$
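Read as a weighted sum of the four criteria (IAP = w1 Accuracy + w2 Speed + w3 Cost + w4 Benefit, plus any further terms), the performance measure can be computed as below. The weight values, the negative sign on cost, and the normalization of every score to [0, 1] are assumptions; the slide does not give them.

```python
CRITERIA = ("accuracy", "speed", "cost", "benefit")

def iap(scores, weights):
    """Weighted sum of the four criteria named on the slide."""
    return sum(weights[c] * scores[c] for c in CRITERIA)

# Hypothetical weights: cost is penalized, everything else rewarded
weights = {"accuracy": 0.5, "speed": 0.25, "cost": -0.5, "benefit": 0.25}

# Hypothetical normalized scores for two agents
agent_a = {"accuracy": 1.0, "speed": 0.5, "cost": 0.25, "benefit": 1.0}
agent_b = {"accuracy": 0.5, "speed": 1.0, "cost": 0.5, "benefit": 0.5}

print(iap(agent_a, weights))   # 0.75
print(iap(agent_b, weights))   # 0.375
```

With these numbers agent A ranks above agent B; changing the weights changes the ranking, which is why the weights need to reflect the goals of the application.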


Performance Assessment, Learning and Optimization

[Figure: Data Warehouse, Success Patterns, Failure Patterns, Learning/Adaptation, Performance Evaluation Module, Goals/Objectives]


Examples

• Product Information Clustering

– Use a GA as the Heuristic Search Engine

– Apply the GA selection and inversion operators

– Evaluate information content

– Estimate system entropy

– Apply reinforcement learning strategy

• Dynamic Pricing

– In addition to above steps, explore association and sequencing relations
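The clustering example can be sketched as a small genetic algorithm that uses selection and inversion as its operators. This is only a rough illustration under several assumptions: the fitness is within-cluster squared error (a simple stand-in for the information-content and entropy measures on the slide), a point mutation is added so that new labelings can appear, and the reinforcement learning step is not modeled. All names and parameter values are hypothetical.

```python
import random

def within_cluster_error(points, labels, k):
    """Sum of squared distances of 1-D points to their cluster mean (lower is better)."""
    error = 0.0
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if members:
            mean = sum(members) / len(members)
            error += sum((p - mean) ** 2 for p in members)
    return error

def invert(labels, rng):
    """Inversion operator: reverse a randomly chosen segment of the chromosome."""
    i, j = sorted(rng.sample(range(len(labels) + 1), 2))
    return labels[:i] + labels[i:j][::-1] + labels[j:]

def mutate(labels, k, rng):
    """Assumed extra operator: reassign one random item to a random cluster."""
    child = list(labels)
    child[rng.randrange(len(child))] = rng.randrange(k)
    return child

def ga_cluster(points, k=2, population_size=30, generations=200, seed=0):
    rng = random.Random(seed)

    def cost(labels):
        return within_cluster_error(points, labels, k)

    # A chromosome is one cluster label per point.
    population = [[rng.randrange(k) for _ in points] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        next_gen = [population[0]]            # elitism: always keep the current best
        while len(next_gen) < population_size:
            # Tournament selection: the better of two random individuals.
            parent = min(rng.sample(population, 2), key=cost)
            next_gen.append(mutate(invert(parent, rng), k, rng))
        population = next_gen
    return min(population, key=cost), history

# Hypothetical product feature values forming two groups
points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
best, history = ga_cluster(points)
print(best, within_cluster_error(points, best, 2))
```

Because the best individual is carried over unchanged each generation, the recorded best cost never gets worse from one generation to the next.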


The “New Technology” Paradigm

[Figure: Internet-related technologies plotted against time, with phases labeled Euphoria/Optimism, Reality, and Back to Basics]


INFORMATION IS SELLING NOW!

Intelligent Agents will give your information product bargaining power


Benefits

• Better QoS:

- Web users get information (not raw data)

- Smart agents can make decisions for users

- Smart agents can save users’ surfing time

• Faster Internet:

- Information flows on the Internet quickly (e.g., 1k information << 100k raw data)

- Reduce data redundancy on the Internet

- Reduce Web communication congestion


Deliverables

• Intelligent Middle Layer

- Data Mining Program Libraries

- Soft Computing Program Libraries (e.g., Neural Networks, Fuzzy Logic, Genetic Algorithms, Neuro-fuzzy Systems)

• Application Layer

- Smart Web Search Agents

- Intelligent Soft Computing Agents


Conclusion

• To make the future Internet more intelligent and more efficient, it is necessary to design relevant "Intelligent Middleware" between network hardware and high-level Web application systems.

• We will first design a basic intelligent middle layer with basic intelligent functionality, and then implement two Web application systems for distributed data mining and E-Business.