Hybrid Soft Computing: Challenges, Perspectives and Applications
Ajith Abraham
Norwegian Center of Excellence,
Norwegian University of Science and Technology, Trondheim, Norway
Presentation Overview
• Soft computing ingredients: neural networks, fuzzy inference systems, evolutionary algorithms, probabilistic reasoning, etc.
• Need for hybridization?
• Engineering hybrid architectures
• Applications
  - E-commerce (business intelligence)
  - Network security
  - Data mining
• Conclusions
What is Intelligence?
Intelligence is what we use when we don’t know what to do. Intelligence requires an ability to:
• Perform complex tasks
• Recognize complex patterns
• Solve unseen problems
• Learn from experience
• Learn from instruction
• Use Natural Language
• Be aware of self (consciousness)
• Use tools
Parking a Car
Generally, a car can be parked rather easily. If the position were specified to within, say, a fraction of a millimeter, it would take hours of maneuvering and precise measurements of distance and angular position to solve the problem.
⇒ High precision carries a high cost.
⇒ The challenge is to exploit the tolerance for imprecision by devising methods of computation that lead to an acceptable solution at low cost. This, in essence, is the guiding principle of modern intelligent computing.
Intelligent Systems Ingredients
• FL (fuzzy logic): algorithms for dealing with imprecision and uncertainty
• RS (rough sets): handling uncertainty arising from the granularity in the domain of discourse
• NC (neurocomputing): machinery for function approximation
• EA/SI (evolutionary algorithms/swarm intelligence): algorithms for global search and optimization
FL, RS, NC and EA/SI are complementary rather than competitive.
Computational Theory of Perceptions
Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements or computations.
Reflecting the finite ability of the sensory organs (and, ultimately, the brain) to resolve details, perceptions are inherently imprecise.
The computational theory of perceptions provides the capability to compute and reason with perception-based information.
How to Model Perceptions
• Perceptions are both fuzzy and granular
• Boundaries of perceived classes are unsharp
• Values of attributes are granulated
Example: granules in age: very young, young, not so old, …
Perceptions are described by propositions drawn from a natural language
Figure: machine intelligence and its constituents: knowledge-based systems (fuzzy logic, rough sets); pattern recognition, machine learning, data mining and Web intelligence; hybrid systems (NN-FL, NN-EA, FL-EA, NN-FL-EA, etc.); and non-linear dynamics (chaos theory, signal processing, fractals).
Problem Solving Techniques
• Conventional hard computing (precise models): symbolic logic reasoning; traditional numerical modeling and search
• Soft computing (approximate models): approximate reasoning; functional approximation and randomized search
Soft Computing Main Components
• Approximate reasoning: probabilistic models, fuzzy logic
• Functional approximation/randomized search: neural networks, evolutionary algorithms
Artificial Neural Networks
Artificial Neural Networks - Features
• Typically, the structure of a neural network is fixed first, and then one of a variety of mathematical algorithms is used to determine the interconnection weights that maximize the accuracy of the outputs.
• The process by which the synaptic weights of a neural network are adapted to the problem environment is popularly known as learning.
• There are broadly three types of learning: supervised learning, unsupervised learning and reinforcement learning.
Different Neural Network Architectures
• Multilayered feedforward network
• Recurrent network
• Competitive network
• Jordan network
Backpropagation Algorithm
$$\Delta w_{ij}(n) = -\varepsilon \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(n-1)$$
• E = error criterion to be minimized
• w_ij = weight from the i-th input unit to the j-th output unit
• ε and α are the learning rate and the momentum, respectively
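The update rule above can be illustrated with a minimal Python sketch. To keep it short, it applies the rule to a single linear unit with no hidden layer, so ∂E/∂w is computed directly rather than back-propagated through hidden layers; the function name, the training data and the values ε = 0.05 and α = 0.9 are illustrative assumptions.

```python
def train_linear_unit(samples, epsilon=0.05, alpha=0.9, epochs=500):
    """Gradient descent with momentum on E = 1/2 * sum((y - y_hat)^2)
    for a single linear unit y_hat = w[0] + w[1] * x."""
    w = [0.0, 0.0]        # weights: [bias, slope]
    dw_prev = [0.0, 0.0]  # previous updates dw(n-1), initially zero
    for _ in range(epochs):
        # Batch gradient dE/dw: -(y - y_hat) for the bias, -(y - y_hat) * x for the slope
        grad = [0.0, 0.0]
        for x, y in samples:
            err = y - (w[0] + w[1] * x)
            grad[0] -= err
            grad[1] -= err * x
        for k in range(2):
            # dw(n) = -epsilon * dE/dw + alpha * dw(n-1)
            dw = -epsilon * grad[k] + alpha * dw_prev[k]
            w[k] += dw
            dw_prev[k] = dw
    return w

# Hypothetical training data generated from y = 1 + 2x on [0, 1]
data = [(i / 10, 1.0 + 2.0 * i / 10) for i in range(11)]
weights = train_linear_unit(data)
```

With α = 0 the loop reduces to plain gradient descent; the α·Δw(n-1) term reuses part of the previous step, which is what smooths the weight trajectory.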
Choosing Hidden Neurons
A large number of hidden neurons lets the network learn the training data correctly, so that it can accurately predict the data it has been trained on, but its performance on new data (its ability to generalize) is compromised.
With too few hidden neurons, the network may be unable to learn the relationships among the data, and the error will fail to fall below an acceptable level.
Selecting the number of hidden neurons is therefore a crucial decision, and a trial-and-error approach is often taken.
Use of Momentum
• Helps to escape from local minima
• Smooths out variations in the weight updates
Effects of Different Learning Rates
Effect of the Number of Hidden Neurons: Mackey-Glass
Lowest RMSE for LM = 0.0004 (24 hidden neurons)
Effect of the Number of Hidden Neurons: Mackey-Glass
Lowest RMSE for LM = 0.0009 (24 hidden neurons)
Effect of the Number of Hidden Neurons: Gas Furnace Series
Lowest RMSE for LM = 0.009 (24 hidden neurons)
Effect of the Number of Hidden Neurons: Gas Furnace Series
Lowest RMSE for SCG = 0.033 (16 hidden neurons)
No Free Lunch Theorem
Even though artificial neural networks are capable of performing a wide variety of tasks, in practice they sometimes deliver only marginal performance.
There is little reason to expect that one can find a uniformly best algorithm for selecting the weights in a feedforward artificial neural network.
This is in accordance with the no free lunch theorem, which states that, for any algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class.
Fuzzy Logic
How Are Fuzzy Sets Constructed?
A = set of old people
Figure: membership in A plotted against age (years), once as a crisp set A (membership jumps to 1.0; 80 marked on the age axis) and once as a fuzzy set A (membership rises gradually; ages 65 and 75 and membership grades 0.5 and 0.9 marked).
The construction of a fuzzy set depends on two things: identification of a suitable universe of discourse, and specification of an appropriate membership function.
The example shows how a set of old people can be represented using a crisp set and a fuzzy set.
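A short sketch of the crisp/fuzzy contrast, assuming a hypothetical crisp threshold of 65 years and a hypothetical linear membership ramp from 60 to 80 years (these breakpoints are illustrative, not read from the figure):

```python
def crisp_old(age: float, threshold: float = 65.0) -> float:
    """Crisp set: an age is either fully in the set (1.0) or not at all (0.0)."""
    return 1.0 if age >= threshold else 0.0

def fuzzy_old(age: float, a: float = 60.0, b: float = 80.0) -> float:
    """Fuzzy set: membership rises linearly from 0 at age a to 1 at age b."""
    if age <= a:
        return 0.0
    if age >= b:
        return 1.0
    return (age - a) / (b - a)

for age in (50, 64, 65, 70, 85):
    print(age, crisp_old(age), fuzzy_old(age))
```

A one-year difference (64 versus 65) flips crisp membership from 0 to 1, while fuzzy membership changes only gradually.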
Fuzzy if-then Rules
• Mamdani fuzzy inference system: If pressure is high, then volume is small.
• Takagi-Sugeno fuzzy inference system: If pressure is medium, then volume = 5 × pressure.
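A Takagi-Sugeno rule base can be evaluated as a weighted average of the rule consequents, with each rule's firing strength as its weight. The sketch below assumes triangular membership functions and adds a hypothetical second rule ("if pressure is high then volume = 2 × pressure + 10") so the averaging is visible; neither comes from the slides.

```python
def triangle(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical rule base: rule 1 mirrors the slide; rule 2 and all
# membership-function breakpoints are illustrative assumptions.
RULES = [
    # (antecedent "pressure is ...", linear consequent for volume)
    (lambda p: triangle(p, 0.0, 5.0, 10.0), lambda p: 5.0 * p),         # medium -> 5 x pressure
    (lambda p: triangle(p, 5.0, 10.0, 15.0), lambda p: 2.0 * p + 10.0),  # high -> 2 x pressure + 10
]

def ts_infer(pressure):
    """Takagi-Sugeno output: consequents averaged with firing strengths as weights."""
    weights = [mf(pressure) for mf, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * f(pressure) for w, (_, f) in zip(weights, RULES)) / total
```

At pressure 7.5 both rules fire with strength 0.5, so the output is the plain average of 37.5 and 25.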
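Mamdani Inference System
Figure: two-rule Mamdani inference. The crisp input (x, y) is matched against the input membership functions A1, B1 (rule 1) and A2, B2 (rule 2); the rule outputs Z1 and Z2, derived from the output membership functions C1 and C2, are aggregated, and the crisp output Z is the centroid of the aggregated area.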
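A minimal sketch of the two-rule Mamdani pipeline, assuming triangular membership functions with made-up breakpoints, min for AND and for implication, max for aggregation, and a centroid computed on a discrete grid over an assumed output range [0, 10]:

```python
def triangle(v, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak 1 at b."""
    if v <= a or v >= c:
        return 0.0
    return (v - a) / (b - a) if v <= b else (c - v) / (c - b)

# Hypothetical triangular MFs (a, b, c); all breakpoints are assumptions.
A1, A2 = (0, 2, 6), (4, 8, 10)   # input x
B1, B2 = (0, 3, 6), (3, 7, 10)   # input y
C1, C2 = (0, 3, 6), (4, 7, 10)   # output z

def mamdani(x, y, z_lo=0.0, z_hi=10.0, steps=1001):
    """Rules: if x is A1 and y is B1 then z is C1; if x is A2 and y is B2 then z is C2."""
    w1 = min(triangle(x, *A1), triangle(y, *B1))   # AND as min
    w2 = min(triangle(x, *A2), triangle(y, *B2))
    num = den = 0.0
    for k in range(steps):
        z = z_lo + (z_hi - z_lo) * k / (steps - 1)
        # Clip each output MF at its rule strength (min), then aggregate with max.
        mu = max(min(w1, triangle(z, *C1)), min(w2, triangle(z, *C2)))
        num += z * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fires for this input")
    return num / den   # discrete centroid of the aggregated area
```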
Fuzzy Expert System
A fuzzy expert system was built to forecast the reactive power (P) at time t+1 from the load current (I) and voltage (V) at time t.
The experiment consists of two stages: developing the fuzzy expert system, and evaluating its performance using the test data.
The model has two input variables (V and I) and one output variable (P).
Training and test data sets were extracted randomly from the master dataset: 60% of the data was used for training and the remaining 40% for testing.
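The random 60/40 split can be sketched as follows; `split_60_40` is a hypothetical helper, and the fixed seed only makes the split reproducible:

```python
import random

def split_60_40(records, seed=0):
    """Shuffle a copy of the records and split them into 60% training, 40% test data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (6 * len(shuffled)) // 10   # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

train, test = split_60_40(range(100))
```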
Fuzzy Expert System - Some Illustrations
Root mean squared error for different numbers of membership functions:
No. of MFs | Mamdani FIS (training / test) | Takagi-Sugeno FIS (training / test)
2 | 0.401 / 0.397 | 0.024 / 0.023
3 | 0.348 / 0.334 | 0.017 / 0.016
Root mean squared error for a different shape of membership functions:
Mamdani FIS (training / test): 0.243 / 0.240
Takagi-Sugeno FIS (training / test): 0.021 / 0.019
Root mean squared error for different fuzzy operators:
Mamdani FIS (training / test): 0.221 / 0.219
Takagi-Sugeno FIS (training / test): 0.019 / 0.018
Root mean squared error for different defuzzification operators:
Mamdani FIS defuzzification | Training | Test
Centroid | 0.221 | 0.219
Mean of maximum (MOM) | 0.230 | 0.232
Bisector of area (BOA) | 0.218 | 0.216
Smallest of maximum (SOM) | 0.229 | 0.232
Takagi-Sugeno FIS defuzzification | Training | Test
Weighted sum | 0.019 | 0.018
Weighted average | 0.085 | 0.084
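For reference, the four Mamdani defuzzification operators compared above can be sketched on a sampled output membership function. This is a discrete approximation that assumes evenly spaced sample points; the bisector is taken as the first sample at which the cumulative membership reaches half of the total.

```python
def defuzzify(zs, mus):
    """Discrete centroid, bisector, mean-of-maximum and smallest-of-maximum.
    zs: evenly spaced output values; mus: aggregated membership at each value."""
    total = sum(mus)
    if total == 0:
        raise ValueError("empty output fuzzy set")
    centroid = sum(z * m for z, m in zip(zs, mus)) / total
    acc, bisector = 0.0, zs[-1]
    for z, m in zip(zs, mus):
        acc += m
        if acc >= total / 2:          # first point reaching half of the area
            bisector = z
            break
    peak = max(mus)
    maxima = [z for z, m in zip(zs, mus) if m == peak]
    return {
        "centroid": centroid,
        "bisector": bisector,
        "mom": sum(maxima) / len(maxima),   # mean of maximum
        "som": min(maxima),                 # smallest of maximum
    }

# Hypothetical trapezoidal output set sampled at z = 0..6
result = defuzzify([0, 1, 2, 3, 4, 5, 6], [0, 0.5, 1, 1, 1, 0.5, 0])
```

On this symmetric set, centroid, bisector and MOM coincide, while SOM picks the left edge of the plateau.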
Summary of Fuzzy Modeling
Surface structure
• Relevant input and output variables
• Relevant fuzzy inference system
• Number of linguistic terms associated with each input/output variable
• If-then rules
Deep structure
• Type of membership functions
• Building up the knowledge base
• Fine-tuning the parameters of the MFs using regression and optimization techniques
Evolutionary Computation
Evolutionary Algorithms
Main families: genetic algorithms, genetic programming, evolution strategies and evolutionary programming.
• Evolutionary algorithms can be described by
x[t + 1] = s(v(x[t]))
where
x[t]: the population at time t under representation x
v: the reproduction operator(s)
s: the selection operator
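A minimal sketch of x[t + 1] = s(v(x[t])) on bit strings. It assumes one-point crossover plus bit-flip mutation for v, truncation selection over parents and offspring for s (which is implicitly elitist), and the toy OneMax fitness (number of 1 bits); all choices and parameter values are illustrative.

```python
import random

def evolve(fitness, n_bits=20, pop_size=30, generations=60, p_mut=0.05, seed=1):
    """Minimal EA: x[t + 1] = s(v(x[t])) on fixed-length bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def v(population):
        """Reproduction: one-point crossover plus bit-flip mutation; parents are kept."""
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            offspring.append(child)
        return population + offspring

    def s(population):
        """Selection: keep the pop_size fittest of parents plus offspring."""
        return sorted(population, key=fitness, reverse=True)[:pop_size]

    for _ in range(generations):
        pop = s(v(pop))
    return max(pop, key=fitness)

best = evolve(fitness=sum)   # OneMax: fitness = number of 1 bits
```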
Evolutionary Algorithm: Flow Chart
Figure: selection and reproduction transform the current generation of binary chromosomes (e.g., 1001011001100010101001001001100101111101) into the next generation, with elitism carrying the best individuals over unchanged.
Evolutionary Algorithm Parameter Settings
• Parameter tuning: settings chosen before the run
• Parameter control: settings changed during the run (deterministic or adaptive)
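The two kinds of parameter control can be sketched as follows. The linear schedule (starting from an assumed rate of 0.5) is one possible deterministic rule, and Rechenberg's 1/5 success rule for an evolution-strategy step size is a classic adaptive rule; function names and constants are illustrative.

```python
def deterministic_rate(t, t_max, start=0.5, end=0.01):
    """Deterministic control: mutation rate changes with the generation counter only."""
    return start + (end - start) * t / t_max

def one_fifth_rule(sigma, success_ratio, c=0.817):
    """Adaptive control (1/5 success rule): feedback from the search adjusts the step size."""
    if success_ratio > 0.2:
        return sigma / c   # many successful mutations: take larger steps
    if success_ratio < 0.2:
        return sigma * c   # few successful mutations: take smaller steps
    return sigma
```

The deterministic rule ignores how the run is going, whereas the adaptive rule reacts to the observed success ratio.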
Evolutionary Algorithm Behaviour
The behaviour of an evolutionary algorithm is determined by the balance between exploitation and exploration maintained throughout the run. Adaptive evolutionary algorithms have been built to induce exploitation/exploration relationships that avoid premature convergence and improve the final results. If poor settings are used, the EA’s performance can be severely affected.
Where to hybridize?
Comparison of Different Intelligent Systems†
Property | FIS | ANN | EC | Symbolic AI
Mathematical model | SG | B | B | SB
Learning ability | B | G | SG | B
Knowledge representation | G | B | SB | G
Expert knowledge | G | B | B | G
Nonlinearity | G | G | G | SB
Optimization ability | B | SG | G | B
Fault tolerance | G | G | G | B
Uncertainty tolerance | G | G | G | B
Real-time operation | G | SG | SB | B
†Fuzzy terms used for grading are good (G), slightly good (SG), slightly bad (SB) and bad (B).
Hybrid Soft Computing
Hybrid Soft Computing Architectures 1 to 9
Figures: nine ways of coupling two soft computing components, Soft Computing 1 and Soft Computing 2, between a problem and its solution. The diagrams label the signals x1(n), x2(n), y1(n), y2(n) and z1(n) passed between the components; architectures 3 and 8 include a feedback (Δ) path.
Application examples
1. Business Intelligence
2. Data Mining
Business Intelligence
“The key in business is to know something that nobody else knows.”
— Aristotle Onassis
“To understand is to perceive patterns.”
— Sir Isaiah Berlin
Coping with Information
• Computerization of daily life produces data
• Point-of-sale, Internet shopping (& browsing), credit cards, banks . . .
• Information on credit cards, purchase patterns, product preferences, payment history, sites visited . . .
• Travel: One trip by one person generates info on destination, airline preferences, seat selection, hotel, rental car, name, address, restaurant choices . . .
• Data cannot be processed or even inspected manually
Data Overload
• Only a small portion of the data collected is analyzed (estimate: 5%)
• Vast quantities of data are collected and stored out of fear that important information will be missed
• Data volume grows so fast that old data is never analyzed
• Database systems do not support queries like
  - Who is likely to buy product X?
  - List all reports of problems similar to this one
  - Flag all fraudulent transactions
• But these may be the most important questions!
What is Business Intelligence?
• Business intelligence is a smaller component of business process management. It is knowing exactly what is happening in an organization: taking its pulse.
• It assists businesses in making better business decisions.
• It is a strong means of measuring the company's performance.
• It monitors the financial and operational health of the organization.
• It provides two-way integration with operational systems and information feedback analysis.
E-Commerce
Figure: stages of an e-commerce transaction, with the technologies used and the information gathered.
• Stages: buyer finds seller, selection of goods, negotiation, payment, sale, delivery, post-sale activity
• Technologies used: search engine, shopping bot, aggregator, on-line catalog, automated agents, tracking agent, on-line help, Internet telephony, e-payment systems, configurator, recommender agent, transaction processor, data interchange, cryptography, browser sharing
• Information gathered: customer preferences, bargaining strategies, price sensitivities, credit/payment information, on-line problem reports, follow-on sales opportunities, browsing behavior, delivery requirements, market basket, personal data, customer satisfaction, search behavior, effectiveness of promotions
What is Web Mining?
Web mining is the application of data mining or other information processing techniques to the World Wide Web to find useful patterns.
• Due to intense competition on the one hand, and the customer’s option to choose from several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management.
• Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of users with the Web.
• Web usage mining has become critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on.
• Analyzing the Web access logs can help in understanding user behaviour and the Web structure.
• From the business and applications point of view, knowledge obtained from Web usage patterns can be directly applied to efficiently manage activities related to e-business, e-services, etc.
• Accurate Web usage information could help to attract new customers, retain current customers, improve cross-marketing/sales and the effectiveness of promotional campaigns, track departing customers, and find the most effective logical structure for a Web space.
Web Usage Mining
Figure: raw usage data → preprocessing → preprocessed clickstream data → pattern discovery → rules, patterns and statistics → pattern analysis → "interesting" rules, patterns and statistics; content and structure data are an additional input.
Contrary to popular belief, everything necessary for mining Web traffic data is NOT always collected automatically.
Web Usage Data Sources
Figure: the client computer connects through a modem, phone line and ISP server across the Internet to the Web server and content server; server logs, user behaviours and site content are labeled as data sources.
• Sources: client level, server level.
• Abstractions: user, page file, page view, server session.
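Turning raw server logs into the server-session abstraction is a typical preprocessing step. The sketch below assumes Common Log Format entries sorted by time, identifies users by host alone, keeps only successful (2xx) requests, and starts a new session after an assumed 30-minute idle timeout; real sites usually need more careful user identification.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [timestamp] "method path protocol" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group successful requests into per-host server sessions.
    Assumes lines are in time order; a gap longer than `timeout` starts a new session."""
    last_seen, sessions = {}, {}
    for line in lines:
        m = CLF.match(line)
        if not m or not m.group(5).startswith("2"):
            continue                      # skip malformed lines and non-2xx responses
        host, path = m.group(1), m.group(4)
        t = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        if host not in last_seen or t - last_seen[host] > timeout:
            sessions.setdefault(host, []).append([])   # open a new session
        sessions[host][-1].append(path)
        last_seen[host] = t
    return sessions
```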
Taxonomy of Web Mining Methods
• Predictive modeling: decision trees, neural networks, machine learning
• Data clustering: K-means, fuzzy, ACC
• Deviation detection
• Link analysis
• Rule association
• Visualization
• Text mining
• Semantic maps
Predictive Modeling
• Objective: use data about the past to predict future behavior
• Sample problems:
• Will this (new) customer pay his bill on time? (classification)
• What will the Dow-Jones Industrial Average be on October 15? (prediction)
Predictive Modeling
Figure: six people (Horia, Maria, Daniel, John, Jeff and James) divided into an honest group and a crooked group.
Which characteristics distinguish the two groups?
Web Log File Sample <cs.okstate.edu>
Web Usage Mining Framework
Evolutionary Fuzzy Clustering
Usually, a number of cluster centers are initialized randomly, and the fuzzy c-means (FCM) algorithm iteratively approximates a minimum of the objective function from that starting position, converging to one of its local minima.
There is no guarantee that FCM converges to an optimal solution: it can be trapped in local extrema while optimizing the clustering criterion.
Its performance is very sensitive to the initialization of the cluster centers.
An evolutionary algorithm is therefore used to decide the optimal number of clusters and their cluster centers.
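A plain fuzzy c-means sketch on 1-D data makes the initialization issue concrete: centers start at randomly chosen data points, and memberships and centers are then updated alternately. The evolutionary search for the number of clusters and the initial centers is not reproduced here; the fuzzifier m = 2 and the iteration count are assumptions.

```python
import random

def fcm(points, c=2, m=2.0, iters=100, seed=0):
    """Fuzzy c-means on 1-D data; returns the sorted cluster centers.
    Starts from randomly chosen data points, so it reaches *a* local minimum."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)
    eps = 1e-12                                   # guards against zero distances
    for _ in range(iters):
        # Membership update: u[i][j] = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
        u = [[1.0 / sum(((abs(x - ci) or eps) / (abs(x - ck) or eps)) ** (2 / (m - 1))
                        for ck in centers)
              for x in points]
             for ci in centers]
        # Center update: mean of the points weighted by u ** m
        centers = [sum(uij ** m * x for uij, x in zip(row, points)) /
                   sum(uij ** m for uij in row)
                   for row in u]
    return sorted(centers)

centers = fcm([1.0, 1.2, 0.8, 9.0, 9.2, 8.8])
```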
Learning Rules with Evolutionary Algorithms
The chromosome encodes individual rules, and only the best individual is considered to form part of the solution. Initial rules were generated using a grid partitioning system.
EAs were then used to evaluate these rules and incorporate them into the final rule set, using an iterative learning approach that penalizes less-contributing rules.
Figure: grid partitioning of the input space (x, y) by the fuzzy sets A1, A2 on x and B1, B2 on y.
Genetic Representation of Fuzzy Rules
• A chromosome represents “m” fuzzy rules
• 1 stands for a selected rule and 0 for a non-selected rule
• The length of the string depends on the number of input and output variables
• Example: 3 input variables composed of 3, 2 and 2 fuzzy sets, and 1 output variable composed of 3 fuzzy sets
High-level representation: reduces computational complexity
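Under the assumption that each candidate rule pairs one fuzzy set per input with one output fuzzy set (a full grid), the example above yields 3 × 2 × 2 × 3 = 36 candidate rules, i.e., a 36-bit selection chromosome. A sketch of enumerating and decoding such a chromosome, with hypothetical helper names:

```python
from itertools import product

def candidate_rules(input_sets, output_sets):
    """Every grid rule: one fuzzy-set index per input plus one output-set index."""
    return list(product(*(range(n) for n in input_sets), range(output_sets)))

def decode(chromosome, rules):
    """Keep the rules whose bit is 1 (selected); 0 marks a non-selected rule."""
    return [rule for bit, rule in zip(chromosome, rules) if bit == 1]

rules = candidate_rules([3, 2, 2], 3)   # 3 inputs with 3, 2, 2 sets; 1 output with 3 sets
chromosome = [0] * len(rules)
chromosome[0] = 1                      # hypothetical: select only the first rule
print(len(rules), decode(chromosome, rules))
```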
Chromosome structure of the i-Miner
Monash University’s Central Web Site, Melbourne, Australia
Figures: hourly Web traffic; daily Web traffic
• Over 7 million hits in a week!
• Due to the enormous traffic volume and chaotic access behaviour, predicting Web user access patterns becomes more difficult and complex.
Pattern Discovery and Trend Analysis
• Formulation of clusters
• Discovering hidden information
• Daily and hourly trends (volume of hits)
Data Complexity
Ant Colony Clustering
• Workers have been reported to sort their larvae or form piles of corpses (literally cemeteries) to clean up their nests.
• The basic mechanism underlying this type of aggregation phenomenon is an attraction between dead items mediated by the ant workers: small clusters of items grow by attracting workers to deposit more items.
• The general idea is that isolated items should be picked up and dropped at some other location where more items of that type are present.
Eric Bonabeau, Marco Dorigo and Guy Théraulaz, 1999. Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute Studies in the Sciences of Complexity, Oxford University Press, New York.
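The pick-up and drop behaviour can be expressed as probabilities. The sketch below uses the form of the basic ant clustering model described in the cited book, where f is the perceived fraction of similar items in the ant's neighbourhood; the constants k1 = 0.1 and k2 = 0.15 are illustrative values.

```python
def pick_probability(f, k1=0.1):
    """Unladen ant: likely to pick up an isolated item (f near 0), unlikely in a dense pile."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Laden ant: likely to drop its item where many similar items are present."""
    return (f / (k2 + f)) ** 2
```

Together these two rules produce the positive feedback described above: piles attract more items, and isolated items are carried away.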
Parameter Settings
The statistical/text data from 01 January 2002 to 07 July were used.
• Takagi-Sugeno fuzzy inference system (TSFIS): 81 fuzzy if-then rules; 50 epochs
• Backpropagation neural networks (BPNNs): 14/17 neurons; momentum 0.05/0.2; 3000 epochs
• Linear genetic programming (LGP): population 500; 200,000 tournaments; crossover/mutation rate 0.9; maximum program size 256
Parameter Settings (i-Miner)
• Population size: 30
• Maximum no. of generations: 35
• Fuzzy inference system: Takagi-Sugeno
• Rule antecedent membership functions: 3 membership functions per input variable (parameterized Gaussian)
• Rule consequent parameters: linear parameters
• Gradient descent learning: 10 epochs
• Rank-based selection: 0.50
• Elitism: 5%
• Starting mutation rate: 0.50
Hidden Knowledge from SOM Clusters (daily traffic, hourly traffic)
Hidden Knowledge From Clusters
E-FCM Clusters
Fuzzy clustering of visitors based on the day of access
ACO Clustering
Figure: daily Web traffic data on a 25 × 25 non-parametric toroidal grid with 14 ants, at t = 1, 100, 500, 900 and 10,000,000.
Figure: hourly Web traffic data on a 45 × 45 non-parametric toroidal grid with 48 ants, at t = 1, 100, 500, 900 and 10,000,000.
Performance of i-Miner (Training)
Performance of i-Miner (Test)
Performance of the different paradigms: daily traffic (1 day ahead)
Hybrid method | RMSE (train) | RMSE (test) | CC
ANT-LGP | 0.0191 | 0.0291 | 0.9963
i-Miner (FCM-FIS) | 0.0044 | 0.0053 | 0.9967
SOM-ANN | 0.0345 | 0.0481 | 0.9292
SOM-LGP | 0.0543 | 0.0749 | 0.9315
Performance of the different paradigms: hourly traffic (1 hour ahead)
Hybrid method | RMSE (train) | RMSE (test) | CC
ANT-LGP | 0.2561 | 0.035 | 0.9921
i-Miner (FCM-FIS) | 0.0012 | 0.0041 | 0.9981
SOM-ANN | 0.0546 | 0.0639 | 0.9493
SOM-LGP | 0.0654 | 0.0516 | 0.9446
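The tables report RMSE and CC. Assuming CC denotes the Pearson correlation coefficient between actual and predicted values, both measures can be computed as:

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length series."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted series."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(va * vp)
```

RMSE measures error magnitude in the units of the data, while CC only measures how well the shapes of the two series agree.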
Test results of the daily trends for 6 days
Test results of the average hourly trends
Data Mining
Automatic Design of Hierarchical Takagi-Sugeno Type Fuzzy Systems
As a way to overcome the curse of dimensionality, it was suggested to arrange several low-dimensional rule bases in a hierarchical structure, i.e., a tree, so that the number of possible rules grows linearly with the number of inputs.
Building a hierarchical fuzzy system is a difficult task, because we need to define the architecture of the system (the modules, the input variables of each module, and the interactions between modules) as well as the rules of each module.
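A quick count shows why the hierarchy helps. The sketch assumes a grid-partitioned rule base with the same number of membership functions per variable, and a cascade of two-input modules in which each module after the first takes one new input plus the previous module's output (also partitioned into the same number of MFs); both assumptions are illustrative.

```python
def flat_rule_count(n_inputs, mfs_per_input):
    """Single rule base with a grid partition: one rule per MF combination."""
    return mfs_per_input ** n_inputs

def cascade_rule_count(n_inputs, mfs_per_input):
    """Cascade of n_inputs - 1 two-input modules, each with a full 2-D grid of rules."""
    return (n_inputs - 1) * mfs_per_input ** 2
```

For six inputs with three MFs each, this gives 729 rules for a flat rule base versus 45 for the cascade.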
Two approaches could be used to tackle this problem.
- An expert supplies all the required knowledge for building the system.
- Machine learning and/or optimization techniques are used to construct/adapt the system.
Several machine learning and optimization techniques have been applied to aid the process of building hierarchical fuzzy systems.
The problems in designing a hierarchical fuzzy logic system include the following:
• Selecting an appropriate hierarchical structure;
• Selecting the inputs for each fuzzy TS sub-model
• Determining the rule base for each fuzzy TS sub-model
• Optimizing the parameters in the antecedent parts and the linear weights in the consequent parts.
Proposed Approach
The hierarchical structure is evolved using Probabilistic Incremental Program Evolution (PIPE). The fine-tuning of the rules' parameters encoded in the structure is accomplished using Evolutionary Programming (EP).
The proposed method interleaves both PIPE and EP optimizations. Starting with random structures and rules' parameters, it first tries to improve the hierarchical structure and then as soon as an improved structure is found, it fine tunes its rules' parameters. It then goes back to improve the structure again and, provided it finds a better structure, it again fine tunes the rules' parameters.
This loop continues until a satisfactory solution (hierarchical TS-FS model) is found or a time limit is reached.
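The interleaving loop can be sketched as a generic skeleton. The callables `improve_structure` (standing in for a PIPE step) and `tune_parameters` (standing in for EP fine-tuning), and the convention that lower fitness is better, are assumptions; the real PIPE and EP operators are not implemented here.

```python
import time

def interleaved_search(init, improve_structure, tune_parameters, fitness,
                       target, time_limit_s):
    """Alternate structure search and parameter tuning until the fitness
    reaches `target` or the time limit expires (lower fitness is better)."""
    best = tune_parameters(init)
    deadline = time.monotonic() + time_limit_s
    while fitness(best) > target and time.monotonic() < deadline:
        candidate = improve_structure(best)        # e.g. one PIPE generation
        if fitness(candidate) < fitness(best):     # improved structure found
            best = tune_parameters(candidate)      # e.g. EP fine-tuning of its rules
    return best
```

Usage with toy stand-ins: `interleaved_search(0, lambda s: s + 1, lambda s: s, lambda s: abs(s - 10), target=0, time_limit_s=5.0)` climbs to 10 and stops once the target is met.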
Encoding
A tree-structure-based encoding method is used.
The reasons for choosing this representation:
(1) the tree has a natural and typical hierarchical layer;
(2) with pre-defined instruction sets, the tree can be created and evolved using existing tree-structure-based approaches, e.g., Genetic Programming (GP) and PIPE algorithms.
Assume that the instruction set used is I = {+2, +3, x1, x2, x3, x4}, where +2 and +3 denote non-leaf nodes' instructions taking 2 and 3 arguments, respectively, and x1, x2, x3, x4 are leaf nodes' instructions taking zero arguments each.
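A sketch of this encoding as nested (instruction, children) tuples, under the assumption that each +k node stands for a TS sub-model combining its k children; the depth limit and helper names are hypothetical.

```python
import random

# Instruction set from the text: +2/+3 are non-leaf instructions with 2/3
# arguments; x1..x4 are leaf instructions (input variables) with none.
ARITY = {"+2": 2, "+3": 3, "x1": 0, "x2": 0, "x3": 0, "x4": 0}

def random_tree(rng, depth=0, max_depth=3):
    """Grow a random (instruction, children) tree; only leaves at max_depth."""
    allowed = [op for op in ARITY if depth < max_depth or ARITY[op] == 0]
    op = rng.choice(allowed)
    return (op, [random_tree(rng, depth + 1, max_depth) for _ in range(ARITY[op])])

def is_valid(tree):
    """Each node must have exactly as many children as its instruction's arity."""
    op, kids = tree
    return len(kids) == ARITY[op] and all(is_valid(k) for k in kids)

def inputs_used(tree):
    """Input variables that appear as leaves, i.e. the inputs the model selects."""
    op, kids = tree
    return {op} if not kids else set().union(*(inputs_used(k) for k in kids))

# Hand-built example: a 2-input module combining x1 with a 3-input module over x2, x3, x4
example = ("+2", [("x1", []), ("+3", [("x2", []), ("x3", []), ("x4", [])])])
```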
PIPE
PIPE combines probability vector coding of program instructions, population-based incremental learning and tree-coded programs.
PIPE iteratively generates successive populations of functional programs according to an adaptive probability distribution, represented as a Probabilistic Prototype Tree (PPT), over all possible programs.
Each iteration uses the best program to refine the distribution.
Thus, the structures of promising individuals are learned and encoded in PPT.
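A much simplified PIPE-style loop is sketched below. It keeps a probabilistic prototype tree as a dictionary from node paths to instruction probability vectors, samples programs from it, and shifts each visited node's probabilities toward the best program of the generation (a PBIL-style update). It omits parts of full PIPE such as elitist learning, PPT mutation and pruning; the learning rate, population size, depth limit and toy fitness are assumptions.

```python
import random

ARITY = {"+2": 2, "+3": 3, "x1": 0, "x2": 0, "x3": 0, "x4": 0}
MAX_DEPTH = 3

def node_probs(ppt, path):
    """Return the PPT node at `path`, creating it with uniform probabilities."""
    if path not in ppt:
        allowed = [op for op in ARITY if len(path) < MAX_DEPTH or ARITY[op] == 0]
        ppt[path] = {op: 1.0 / len(allowed) for op in allowed}
    return ppt[path]

def sample(ppt, rng, path=()):
    """Generate a program by drawing one instruction per visited PPT node."""
    probs = node_probs(ppt, path)
    op = rng.choices(list(probs), weights=list(probs.values()))[0]
    return (op, [sample(ppt, rng, path + (k,)) for k in range(ARITY[op])])

def reinforce(ppt, tree, lr=0.2, path=()):
    """Move each visited node's vector toward the best program's instruction.
    The update keeps every probability vector summing to 1."""
    op, kids = tree
    probs = node_probs(ppt, path)
    for instr in probs:
        probs[instr] += lr * ((1.0 if instr == op else 0.0) - probs[instr])
    for k, kid in enumerate(kids):
        reinforce(ppt, kid, lr, path + (k,))

def pipe(fitness, generations=50, pop_size=20, seed=0):
    """Simplified PIPE loop; higher fitness is better."""
    rng, ppt, elite = random.Random(seed), {}, None
    for _ in range(generations):
        best = max((sample(ppt, rng) for _ in range(pop_size)), key=fitness)
        if elite is None or fitness(best) > fitness(elite):
            elite = best
        reinforce(ppt, best)
    return elite, ppt

def leaves(tree):
    """Set of input variables used by a program tree."""
    op, kids = tree
    return {op} if not kids else set().union(*(leaves(k) for k in kids))

# Toy fitness (hypothetical): prefer programs that use more distinct inputs.
elite, ppt = pipe(lambda t: len(leaves(t)))
```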
Program Development
Example of node N1,0’s instruction probability vector P1,0 (left), the probabilistic prototype tree (PPT) (middle) and a possible extracted program (right).
Comparison of the incremental type multilevel FRS (IFRS), the aggregated type multilevel FRS (AFRS) and the hierarchical TS-FS for Mackey-Glass time-series prediction:
Model | Layers | No. of rules | No. of parameters | RMSE (train) | RMSE (test)
IFRS | 4 | 25 | 58 | 0.0240 | 0.0253
AFRS | 5 | 36 | 78 | 0.0267 | 0.0256
HTS-FS | 3 | 24 | 33 | 0.0179 | 0.0167
Duan, J.-C. and Chung, F.-L.: Multilevel fuzzy relational systems: structure and identification. Soft Computing, Vol. 6 (2002) 71-86.
The structure of the evolved hierarchical TS-FS model for predicting the Mackey-Glass time series
The importance degree of each input variable for the Mackey-Glass time series:
xi | x0 | x1 | x2 | x3 | x4 | x5
Impo(xi) | 0.247 | 0.332 | 0.072 | 0.113 | 0.056 | 0.180
The developed optimal H-TS-FS architectures (Iris data)
The developed optimal H-TS-FS architectures (Wine data)
Hybrid Soft Computing: Some Challenges
• Lots of success stories!
• We need programs that can deal with commonsense informatic situations!
• Most of the existing frameworks rely on user-specified parameters. Intelligent systems should be able to learn from data in a continuous, incremental way, grow as they operate, update their knowledge and refine the model through interaction with the environment.
• The adaptation process could learn from successes and mistakes and apply that knowledge to new problems.
• Managing computational complexity.
What color is this rectangle?
Is this called “yellow”?
People define the limits of a color, such as yellow
• Different people have different ideas of what is “yellow”
• Knowledge is acquired by learning
• Personal situation, drugs, job, etc. can all affect it!
Limitations of the Human Mind
• Naming of colors. Based on learning, not on absolute standards.
• Face recognition. Cannot be passed on to another person by explanation.
• Object recognition. People cannot properly explain how they recognize objects.
Moore’s Law
&
Thank You