View
235
Download
0
Tags:
Embed Size (px)
Citation preview
GUI implementation for Supervised and Unsupervised SUBDUE System
Introduction
Data Mining is a very vast field, which is rapidly developing today. It is the use of artificial intelligence to find common, interesting, or previously unknown patterns in large databases.
One method for discovering knowledge in structural data is the identification of common substructures (concepts represented as graphs) within the data.
2
Introduction (Contd..)
The SUBDUE system, developed by Cook and Holder [Cook and Holder, 1999] performs data mining on databases represented as graphs, i.e. it discovers interesting substructures in structural data.
This project deals with the conversion of a textual representation of the data into a graphical visualization of the data The input data given to the program is the outputs provided by the existing SUBDUE.
3
The Subdue System
The SUBDUE system is a data-mining tool that discovers interesting substructures in structural data
By compressing previously discovered substructures in the data multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data.
4
Unsupervised Subdue
5
Example of the input graph to the unsupervised version of the subdueand the discovered substructure.
Unsupervised (Contd..)
The substructure discovery algorithm used by SUBDUE is beam search. A substructure consists of a definition and a set of instances. The substructure definition is a connected set of vertices and edges that define a SUBGRAPH within G. An instance of a substructure is a SUBGRAPH of G that matches, graph theoretically, to the definition. The algorithm for the discovery is given below.
6
Unsupervised Contd.)
SUBDUE (Graph, BeamWidth, MaxBest, MaxSubSize, Limit)
ParentList={}
ChildList={}
BestList={}
ProcessedSubs=0
Create a substructure from each unique vertex label and its single-vertex instances; insert the resulting substructures in ParentList
While ProcessedSubs < = Limit and ParentList is not empty do
While ParentList is not empty do
Parent = RemoveHead (ParentList)
Extend each instance of Parent in all possible ways
Group the extended instances into Child substructures
For each Child do
If SizeOf (Child) <= MaxSubSize then
Evaluate the Child
7
Unsupervised Contd.)
8
Insert Child in ChildList in order by value
If Length (ChildList)>BeamWidth then
Destroy the substructure at the end of the ChildList
ProcessedSubs = ProcessedSubs + 1
Insert Parent in BestList in order by value
If Length (BestList) > MaxBest then
Destroy the substructure at the end of the BestList
Switch ParentList and ChildList
return BestList
Supervised
SUBDUE performs supervised graph based relational concept learning. The SUBDUE concept learner accepts both a positive graph and a negative graph , and evaluates substructures based on their compression of the positive graph and lack of compression on the negative graph.
9
Supervised Contd.
10
Supervised (Contd.)
In the figure above each object has a shape and is related to other objects using the binary relation on and shape. The discovered substructure is as shown in the figure. As seen above the SUBDUE discovered substructure is from the positive example and not from the negative example. For this example the best substructure, which gives the maximum compression, is triangle on a square.
11
Subdue - Chemical Compounds
A DNA sequence can be represented as a very simple linear graph, and higher-level relationships between different parts of a sequence can be mapped to additional edges in graph.
SUBDUE system discovers patterns in the input graph in polynomial time.
SUBDUE system is capable of discovering known patterns in the DNA sequence of yeast,as well as patterns in yeast DNA that are known to be important in other organism, but which have not yet been shown to play a role in yeast.
12
Subdue - Chemical Compounds
A DNA sequence can be represented as a very simple linear graph, and higher-level relationships between different parts of a sequence can be mapped to additional edges in graph.
SUBDUE system discovers patterns in the input graph in polynomial time.
SUBDUE system is capable of discovering known patterns in the DNA sequence of yeast,as well as patterns in yeast DNA that are known to be important in other organism, but which have not yet been shown to play a role in yeast.
13
Subdue - Chemical Compounds
The figure shown above is the backbone representation which gives more meaningful graphs then the linear representation.
This representation separated the base names (A, C, T, G) from the vertices representing themselves.
The backbone representation mimics the actual chemical structure of the DNA molecule, in which the DNA bases are connected by deoxyribose sugars to a linear phosphate backbone.
14
GUI Design
Requirements
The requirements of the GUI are as follows.
File Dialog Boxes should be added for better user access to the input files.
The entire visual representation of the graphs needs to be shown on the screen. Sometimes these representations exceed the length of the screen. To accommodate these large graphs, scrollbars need to be incorporated into the design.
15
GUI Design
User interfaces must be provided so that the user can interact with the GUI for displaying the results of each and every iteration. So a button called “ Next Iteration”, which activates the display of substructures on the screen, needs to be incorporated.
A Button called the “Compress” button should be provided to the GUI. This button enables the user to see the compressed graph.
16
GUI Design
For the supervised version of SUBDUE, both negative and positive graphs need to be displayed.
The vertices of the graph should display their labels inside the vertices.
Since directed edges are used, the arrows with appropriate directions should be displayed.
The language to be used for implementation should be portable and be able to run from a browser. It should also have good GUI components. So JDK 1.2 was used to implement the program.
17
Implemenation
Unsupervised SUBDUE Unsupervised SUBDUE GUI requires two input files. A
position file which determines the graph position and the output file from the SUBDUE which is parsed as per the requirements of the GUI by the conversion program.
The driver class of the applet is the unsupervised class. This class initializes the applet. When user clicks the next iteration button canvas2 class is invoked and best substructures found in that iteration is displayed. These substructures are arranged by their MDL value.
18
Implemenation
Unsupervised SUBDUE When user clicks the compress button then
canvas1 class is invoked. This class compresses the input graph by replacing the instances of the best substructure of the iteration by single vertices. The compressed graph will be further compressed when the using the results of that iteration.when-compressed button is clicked. The flow diagrams are as shown below.
19
Implemenation
Supervised GUI Implemenation Supervised SUBDUE GUI requires three input files.Two
position files which determines the positive graph position and the negative graph position and the output file from the SUBDUE which is parsed as per the requirements of the GUI by the conversion program.
The driver class of the applet is the supervised class. This class initializes the applet. When user clicks the next iteration button canvas2 class is invoked and best substructures found in that iteration is displayed. These substructures are arranged by their MDL value.
20
Implemenation
Supervised GUI Implemenation When user clicks the compress button then
canvas1 class is invoked. This class compresses the input graph by replacing the instances of the best substructure of the iteration by single vertices. The compressed graph will be further compressed when the using the results of that iteration.when-compressed button is clicked. The flow diagrams are as shown below.
21
Implemenation
Parser for the GUI input. The Conversion program parses the output from
the subdue to make it compatible to the GUI program. This can parse both supervised and unsupervised GUI.The conversion program takes in longer phrases and replaces it by shorter ones It eliminates blank lines and arranges the output in the format required for the GUI input.
Each line starts with a small phrase for E.g. It(iteration number ), C(for compression), v(vertices),e(edges),Val(Value), It(instances).
22
Implementation
Parser for the GUI input.
The phrase “Best substructures” indicates the starting of the iteration and the phrase “Graph is compressed using best substructure.” Indicates a compressed graph information
23
Demo Of the Project
24