Ferda Visual Environment for Data Mining Martin Ralbovský

Ferda History 1

LISp-Miner system: an implementation of several GUHA procedures and more

2003: Idea of creating a new Clementine-like visual interface for LISp-Miner

2003: The Ferda project started, based on this idea, within the Softwarový projekt (Software Project) course at MFF UK

Ferda History 2

2004–2006: Development of the Ferda project

February 2006: Ferda presented at the Znalosti 2006 conference

April 2006: Ferda became an approved software project at MFF UK

Now: Further development of the Ferda system; master's theses by Ferda's creators

Ferda Advantages

Modular and extensible architecture, use of middleware, support for distributed computing

Ferda’s box model: the ability to implement and include new boxes; a possible engine for EverMiner

Comprehensive user interface, including new features such as the box archive
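
The slide only names Ferda's box model. As an illustration, here is a minimal Python sketch, assuming a box is a unit with named input sockets that pulls values from the boxes connected to it on demand. The class `Box`, its methods `connect` and `evaluate`, and the example boxes are hypothetical; they do not reflect Ferda's actual API or its middleware-based implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Box:
    """Hypothetical processing unit: a function plus named input sockets."""
    name: str
    func: Callable[..., object]
    sockets: Dict[str, "Box"] = field(default_factory=dict)

    def connect(self, socket: str, source: "Box") -> None:
        # Plug the output of `source` into one of this box's sockets.
        self.sockets[socket] = source

    def evaluate(self) -> object:
        # Demand-driven evaluation: first evaluate every connected box,
        # then call this box's function with the results as keyword arguments.
        # Assumes the connections form an acyclic graph.
        inputs = {name: box.evaluate() for name, box in self.sockets.items()}
        return self.func(**inputs)


# Hypothetical example: a data box and a parameter box feeding a filter box.
data = Box("Data", lambda: [3, 8, 1, 9, 4])
threshold = Box("Threshold", lambda: 4)
filt = Box("Filter", lambda rows, limit: [r for r in rows if r > limit])
filt.connect("rows", data)
filt.connect("limit", threshold)
print(filt.evaluate())  # prints [8, 9]
```

Because each box only knows its own sockets, a new box type can be added by supplying a new function, without changing the boxes it connects to.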

Ferda Disadvantages

Not yet well tested (it has not been used in teaching)

Dependent on LISp-Miner modules and the metabase

Slower than LISp-Miner

Future goals for Ferda

“Spreading Ferda”: getting more people to work on Ferda by creating new boxes and modules

Cooperation with other systems

Road to EverMiner

Improvements to Ferda from master's theses

Reimplementing LISp-Miner procedures

Relational versions of some procedures (SD4ft)

Domain knowledge support

Reimplementing LISp-Miner procedures 1

No longer working with the metabase, which makes the implementation faster

Modular implementation of the data mining task, which unlocks the full potential of Ferda’s box model

Open implementation of the 4ft, SD4ft, KL, SDKL, CF and SDCF procedures
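
The 4ft procedure evaluates association rules over a four-fold contingency table of an antecedent and a succedent. A minimal Python sketch, assuming each cedent has already been evaluated to one boolean per data row: it builds the table (a, b, c, d) and tests the founded implication quantifier, which holds when a/(a+b) ≥ p and a ≥ Base. The function names and the choice of quantifier are illustrative and are not taken from Ferda's code.

```python
from typing import NamedTuple, Sequence


class FourFoldTable(NamedTuple):
    a: int  # antecedent true, succedent true
    b: int  # antecedent true, succedent false
    c: int  # antecedent false, succedent true
    d: int  # antecedent false, succedent false


def four_fold_table(antecedent: Sequence[bool], succedent: Sequence[bool]) -> FourFoldTable:
    """Count the four combinations of antecedent/succedent truth over all rows."""
    if len(antecedent) != len(succedent):
        raise ValueError("cedent vectors must describe the same rows")
    a = b = c = d = 0
    for phi, psi in zip(antecedent, succedent):
        if phi and psi:
            a += 1
        elif phi:
            b += 1
        elif psi:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)


def founded_implication(t: FourFoldTable, p: float, base: int) -> bool:
    """Founded implication: a / (a + b) >= p and a >= base."""
    if t.a + t.b == 0:
        return False  # antecedent never holds, so there is nothing to support the rule
    return t.a / (t.a + t.b) >= p and t.a >= base


t = four_fold_table([True, True, True, False, True, False],
                    [True, True, False, True, True, False])
print(t)                                        # FourFoldTable(a=3, b=1, c=1, d=1)
print(founded_implication(t, p=0.7, base=3))    # True
print(founded_implication(t, p=0.8, base=3))    # False
```

Keeping the counting step separate from the quantifier means another quantifier can be swapped in without touching the table construction. A performance-oriented implementation would more likely operate on bit strings than on Python lists.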

Reimplementing LISp-Miner procedures 2: further plans

Enabling fuzzy computing

Data stream support: connecting Ferda to Sumatra TT

Distributed computing

Implementation of KL Collaps and 4ftUV Filter

“Little” improvements to task setup (literal, cedent…)

Ontologies in Ferda

Ontologies aid the user in various phases of the CRISP-DM cycle. We plan to develop (semi)automated tools to help with:

Identification of redundant attributes

Creation of attributes

Creation of partial cedents

…

Field knowledge in Ferda

Field knowledge: a vague term for rules that are common knowledge, widely accepted in a domain

Formalization of field knowledge using abstract attributes and quantifiers

Creation of boxes in Ferda that enable the user to express field knowledge and to verify it against the procedures’ output
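
The slides do not say how field knowledge is formalized. Purely as a hypothetical illustration of verifying field knowledge against procedure output, the Python sketch below encodes one item of field knowledge as an expected direction of influence between two attributes, then sorts simplified mined rules into those that support it, contradict it, or are unrelated. The classes, the low/medium/high ranking and the attribute names BMI, BloodPressure and Age are assumptions made for this example; the sketch does not model abstract attributes or quantifiers as used in Ferda.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldKnowledge:
    """Hypothetical item: `cause` influences `effect` in direction +1 or -1."""
    cause: str
    effect: str
    direction: int  # +1: high goes with high; -1: high goes with low


@dataclass(frozen=True)
class MinedRule:
    """Hypothetical simplified output: ante_attr(ante_cat) => succ_attr(succ_cat)."""
    ante_attr: str
    ante_cat: str
    succ_attr: str
    succ_cat: str


# Hypothetical ordinal coding of categories; real attributes would define their own.
RANK: Dict[str, int] = {"low": -1, "medium": 0, "high": 1}


def verify(knowledge: FieldKnowledge, rules: List[MinedRule]) -> Dict[str, List[MinedRule]]:
    """Classify each rule as supporting, contradicting, or unrelated to the knowledge."""
    result: Dict[str, List[MinedRule]] = {"supports": [], "contradicts": [], "unrelated": []}
    for rule in rules:
        if (rule.ante_attr, rule.succ_attr) != (knowledge.cause, knowledge.effect):
            result["unrelated"].append(rule)
            continue
        product = RANK[rule.ante_cat] * RANK[rule.succ_cat]
        if product == 0:
            result["unrelated"].append(rule)  # a "medium" category says nothing about direction
        elif product == knowledge.direction:
            result["supports"].append(rule)
        else:
            result["contradicts"].append(rule)
    return result


fk = FieldKnowledge("BMI", "BloodPressure", +1)
rules = [
    MinedRule("BMI", "high", "BloodPressure", "high"),
    MinedRule("BMI", "high", "BloodPressure", "low"),
    MinedRule("Age", "high", "BloodPressure", "high"),
]
report = verify(fk, rules)
print({key: len(value) for key, value in report.items()})
# prints {'supports': 1, 'contradicts': 1, 'unrelated': 1}
```

A rule that contradicts accepted field knowledge is often the most interesting output, since it points either to a data problem or to a genuinely new finding.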