How important are they?
PERSONALIZATION, CONTEXT AND SEMANTICS IN INFORMATION SYSTEMS

About me
Ernesto William De Luca
University of Applied Sciences Potsdam (http://iw.fh-potsdam.de/)
Substitute Professor for Information Science and Information Retrieval
E-mail: [email protected]

Outline
About me Introduction Where are we? Problems Future of Search
Topics of the course

Introduction – Where are we? Evolution of Search

Introduction – Where are we? Intelligent query extension
Source: www.semager.com

Introduction – Where are we? Extracting and providing knowledge
Source: www.google.com

Introduction – Where are we? Search term suggestion
Source: www.google.com

Introduction – Where are we? Context-dependent result presentation
Source: www.google.com

Introduction – Where are we? Clustering of results
Source: de.vivisimo.com

Introduction – Where are we? Recognizing relations
Source: www.wolframalpha.com

Introduction – Where are we? Generating knowledge
Source: www.wolframalpha.com

Introduction – Where are we? Detecting homonyms and acronyms

Introduction – Where are we? Understanding natural language queries
Source: www.powerset.com

Introduction – Where are we? Summary
• Extracting and processing structured and unstructured information
• Providing knowledge, improving services
• Inferring knowledge
• Searching for/in knowledge
• Presenting knowledge

Outline
About me Introduction Where are we? Problems Future of Search
Topics of the course

Introduction – Problems: Entity recognition can always be improved
Source: www.semager.com

Introduction – Problems: Clustering by senses is necessary
Source: de.vivisimo.com

Introduction – Problems: Collaboratively gained search term suggestions may be out of scope
Source: www.google.com

Introduction – Problems: Facts are not known for each search
Source: www.google.com

Introduction – Problems: Knowledge is not linked correctly
Source: www.wolframalpha.com

Introduction – Problems: Recommendations are not personalized
Source: www.wolframalpha.com

Introduction – Problems: Semantic suggestions are language-independent

Introduction – Problems: Understanding queries in natural language
Source: www.powerset.com

Outline
About me Introduction Where are we? Problems Future of Search
Topics of the course

Introduction – Future of Search
• Knowledge helps in understanding the intention of queries
• Knowledge is retrieved from interactions
Example: You live in Berlin. Last week you searched for spicy recipes.

Introduction – Future of Search
Knowledge will always be structured
Semantic annotation (RDFa):
<div xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#" class="contactinfo" about="http://example.org/staff/robertc">
  <span property="contact:fn">Rob Crowther</span>.
  <span property="contact:title">Web hacker</span> at
  <a rel="contact:org" href="http://example.org">Example.org</a>.
  You can contact me <a rel="contact:email" href="mailto:[email protected]">via e-mail</a> or on my
  <span property="contact:tel"><span property="contact:type">work</span> phone at <span property="contact:value">0123 456789</span></span>.
</div>
instead of plain markup:
<div class="contactinfo">
  Rob Crowther. Web hacker at <a href="http://example.org">Example.org</a>.
  You can contact me <a href="mailto:[email protected]">via e-mail</a> or on my work phone at 0123 456789.
</div>

Introduction – Future of Search
Knowledge is retrieved everywhere
Britney Spears' current hair color is blond.
06.09.2009, Source: www.viply.de
You are watching soccer: Hamburger SV vs. FC Bayern. The intermediate result in the AOL Arena is 0:0.
Source: www.sky.de

Introduction – Future of Search
Speech can be both input and output
"How long do I have to wait for the bus to the main station?"
"10 minutes"
Image source: www.portel.de (Huawei Android-Smartphone © Huawei)

Introduction – Future of Search
Linked knowledge for answering complex questions.
Known facts:
• James T. Kirk is the captain of the Enterprise.
• Enterprise is a TV series.
• James T. Kirk was killed by Soran.
• Soran was played by Malcolm McDowell.
Question: "Who played the killer of James T. Kirk?"
Answer: "Malcolm McDowell played the killer of James T. Kirk."
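A minimal sketch of how such linked facts could answer the question by chaining two lookups. The triple layout, the predicate names (`killed`, `played`) and the helper functions are assumptions made for illustration; the slides do not prescribe a representation.

```python
# Hypothetical triple store holding the known facts from the slide
# as (subject, predicate, object) tuples.
FACTS = [
    ("James T. Kirk", "captain_of", "Enterprise"),
    ("Enterprise", "is_a", "TV series"),
    ("Soran", "killed", "James T. Kirk"),
    ("Malcolm McDowell", "played", "Soran"),
]


def subjects(predicate, obj, facts=FACTS):
    """Return every subject s for which (s, predicate, obj) is a known fact."""
    return [s for (s, p, o) in facts if p == predicate and o == obj]


def who_played_killer_of(victim):
    """Chain two lookups: first find the killer(s), then who played them."""
    return [actor
            for killer in subjects("killed", victim)
            for actor in subjects("played", killer)]


print(who_played_killer_of("James T. Kirk"))
```

Neither fact alone answers the question; the answer only appears once the two facts are linked through the shared entity Soran.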
30
Introduction – Future of Search
Entity recognition at query time
31
Introduction – Future of Search
Automatic text summarizations with minimal lossof semantics.
This webseite is about the movie: Star Trek: Generations. In this movie both captains of theenterprise (James T. Kirk and Jean Luc-Picard) work together to stop Soran.

INTRODUCTION TO INFORMATION RETRIEVAL
Evolution of the Search

Interaction and Retrieval (Humans)
chair

Interaction and Retrieval (Data)
Content type: Analog, Digitalized, Digital, Hybrid
Sources: Libraries, Archives, Internet, Social Networks

Interaction and Retrieval (Media Type)
Text, Picture, Video, Audio

State of the Art: Current Problems
Archives / Libraries / Digital Libraries (DL)
• Mostly English-supported search
• Mostly keyword-based search
• Librarians: search experts, but no domain-specific knowledge
Search Engines
• Huge amount of Web documents
• Mostly keyword-based (monolingual) search
• Manually and automatically derived categories
• Based on statistical methods only
• Lack of semantics (given a query)
General Goal
• Find relevant information related to the user query
• Structure and classification of information

State of the Art: IR Problems
Dagobert Soergel: Important problems in information retrieval
• Problem 1. Assisting the user in clarifying and analyzing the problem and determining information needs.
• Problem 2. Knowing how people use and process information.
• Problem 3. Knowledge representation.
• Problem 4. Procedures for processing knowledge/information.
• Problem 5. The human-computer interface.
• Problem 6. Designing integrated workbench systems.
• Problem 7. Designing user-enhanced information systems.
• Problem 8. System evaluation.

State of the Art: Current Search Engines
Google, Vivísimo, Teoma

Current Research: Evolution of Information Retrieval
Rough timeline of the generations of information retrieval in digital libraries
Bruce R. Schatz, "Information Retrieval in Digital Libraries: Bringing Search to the Net." Science, Vol. 275, 1997.

Current Research: Evolution of Digital Content
Multilingual Social Semantic Digital Library: involves the world community in sharing multilingual knowledge
Digital Enterprise Research Institute (www.deri.org)
Sebastian Kruk, "Digital Libraries of the Future. Use of Semantic Web and Social Bookmarking to support E-Learning in Digital Libraries", Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, 2006.

Current Research: Evolution of the Legal and Technical Landscape
i2010 ("information space, innovation and investment, and inclusion"): to establish a single European information space
(Source: DLA Piper, 2007)

Current Research: Structured Interaction and Retrieval
Source: http://web2.wsj2.com/

Current Research: What is Web 2.0? "Definition" (O'Reilly)
Web 1.0 | Web 2.0 | New
DoubleClick | Google AdSense | personalised
Ofoto | Flickr | tagging, community
Akamai | BitTorrent | P2P
mp3.com | Napster | P2P
Britannica Online | Wikipedia | community, free content
personal websites | blogging | dialog
Evite | upcoming.org and EVDB |
domain name speculation | search engine optimization |
page views | cost per click | pay for participation
screen scraping | web services | interoperability
publishing | participation |
CMS | wikis | flexibility, freedom
directories (taxonomy) | tagging ("folksonomy") | community, freedom
stickiness | syndication | open content

Current Research?
Semantic Wikis
Nova Spivack: Metaweb

Current Research? Is this the digital future?

INTRODUCTION TO INFORMATION RETRIEVAL
Topics

Introduction – Topics of the course
Information Retrieval

Introduction – Where are we? Evolution of Search
Multilingual Search

Introduction – Where are we? Evolution of Search
Named-Entity Disambiguation

Introduction – What can we do?
Information Retrieval
Multilingual Search
Named-Entity Disambiguation

INFORMATION RETRIEVAL
An Introduction

Introduction to Information Retrieval
• What is IR?
  • The IR task
  • IR process
  • Components of an IR system
  • Web search systems
• Related research fields

What Do We Discuss?
• How do search engines work?
  • How do they collect information?
  • What “tricks” do they apply?
  • How can search methods be used “outside the web”?
• How can we improve search approaches?
  • Do they support natural language?
  • How can user interaction be improved?
• How can we speed up computation?
  • Data structures
  • Caching
  • Compression, …
• How can we decide whether a search approach really works?
  • In general for all queries or for specific queries
  • For specific document collections or the whole web
  • What kind of measures can we use?
• What else can we do?
  • Other types of media?
  • Other tasks?

What is Information Retrieval (IR)?
• Salton 1968: Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.
• Wikipedia (German version): IR is a research field that deals with computer-supported, content-based, and vague search in unstructured data collections.
• Information in the form of:
  • In general: unstructured data
  • Most often: text documents
  • But also: images, videos, music, …

From Data to Knowledge
[Figure: pyramid with the levels Data, Information, Knowledge, Wisdom]
• Data
  • Tokens/characters that can be processed by a machine
  • Data has no further meaning besides its simple presence
  • Collection of facts, figures, statistics, …
• Information
  • Interpreted data that gets a meaning
  • Abstract content of the data
  • Useful data in the process of asking interrogative questions, e.g.: Who? What? Where? How many? When?
  • Information = Data + Meaning/Purpose
• Knowledge
  • Meaningful combination of information
  • Information has been processed, organized or structured, or otherwise applied or put into action
  • Knowledge = Information + Processing
• Wisdom
  • Recently introduced
  • Requires understanding: an appreciation of why
  • Evaluated understanding
  • Further added value through unique and personal judgment (ethical aspects)

What is Information Retrieval?
• Indexing and retrieval (finding and re-finding) of text documents
• Search for web pages in the World Wide Web is the current “killer application”
• It is mainly search for relevant documents given a certain question (query)
• It is also efficient search of documents in very large document collections
• It is not data retrieval as in databases

Databases
• Storage and retrieval of data
• Data is stored in a clearly defined structure
  • e.g., in tables with different columns
  • Structure and meaning of the data are precisely determined in a schema
• Query language
  • Artificial, with restricted syntax and vocabulary
  • Exact and complete specification of what is requested
    • All exactly matching items shall be retrieved
    • All items are equally relevant

Information Retrieval Systems
• Unstructured data
  • E.g., natural language texts
• Query language
  • Mainly based on natural language
  • Impossible to exactly specify the results of interest
• Interest in partial matches
• Relevance as central issue
• Goal: fast access to relevant documents
  • E.g., through a sorted list
• The IR system must be able to interpret the content of documents in the context of the user query

Databases vs. IR Systems (van Rijsbergen, 1979)
 | Data Retrieval | Information Retrieval
Matching | exact | partial, best
Inference | deduction | induction
Model | deterministic | probabilistic
Classification | monothetic | polythetic
Query language | artificial | natural
Query specification | complete | incomplete
Items wanted | matching | relevant
Error response | sensitive | insensitive

Task of an IR System
• In general
  • Answer a question or
  • Find a specific piece of information
• Typical simplification
  • Given to the system:
    • A pre-existing set of “canned” natural language documents
    • A query in the form of a text string
  • Seek:
    • An ordered set of documents relevant to the query
    • The most relevant out of the repository, displayed to the user

Task of an IR System
• Build a system that retrieves documents that are most likely relevant to the user
[Figure: information need, query, retrieval system, document collection, documents]

Other Tasks of an IR System
• Classification: the retrieval system assigns documents to categories k1, k2, …, kn-1, kn
• Filtering: the retrieval system assigns documents to the categories pos / neg
• Routing: the retrieval system passes documents on to users u1, u2, …, un-1, un
• Clustering: the retrieval system groups documents into clusters c1, c2, …, cn-1, cn

What is Information Retrieval?
• Starting point
  • User with information need, i.e., a lack of information
• Three phases
  • Asking a question (Information Need)
  • Constructing an answer (Response)
  • Assessment of the answer (Evaluation)
• Iterative process
  • Several questions might be necessary to satisfy the information need

Information Need
• Perceived gap in the user’s knowledge
• Concrete Information Need
  • A specific piece of information is required
  • E.g.: What is the capital of Germany? When does the train to Munich leave?
• Problem-oriented Information Need
  • Research about a specific topic
  • A collection of several documents is required
  • E.g.: What is the current state of the art in web search engine technology?

Asking a Question
• Person asking = user
  • Is in a certain cognitive state (context, frame of mind)
  • Is aware of a gap of knowledge but might not be able to describe it
  • However: is required to specify his or her information need
• Paradox of Finding Out About
  • “The need to describe that which you do not know in order to find it” (Roland Hjerppe)
  • You can only ask the right question if you know what the result is
• Query
  • Expression of this ill-defined state

Answering the Question
• Say the question answerer is human
  • Does the answerer know the answer himself?
  • Can he translate the user’s ill-defined question into a better one?
  • Is he able to verbalize his answer?
  • Will the user understand this verbalization?
  • Can he provide the needed background?
• Say the question answerer is a computer system
  • …

Assessing the Answer – Relevance
• How well does it answer the question? Or: how relevant is the answer to the user?
• Relevance is a subjective assessment and can include:
  • Right topic
  • From the right time frame (up to date)
  • From a trusted source
  • Answer considers goals and intended usage of the user (information need)

Assessing the Answer – Relevance
• Answer is complete and precise
  • Q: Who teaches this lecture?
  • A: Ernesto William De Luca, who is working at the University of Applied Sciences Potsdam, Germany.
• Question is partially answered
  • Q: Where is Berlin?
  • A: In Germany.
• Answer suggests a source for more information
  • Q: What is Information Retrieval?
  • A: Attend this course.
• Answer gives background information
  • Q: What is Information Retrieval?
  • A: IR has been a computer science discipline for about 60 years.
• Answer reminds the user of other relevant knowledge
  • Q: What is Information Retrieval?
  • A: If you are interested in IR, it might be helpful to have some background knowledge in databases.

Relevance for Keyword-based Search
• Simplest form
  • Exact occurrence of the query in the document
• Less restrictive
  • Single words of the query have to occur often in the document
• Problems
  • Missed documents due to synonyms
    • “Big Apple” vs. “New York”
    • “automobile” vs. “car”
    • “profession” vs. “occupation”
  • Irrelevant documents due to ambiguous terms
    • “bank” (financial institution vs. something to sit on)
    • “apple” (company vs. fruit)
    • “bit” (unit of data vs. act of eating)
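The two notions of keyword relevance and the synonym problem can be illustrated with a small sketch. The function names and the sample document are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
def exact_match(query, document):
    """Simplest form: the whole query string occurs in the document."""
    return query.lower() in document.lower()


def term_hits(query, document):
    """Less restrictive: count how often the single query words occur."""
    words = document.lower().split()
    return sum(words.count(term) for term in query.lower().split())


doc = "The car dealer sold a used car yesterday"

print(exact_match("used car", doc))   # the phrase occurs verbatim
print(term_hits("car", doc))          # keyword occurs twice
print(term_hits("automobile", doc))   # synonym is missed: zero hits
```

The last call shows why purely keyword-based relevance misses documents: the document is clearly about cars, yet the synonym "automobile" scores nothing.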

IR is an Iterative Process
• Dialog instead of a single question
  • The exchange does not (necessarily) end with the first answer
  • User recognizes elements of a useful answer
  • Answer changes his knowledge although the information need is not satisfied yet
  • User modifies the initial query
• During the search process:
  • Questions and understanding change
  • Information need itself might also change

Berrypicking Model (Bates 1989)
• New information may yield new ideas and directions
• The information need
  • Is not satisfied by a single answer but rather
  • By a set of information found along the way.
[Figure: search path through the query variations Q0 to Q5, with the labels T and E marked along the path]

IR System Architecture
[Figure: information need, query, interface, retrieval system with ranking, documents; the document collection enters via preprocessing and indexing]

Underlying Model
• Fundamental component
• Framework for representing
  • Queries
  • Data items and
  • their relationships
• The “intuition” for ranking
• Types of Models
  • Boolean model
  • Vector space model
  • Probabilistic model
  • …
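As one illustration of how a model supplies the "intuition" for ranking, here is a minimal vector space sketch: queries and documents become term-frequency vectors, and documents are ordered by cosine similarity to the query. Raw term counts and whitespace tokenization are simplifying assumptions; real systems usually add weighting such as tf-idf. All names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document) pairs, most similar document first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["information retrieval systems", "cooking spicy recipes"]
for score, doc in rank("information retrieval", docs):
    print(round(score, 3), doc)
```

Unlike the Boolean model, this produces a graded score, so partial matches can still be ranked.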

Internal Data Representation
• Represent data such that
  • Content/meaning is described appropriately
  • Efficient access based on a query is possible
  • Memory usage is kept as small as possible
• Types of index structures
  • Inverted index
  • Suffix trees
  • …
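A minimal sketch of an inverted index, assuming whitespace tokenization and integer document ids (both are illustrative choices, not taken from the slides): each term maps to the list of documents that contain it, so a query term can be resolved without scanning every document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "apple pie recipe", 2: "apple computer", 3: "computer science"}
index = build_inverted_index(docs)
print(index["apple"])
print(index["computer"])
```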

Pre-Processing and Indexing
• Transform raw data into internal representation
• For documents:
  • Interpreting sequences of characters
  • Recognizing
    • Words and phrases
    • Sentence structures
    • Part-of-speech
  • Syntactical analysis
  • Morphological analysis
  • Statistical and linguistic methods
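A toy sketch of such a preprocessing step: interpret the character sequence as tokens, drop very common words, and reduce tokens with a crude suffix stripper. The stop word list and the suffix rules are deliberately tiny assumptions for illustration; they are not a real morphological analysis.

```python
import re

# Tiny illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are"}


def tokenize(text):
    """Interpret a character sequence as lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strip_suffix(token):
    """Crude stand-in for morphological analysis: drop a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, and normalize the remaining tokens."""
    return [strip_suffix(t) for t in tokenize(text) if t not in STOPWORDS]


print(preprocess("The engines are indexing documents."))
```

The resulting terms are what would be stored in the index instead of the raw text.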

Queries
• Way to express information need
• Types:
  • Boolean
  • Natural language
  • Stylized natural language
  • Form-based (GUI)
• E.g., Boolean:
  • Terms
    • Words and phrases
  • Operators
    • AND, OR, NOT
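The Boolean operators map directly onto set operations over the documents that contain each term. The sketch below assumes a small hand-written postings table and hypothetical helper names.

```python
# Hypothetical postings: term -> set of ids of documents containing it.
ALL_DOCS = {1, 2, 3}
POSTINGS = {"apple": {1, 2}, "pie": {1}, "computer": {2, 3}}


def docs_with(term):
    """Documents containing the term (empty set for unknown terms)."""
    return POSTINGS.get(term, set())


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return ALL_DOCS - a


# apple AND NOT pie
print(AND(docs_with("apple"), NOT(docs_with("pie"))))
# apple OR computer
print(OR(docs_with("apple"), docs_with("computer")))
```

Every document either matches or does not; the Boolean model itself provides no ranking among the matches.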

Matching and Relevance Ranking
• Searches in the internal data storage for documents matching the query
• A relevance metric is used to order the retrieved document set
• Order, e.g.:
  • Chronologically
  • By number of hits of the query terms
  • By popularity
  • Refined metrics

Interface and Visualization
• Interaction with the user
  • Take queries
  • Visualize results
    • Ranked list
    • Information per document
    • Structuring of result set
    • Displaying similarities
  • Handle interaction like
    • Relevance feedback
    • Query refinement
    • Filtering

Relevance Feedback
• Different possibilities to improve the search result
  • Reranking based on user-marked relevant and irrelevant documents
  • Query modifications
    • Reformulate entire query
    • Expansion, e.g., with synonyms
    • Refinement, i.e., additional search on current search results
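Query expansion with synonyms can be sketched as follows. The thesaurus here is a hypothetical hand-written dictionary; in practice it might come from a lexical resource.

```python
# Hypothetical thesaurus used only for illustration.
SYNONYMS = {"car": ["automobile"], "profession": ["occupation"]}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms, each followed by its known synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(synonyms.get(term, []))
    return expanded


print(expand_query("used car"))
```

Searching with the expanded term list lets documents that only mention "automobile" match a query for "car".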

Relevance Feedback
• Filters
  • Reduce set of candidate results
  • Often on metadata like date, domain, file type, author, size, maximum number to retrieve

Web Search
• Application of IR to Web documents in the World Wide Web
• Differences to standard IR
  • Documents must be collected beforehand: crawling
  • Using structure as given through HTML/XML
  • Documents are not static; change cannot be controlled
  • Using the hyperlink structure
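A minimal crawling sketch that follows the hyperlink structure breadth-first. To stay self-contained it crawls a hypothetical in-memory "web" (the `PAGES` dictionary) instead of fetching over the network; a real crawler would also need politeness rules, URL normalization and error handling.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory web: URL -> HTML source.
PAGES = {
    "http://example.org/": '<a href="http://example.org/a">A</a> <a href="http://example.org/b">B</a>',
    "http://example.org/a": '<a href="http://example.org/b">B</a>',
    "http://example.org/b": '<a href="http://example.org/">home</a>',
}


def crawl(seed, fetch=PAGES.get):
    """Breadth-first crawl from the seed; returns URLs in visiting order."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page not available
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl("http://example.org/"))
```

The `seen` set prevents the crawler from looping forever on pages that link back to each other.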

Web Search System
[Figure: information need, query, retrieval system, documents; a spider / crawler fills the document collection]

Other Tasks Close to IR
• Automated document categorization
• Automated document clustering
• Automated text summarization
• Question answering
• Information filtering (spam filtering)
• Information extraction
• Information integration
• Recommending information or products
• Searching and ranking in Web 2.0

Related Research Fields
INFORMATION RETRIEVAL

Related Research Fields
• Database Management
• Library and Information Science
• Artificial Intelligence
• Natural Language Processing
• Machine Learning, Data Mining

Database Management
• Focused on
  • Structured data stored in relational tables
  • Not on free text
• Deals with efficient handling of well-defined queries in a formal language (SQL)
• Clear semantics for data and queries
• Currently, it also deals with semi-structured data like XML; this brings databases and IR closer together

Library and Information Science
• Focused on
  • Human-computer interaction, user interface, visualization
  • Organization and search of information in libraries
• Deals with the effective categorization of human knowledge
• Deals with the analysis of the relationship between persons and publications
• Current research in the field of digital libraries brings this field closer to IR

Artificial Intelligence
• Focused on methods to acquire, represent, and derive (new) knowledge
• Formalisms to represent knowledge and queries
  • Predicate logic
  • Description logics
  • Bayesian networks
• Current research in the field of the semantic web and ontologies brings a closer relation to IR

Natural Language Processing
• Focused on the syntactic, semantic, and pragmatic analysis of natural language text
• This could allow for a search related to meaning instead of keywords
• Natural language processing for IR
  • Word sense disambiguation: methods for detecting the word sense of ambiguous words in context
  • Information extraction: methods to identify specific information in text
  • Methods to answer natural language queries on document collections

Machine Learning, Data Mining
• Focused on the development of systems that can improve their performance based on experience
• Supervised learning: automatic classification through learning from pre-classified training examples
• Unsupervised learning: automatic methods for grouping non-classified examples

Machine Learning, Data Mining
• Machine learning for IR
  • Text categorization
    • Automatic classification in hierarchies (e.g., Yahoo)
    • Adaptive filtering, recommendation
    • Automatic spam filters
  • Text clustering
    • Clustering of IR query results
    • Automatic learning of hierarchies
  • Text mining
    • Learning for information extraction
  • Learning user preferences
  • Learning to rank

Named-Entity Disambiguation
INFORMATION RETRIEVAL

Retrieval and Named-Entity Disambiguation
• If newspapers all over the world write about an event, the event stays the same.
• Idea:
  • Extract knowledge (concepts and entities) from news articles
  • Find knowledge in other news articles to detect
    • Same articles in other languages
    • Duplicate articles
    • Following articles (additional knowledge)

Retrieval and Named-Entity Disambiguation
[Figure: entities linked by "is a" relations]

Retrieval and Named-Entity Disambiguation
[Figure: data flow from a document to its semantic representation]
• Lexical Analysis (WordNet): synonyms, generalisations, relations
• Entity Recognition (dictionaries): entities (persons, locations, organizations)
• Entity Analysis (common-sense ontology): relations between found entities

Retrieval and Named-Entity Disambiguation
[Figure: data flow from the semantic representation of one document to a semantic profile]
• Stages: clustering, profile matching, post-processing
• Results noted on the slide: semantic groups, representatives of these groups, relevance assumption, translation of representations

Conclusions
INFORMATION RETRIEVAL

Summary
• Basic ideas of IR
• Overview of the main components of IR and web search systems

Resources
• R. Baeza-Yates, B. Ribeiro-Neto: Modern Information Retrieval. New York, NY: ACM Press, 1999.
• R. Ferber: Information Retrieval. Suchmodelle und Data-Mining-Verfahren für Textsammlungen und das Web. Heidelberg: dpunkt-Verlag, 2003.
• C. D. Manning, P. Raghavan, H. Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008.
• C. D. Manning, H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press, 2002.
• Proceedings of the 7th European Summer School in Information Retrieval, 2009.

USER MODELING IN KNOWLEDGE MINING

Overview
• Data and Knowledge Mining
• User Modeling
• User Profiles
• Scenarios

KNOWLEDGE MINING
Ernesto William De Luca
An Introduction

Introduction
Today every enterprise uses electronic information processing systems:
• Production and distribution planning
• Stock and supply management
• Customer and personnel management
However: data alone are not enough. General patterns, structures, and regularities go undetected.

Data
Examples of Data
• “Columbus discovered America in 1492.”
• “Mr Jones owns a Volkswagen Golf.”
Characteristics of Data
• refer to single instances (single objects, persons, events, points in time, etc.)
• describe individual properties
• are often available in huge amounts (databases, archives)
• are usually easy to collect or to obtain (e.g. cash registers with scanners in supermarkets, Internet)
• do not allow us to make predictions

Knowledge
Examples of Knowledge
• “All masses attract each other.”
• “Every day at 10:20 am a train runs from Frankfurt to Darmstadt.”
Characteristics of Knowledge
• refers to classes of instances (sets of objects, persons, points in time, etc.)
• describes general patterns, structures, laws, principles, etc.
• consists of as few statements as possible (this is an objective!)
• is usually difficult to find or to obtain (e.g. natural laws, education)
• allows us to make predictions

Criteria to Assess Knowledge
Not all statements are equally important, equally substantial, equally useful. Knowledge must be assessed.
Assessment Criteria
• Correctness (probability, success in tests)
• Generality (range of validity, conditions of validity)
• Usefulness (relevance, predictive power)
• Comprehensibility (simplicity, clarity, parsimony)
• Novelty (previously unknown, unexpected)
Priority
• Science: correctness, generality, simplicity
• Economy: usefulness, comprehensibility, novelty

How do we find knowledge?
"We are drowning in information, but starving for knowledge." (John Naisbitt)
Attempts to Solve the Problems
• Intelligent Data Analysis
• Knowledge Discovery in Databases
• Data Mining

Knowledge Discovery and Data Mining
Due to the growing volume of data:
• Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [Fayyad et al. 1996]
• Data Mining is that step of the knowledge discovery process in which data analysis methods are applied to find interesting patterns.

Data Mining Tasks
• Classification: Is this customer credit-worthy?
• Segmentation, Clustering: What groups of customers do I have?
• Concept Description: Which properties characterize fault-prone vehicles?
• Prediction, Trend Analysis: What will the exchange rate of the dollar be tomorrow?
• Dependence/Association Analysis: Which products are frequently bought together?
• Deviation Analysis: Are there seasonal or regional variations in turnover?

Data Mining Methods
• Classical Statistics (charts, parameter estimation, hypothesis testing, model selection, regression)
  tasks: classification, prediction, trend analysis
• Bayes Classifiers (probabilistic classification, naive and full Bayes classifiers)
  tasks: classification, prediction
• Decision and Regression Trees (top-down induction, attribute selection measures, pruning)
  tasks: classification, prediction
• k-nearest Neighbor/Case-based Reasoning (lazy learning, similarity measures, data structures for fast search)
  tasks: classification, prediction
• Artificial Neural Networks (multilayer perceptrons, radial basis function networks, learning vector quantization)
  tasks: classification, prediction, clustering
• Cluster Analysis (k-means and fuzzy clustering, hierarchical agglomerative clustering)
  tasks: segmentation, clustering
• Association Rule Induction (frequent item set mining, rule generation)
  tasks: association analysis
• Inductive Logic Programming (rule generation, version space, search strategies, declarative bias)
  tasks: classification, association analysis, concept description

Data Mining Methods
Example: Association Rules
• Supermarkets have large amounts of customer data available.
• Great interest in joint purchases (products bought together), e.g., for the arrangement of products on shelves
• Association rules describe joint purchases:
  "If a customer buys bread and wine, then in 80% of the cases he/she will also buy cheese."
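How the 80% in such a rule can be obtained from transaction data is sketched below. The basket data is invented so that the rule comes out at exactly 80%; `support` and `confidence` are the standard measures for association rules, while the function names themselves are illustrative.

```python
# Invented shopping baskets, chosen so the rule below has 80% confidence.
BASKETS = [
    {"bread", "wine", "cheese"},
    {"bread", "wine", "cheese"},
    {"bread", "wine", "cheese"},
    {"bread", "wine", "cheese"},
    {"bread", "wine"},
    {"bread", "milk"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also
    contains the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_conf = confidence({"bread", "wine"}, {"cheese"}, BASKETS)
print(f"bread, wine -> cheese: confidence {rule_conf:.0%}")
```

Five baskets contain bread and wine, and four of those also contain cheese, which gives the 80% stated in the rule.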

Data Mining Methods
Example: Clustering
• Goal: Arrange the given data tuples into classes or clusters.
• Data tuples assigned to the same cluster should be as similar as possible.
• Data tuples assigned to different clusters should be as dissimilar as possible.
• Similarity is most often measured with the help of a distance function. (The smaller the distance, the more similar the data tuples.)

Data Mining Methods
Example: Hierarchical Agglomerative Clustering
• Centroid (red): distance between the centroids (mean value vectors) of the two clusters
• Average Linkage: average distance between two points of the two clusters
• Single Linkage (green): distance between the two closest points of the two clusters
• Complete Linkage (blue): distance between the two farthest points of the two clusters

Data Mining Methods
Example: Hierarchical Agglomerative Clustering
• Start with every data point in its own cluster (i.e., start with so-called singletons: single-element clusters).
• In each step merge those two clusters that are closest to each other.
• Keep on merging clusters until all data points are contained in one cluster.
• The result is a hierarchy of clusters that can be visualized in a tree structure (dendrogram).
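The merging procedure can be sketched in a few lines. This is a naive illustration (it recomputes all pairwise cluster distances in every step), assuming Euclidean distance and showing only single and complete linkage; `math.dist` requires Python 3.8 or later, and all function names are hypothetical.

```python
import math


def single_linkage(a, b):
    """Distance between the two closest points of the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Distance between the two farthest points of the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def agglomerate(points, linkage=single_linkage):
    """Merge the two closest clusters until one cluster is left.

    Returns the list of merges in order; this sequence is the
    information a dendrogram visualizes.
    """
    clusters = [[p] for p in points]  # start with singletons
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
for left, right in agglomerate(points):
    print(left, "+", right)
```

Passing `linkage=complete_linkage` changes only how the distance between clusters is measured; the merging loop stays the same.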

Lesson Learned: Knowledge Mining
Knowledge mining can be simply characterized by the following mapping:
DATA + PRIOR_KNOWLEDGE + GOAL → NEW_KNOWLEDGE
where GOAL is an encoding of the knowledge needs of the user(s), and NEW_KNOWLEDGE is knowledge satisfying the GOAL.
Such knowledge can be in the form of data mining methods, statistical summaries, visualizations, natural language summaries, or other knowledge representations.
Ryszard S. Michalski, "Knowledge Mining: A Proposed New Direction", invited talk at the Sanken Symposium on Data Mining and Semantic Web, Osaka University, Japan, March 10-11, 2003

USER MODELING
Ernesto William De Luca
User Profiles and Scenarios

User Modeling
Motivation
• Intelligent access to information
• Consideration of the user's preferences
• Information filtering and processing
User Profile
• "Modelling a user"
• Interests, preferences
• Behavior, patterns of interaction

User Modeling: Problems and Challenges
• How can we provide an individual experience?
• How can we help users in finding only relevant information?
• How can we give personal recommendations?
• How can we recognize what the user wants to be recommended?

User Modeling: Goals
Personalization
• to understand the user
• to understand the user's needs
• to find semantically related content
• to identify features that influence the user's (or item's) current situation (context)
Knowledge to be used
• Implicit Knowledge
• Explicit Knowledge

User Modeling: Implicit Knowledge
• Observation of the behavior of the user ("over-the-shoulder look")
• Inferences about preferences or interaction patterns
• Application flow
  • Collecting samples
  • Identification of common patterns
  • Clustering
  • Building a profile
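A heavily simplified sketch of building a profile from observed behaviour: the collected samples are clicks on items with a known topic, and the profile records how the user's attention is distributed over the most frequent topics. The click log format and the function name are assumptions, and the clustering step from the application flow is skipped here because the topics are given in advance.

```python
from collections import Counter


def build_profile(click_log, top_n=2):
    """Turn (item, topic) click samples into topic weights.

    Only the top_n most frequent topics are kept; the weights are
    relative frequencies over all clicks.
    """
    counts = Counter(topic for _, topic in click_log)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.most_common(top_n)}


# Hypothetical observation log: (article id, topic of the article).
click_log = [
    ("a1", "sports"),
    ("a2", "sports"),
    ("a3", "politics"),
    ("a4", "sports"),
]
print(build_profile(click_log))
```

No question was ever asked of the user; the preference for sports is inferred purely from the interaction pattern.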

User Modeling: Explicit Knowledge
• Initialization by the user
• Specification of rules and/or attributes
• Example: rules for e-mail sorting and filtering
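Explicit rules of this kind could look as follows. The rule format (field, substring, target folder) and the sample rules are hypothetical; the point is that the user states the preferences directly instead of the system inferring them.

```python
# Hypothetical user-specified rules; the first matching rule wins.
RULES = [
    {"field": "sender", "contains": "newsletter", "folder": "Newsletters"},
    {"field": "subject", "contains": "invoice", "folder": "Finance"},
]


def sort_mail(message, rules=RULES, default="Inbox"):
    """Return the folder of the first rule whose text occurs in the
    given message field (case-insensitive), else the default folder."""
    for rule in rules:
        if rule["contains"] in message.get(rule["field"], "").lower():
            return rule["folder"]
    return default


print(sort_mail({"sender": "newsletter@example.org", "subject": "Weekly digest"}))
print(sort_mail({"sender": "shop@example.org", "subject": "Your Invoice 2024"}))
print(sort_mail({"sender": "friend@example.org", "subject": "Hello"}))
```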

User Profile
Semantic Information (World Knowledge)
Senses of the word "chair":
• Person: president, chairman, chairwoman, chairperson, officer, meetings, organization, …
• Position: professorship, professor, pedagogy, …
• Furniture: support, back, armchair, …

User Profile
Context-aware Information (Situational Knowledge)
User context
• Surroundings (weather, location)
• Company (alone, with friends)
• Mood/emotions
• Any user-related factor

User Profile
User Characteristics (Level of Knowledge)

User Profile
User Characteristics (Culture and Language)

Personalization (User Profiles)
  User behaviour analysis
  Semantic profiles
  Context-aware profiles
Knowledge to be used
  Implicit knowledge
  Explicit knowledge
Scenarios
  Music, news and movies
User Modeling
Possible Solutions / Scenarios
Goal: Recommend news articles based on the previous behavior of a user
How:
  User behavior is analyzed
  Semantic knowledge is linked to current news articles
  Algorithms were developed to analyze semantic information and user behavior
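The idea of linking semantic knowledge to news articles and matching it against past behavior can be illustrated with a small sketch. It assumes that each article is already annotated with a set of concepts and that the user's interest in a concept is simply how often it appeared in previously read articles; the data and function names are hypothetical, and this is not the algorithm developed in the project.

```python
from collections import Counter

# Hypothetical data: concepts of articles the user has read,
# and concept annotations of current news articles.
HISTORY = [{"music", "jazz"}, {"music", "festival"}, {"politics"}]
ARTICLES = {
    "n1": {"jazz", "festival"},
    "n2": {"politics", "election"},
    "n3": {"football"},
}


def interest_profile(history):
    """Count how often each concept occurred in the reading history."""
    return Counter(concept for concepts in history for concept in concepts)


def recommend(history, articles, top_n=2):
    """Rank articles by the summed interest weight of their concepts."""
    profile = interest_profile(history)
    scored = []
    for article_id, concepts in articles.items():
        score = sum(profile[c] for c in concepts)
        if score > 0:
            scored.append((score, article_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _score, article_id in scored[:top_n]]


print(recommend(HISTORY, ARTICLES))
```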
Semantic User Profiling
User Modeling - Semantic User Profiles
Music and News Scenario
User Modeling - Semantic User Profiles
User Profile Management (UPM)
Our Goals:
  Understand user interests / needs
    Tracking user behaviour on dynamic websites
    Recommend user-relevant information
  Create user profiles
    Tracking and understanding relations between content on dynamic websites
    Including user interests / needs
    Aggregation of different user profiles from different applications
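The last goal, aggregating user profiles from different applications, could be approximated as below. The sketch assumes that every application exposes a profile as a mapping from interest to weight and that averaging is an acceptable merge strategy; both are assumptions for illustration, not a description of the UPM component.

```python
from collections import defaultdict

# Hypothetical per-application profiles: interest -> weight in [0, 1].
PROFILES = {
    "music_app": {"jazz": 1.0, "rock": 0.5},
    "news_app": {"jazz": 0.5, "politics": 1.0},
}


def aggregate_profiles(profiles):
    """Average interest weights across applications.

    An interest that is missing from an application counts as weight 0 there.
    """
    if not profiles:
        return {}
    totals = defaultdict(float)
    for profile in profiles.values():
        for interest, weight in profile.items():
            totals[interest] += weight
    return {interest: total / len(profiles) for interest, total in totals.items()}


print(aggregate_profiles(PROFILES))
```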
Semantic User Profiling
User Modeling - Context-Aware Profiles
Movie Scenario
Definition: “Context is any information that can be used to characterize the situation of an entity” [Dey, 2001]
Goal: implicit identification of context-related preferences based on analysis of users’ interaction histories and current usage contexts
How: Key contextual and metadata features are identified and used for the creation of several sets of user-specific and context-aware recommendations.
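A minimal sketch of this idea: preferences are learned separately for each context from the interaction history, and the recommendation set is chosen according to the current context. The context feature (company: "alone" or "with friends"), the metadata feature (genre), and the data are assumptions chosen for the movie scenario.

```python
from collections import Counter, defaultdict

# Hypothetical interaction history: (usage context, genre of the watched movie).
HISTORY = [
    ("alone", "drama"), ("alone", "drama"), ("alone", "documentary"),
    ("with friends", "comedy"), ("with friends", "comedy"),
    ("with friends", "action"),
]
# Hypothetical catalog: movie id -> genre.
CATALOG = {"m1": "drama", "m2": "comedy", "m3": "action", "m4": "documentary"}


def context_preferences(history):
    """Count genre preferences separately for each observed context."""
    prefs = defaultdict(Counter)
    for context, genre in history:
        prefs[context][genre] += 1
    return prefs


def recommend(history, catalog, current_context):
    """Return movies whose genre was preferred in the current context."""
    prefs = context_preferences(history)[current_context]
    candidates = [movie for movie, genre in catalog.items() if prefs[genre] > 0]
    return sorted(candidates, key=lambda m: (-prefs[catalog[m]], m))


print(recommend(HISTORY, CATALOG, "alone"))
print(recommend(HISTORY, CATALOG, "with friends"))
```

The same user thus receives different recommendation sets depending on the situation, which is the distinction between a context-aware profile and a single static one.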
Context-aware User Profiling
User Modeling - Multilingual User Profiles
Web Scenario
User Modeling
How can we manage it?
User-centric Recommendation
  Multilingualism, …
  Location, time, item genre, …
  Text, audio, picture, video, maps, ...
  Personalized Information Management
Personalized Information Management
Examples
PERSONALIZED INFORMATION MANAGEMENT
Information Management
Definition: It is the collection and management of information from one or more sources and the distribution of that information to one or more audiences.
It means the organization of and control over the structure, processing and delivery of information.
Examples: Recommender and Retrieval Systems
Information Management
Information: text, picture, video, audio, experts, maps, …
Information Management
User Profiles
…
Information Management
Structured Interaction and Retrieval
source: http://web2.wsj2.com/
Personalized Information Management
Document-oriented Search
The huge amount of information available is often distributed across multiple, heterogeneous sources and must be manually collected and processed.
With document-oriented Personalized Information Management we can filter the flood of information through:
  Intelligent processing of data from heterogeneous sources
  Semantic enrichment and association of collected information
  Search interfaces that are intuitively usable and easy to master
  Personalized filtering and presentation of information
  Interactive visualizations of data
  Personalized document management
  Recommendations of related information
  Support for collaborative knowledge exchange with other users
Personalized Information Management
Document-oriented Search
http://www.pia-services.de/
Personalized Information Management
Document-oriented Search - Example
Screenshot labels: Directories, Saved Searches, Newsletters, Query Suggestions, Clusters, Tag Cloud, Ratings, Semantic Information, User Profile, Tags, Advanced Search, Paper Details, Expert Search, Assistant, Relevance
Personalized Information Management
Knowledge-driven Search
80% of all knowledge is within the minds of people and therefore difficult to access.
Spree unlocks this knowledge by:
  Identification of expertise in communities and companies
  Domain-specific, ontology-based modeling and classification of expertise
  Topic-specific classification of user/customer questions
  Automatic identification of qualified experts
  Communication services for real-time knowledge exchange
Personalized Information Management - Knowledge-driven Search
Searching for Experts
User
Experts
??
Personalized Information Management - Knowledge-driven Search
Searching for Experts - Example
We can connect users and experts:
  Questions are automatically analyzed and categorized.
  Qualified experts are identified and contacted.
  Knowledge transfer happens through chat, blog and email in real time.
  Created knowledge is easily searchable and accessible.
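The first two steps, categorizing a question and identifying qualified experts, might be sketched as follows. The keyword-based categorization, the topic vocabulary, and the expert registry are hypothetical stand-ins; the ontology-based classification mentioned earlier is not reproduced here.

```python
# Hypothetical topic vocabulary and expert registry (assumed data).
TOPICS = {
    "health": {"doctor", "vaccination", "prevention"},
    "search": {"query", "retrieval", "ranking"},
}
EXPERTS = {
    "alice": {"health"},
    "bob": {"search"},
    "carol": {"search", "health"},
}


def categorize(question):
    """Assign every topic whose keywords occur in the question."""
    words = set(question.lower().replace("?", " ").split())
    return {topic for topic, keywords in TOPICS.items() if words & keywords}


def find_experts(question):
    """Return the experts whose expertise covers one of the question's topics."""
    topics = categorize(question)
    return sorted(name for name, expertise in EXPERTS.items() if expertise & topics)


print(find_experts("How does query ranking work?"))
```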
Personalized Information Management - Knowledge-driven Search
Searching for Experts - Example Scenario
Personalized Information Management
Personalized Health Assistance
Developing a health assistant for migrants:
  The prevention service provides easy access to prevention measures
  The health information service improves access to health information
Services are multimodal and context-aware, and use multilingual, intuitive user interfaces that give easy access to users with little technological experience.
Personalized Information Management
Personalized Health Assistance – Inf. Services
Role of Information in the Knowledge Society