PerCon: A Personal Digital Library for Heterogeneous Data
Center for the Study of Digital Libraries
Department of Computer Science & Engineering
Texas A&M University
2
Outline
• Background
• Motivation
• Objective
• Approach
• System Evaluation
• Results
• Conclusion
• Appendix
3
Buzzwords in CS
Big Data
Hadoop
Distributed computing
Machine Learning
Cloud
Multithread
Multicore
Web services
Social network
Platform
Crowdsourcing
Information retrieval
Agile
Algorithm
Data science
Data mining
4
Background
• Data explosion / Data-intensive scientific discovery
• Interdisciplinary research
• Advances in devices/sensors and software
• …
• More data of more data types
• Demands on collecting, managing, and interpreting heterogeneous data
5
Motivation
• Data management and analysis
  • Domain-specific representations, visualizations, interfaces, tools, etc.
  • Separate “silos” for data of each data type
• Needs for a heterogeneous data environment
  • Ingesting, processing, and indexing data
  • Searching, browsing, visualizing, and annotating data
  • Representing and sharing information and knowledge
  • Facilitating interactions between a user and a system with heterogeneous data
6
Objective
• A digital library that supports the collection, management, and interpretation of unanticipated collections of data types
• PerCon: Personalized and Contextual Data Environment
  • A personal or small-group digital library system for data management and analysis
7
PerCon: Designed Workflow
[Figure: PerCon workflow diagram. Components: User, User Interface, Web Server, Query Parser, Data/Query Processing, Data Ingestion, Data Processing, Data Analysis, Application, Database Repository, System Resource, and Heterogeneous Scientific Dataset. The Feature / Knowledge Space contains a domain-dependent feature space, a cross-domain feature space, and a personalized feature space. Steps are labeled with timestamps T1 to T4; arrows distinguish data flow from query flow.]
8
System Architecture
Resource Layer: Original data objects, computed/filtered datasets, and metadata.
Middleware Layer: Data ingestion, access, automated analysis, visualization, workspace, etc.
Application Layer: User interfaces, external systems access
9
System Interfaces
[Figure: PerCon interface screenshot. Labeled regions: Menu & Toolbars, Visual Workspace, Repository Viewer, Suggestion History.]
10
Workspace
• Exploration and representation of data with visual and spatial attributes
• Translation of data into information in multiple representations
• Knowledge discovery from information
• Data object model: multiple applicable data visualizations
System Base (Object) Panel
- Visual and spatial attributes
- User expression for data interpretation
User Application (Object) Panel
- Individual visualization/application
- Application-specific interaction
11
Integrated Visual Workspace
• Interoperation with history mechanism
• Event records used for mixed-initiative interaction
• Representation of any Java application as individual data objects
12
Mixed-Initiative Interaction
• User-control system
  • Menu, toolbar, button, etc.
• System-control system
  • Automated system
  • Example: call center
• Mixed-control system
  • Turn-taking and alternating control
  • High computation + high interpretation
  • Recommender system
13
Recommendation in PerCon
• Inference of user interests from user behaviors/events/tasks/goals
• Location and recommendation of related data within the current collection
14
Procedures
1. Building feature space (understanding relationships in data)
2. Recording workspace events (history)
3. Inferring user interests using probabilistic networks
4. Selecting relevant data
5. Recording user’s acceptance/rejection
6. Adopting user feedback
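The six steps above can be sketched as a small loop. This is only an illustrative sketch, not PerCon's implementation: the feature vectors, object names, `recommend`, and `apply_feedback` are hypothetical, cosine similarity stands in for whatever relationship measure the system uses, and step 3 is replaced here by a simple count-weighted similarity score rather than a probabilistic network.

```python
from collections import Counter
from math import sqrt

# Step 1 (assumed data): hypothetical feature vectors for three data objects.
FEATURES = {
    "river_level_waco": [0.9, 0.1, 0.3],
    "river_level_temple": [0.8, 0.2, 0.4],
    "precip_waco": [0.1, 0.9, 0.5],
}

def cosine(a, b):
    """Cosine similarity, one possible relationship measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(history, weights, k=1):
    """Steps 3 and 4: rank untouched objects by similarity to touched ones."""
    touched = Counter(obj for _, obj in history)
    scores = {}
    for candidate, vec in FEATURES.items():
        if candidate in touched:
            continue
        score = sum(n * cosine(vec, FEATURES[obj]) for obj, n in touched.items())
        scores[candidate] = score * weights.get(candidate, 1.0)
    return sorted(scores, key=scores.get, reverse=True)[:k]

def apply_feedback(weights, obj, accepted, rate=0.5):
    """Steps 5 and 6: scale an object's weight up or down after feedback."""
    factor = 1 + rate if accepted else 1 - rate
    weights[obj] = weights.get(obj, 1.0) * factor

# Step 2: recorded workspace events as (event type, data object) pairs.
history = [("ADD_SYMBOL", "river_level_waco"), ("MOVE_SYMBOL", "river_level_waco")]
weights = {}
first = recommend(history, weights)
apply_feedback(weights, first[0], accepted=False)
```

In this toy setup, repeated rejections lower a suggestion's weight until a different object ranks first, which is one simple way feedback could shape later suggestions.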
15
Data Analysis Agent
• Internal process for mixed-initiative interaction
[Figure: data analysis agent diagram. Labels: Agent, User, Dataset, Metadata, Data Matrix (Data ID, Type, Date), Index, Relationship (Similarity, Distance, etc.), Inference (Probabilistic) Network, Training, Inference, Processing, Update, User Events, User Activity / History (Visual Attributes, Spatial Attributes, Annotation, Query, Exploration), Workspace Monitoring, Suggestion Request, Suggestion, Suggestion History, Feedback, Feedback Recording, Mixed-initiative interaction.]
16
Probabilistic Inference Network
[Figure: probabilistic inference network linking one hidden variable to several observable variables.]
Hidden variable
• Data source(s) that a user is interested in: River level, River discharge, Precipitation, Temperature, Humidity, …
Observable variables
• Activity in workspace: ADD_SYMBOL, DELETE_SYMBOL, MOVE_SYMBOL, RESIZE_SYMBOL, CHANGE_BORDER_COLOR, …
• Data relationships between data objects explored: River level – River level, Precipitation – Precipitation, River level – Precipitation, …
• Data application: Plot, Timeline, Multimedia player, XML viewer, DB viewer, Calendar viewer, …
• Visual attribute 1 (Background color): Blue, Red, Black, Green, …
• Visual attribute 2 (Border color): Blue, Red, Black, Green, …
• Data source creation: River level, River discharge, Precipitation, Temperature, Humidity, …
• Annotation: Yes, No
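A minimal sketch of how a hidden "interest" variable might be inferred from observable workspace variables follows. The slides do not give the network structure or its probabilities, so this assumes the simplest case (a naive Bayes model in which observations are conditionally independent given the interest), and every number in `PRIOR` and `LIKELIHOOD` is made up for illustration.

```python
# Hypothetical prior over the hidden variable (the data source of interest).
PRIOR = {"river_level": 0.5, "precipitation": 0.5}

# Hypothetical conditional probabilities P(observation | interest).
LIKELIHOOD = {
    "river_level": {
        ("application", "Plot"): 0.6,
        ("border_color", "Blue"): 0.7,
        ("annotation", "Yes"): 0.5,
    },
    "precipitation": {
        ("application", "Plot"): 0.4,
        ("border_color", "Blue"): 0.2,
        ("annotation", "Yes"): 0.5,
    },
}

def posterior(observations, default=0.1):
    """Return P(interest | observations) under conditional independence."""
    unnormalized = {}
    for interest, prior in PRIOR.items():
        p = prior
        for obs in observations:
            # Unseen observations fall back to a small default probability.
            p *= LIKELIHOOD[interest].get(obs, default)
        unnormalized[interest] = p
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

beliefs = posterior([("application", "Plot"), ("border_color", "Blue")])
likely = max(beliefs, key=beliefs.get)
```

With these made-up numbers, opening a plot and choosing a blue border shifts belief toward river level (0.21 against 0.04 before normalization, so 0.84 after).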
17
System Evaluation - User Study
• Hypotheses
  • H1: The visual workspace helps a user manage data and translate data into knowledge about the domain
  • H2: Mixed-initiative recommendations improve a user’s ability to explore and analyze data
• 24 participants
  • 1 undergraduate, 4 master's students, 16 PhD students, and 3 postdoctoral researchers
  • Ages 24 to 36
• Various disciplines
  • Computer science, computer engineering, electrical engineering, soil hydrology, biomedical engineering, industrial engineering, and management information systems
18
Domain Data for Participant Analysis
• Two years of weather and river data (from 2011 to 2013)
  • Weather data from NOAA
    • Temperature, precipitation, relative humidity, wind speed, and wet bulb temperature
  • River data from the Brazos River Authority in Texas
    • River level and discharge
• Two equivalent “weather and river” datasets
  • Dataset 1 collected from College Station, Waco, and Temple
  • Dataset 2 collected from South Bend, Seymour, and Fort Griffin
[Figure labels: Upstream, Downstream]
19
Tasks
• Task 1 (20 minutes): Classifying and organizing data
  • Organize and classify river level and precipitation data according to common trends, quantities, durations, or other user-perceived criteria.
• Task 2 (10 minutes): Investigating and identifying data correlation
  • Investigate which weather factor(s) affect river level, and how.
  • Investigate how rivers at different places are correlated.
• Task 3 (5 minutes): Interpreting and estimating river data events/causes
  • Estimate the (average) time delay in the flow, if you find any.
  • Explain the changes considering weather factors and other river stream flows.
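Participants performed the correlation and time-delay tasks by inspection in the workspace. As a rough illustration of what "estimating the time delay" means computationally, the sketch below picks the lag that maximizes the Pearson correlation between an upstream and a downstream series. The function names and the river-level values are invented for the example and are not taken from the study data.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def best_lag(upstream, downstream, max_lag):
    """Return the lag (in samples) at which downstream best tracks upstream."""
    scores = {}
    for lag in range(max_lag + 1):
        n = len(upstream) - lag
        # Compare upstream at time t with downstream at time t + lag.
        scores[lag] = pearson(upstream[:n], downstream[lag:lag + n])
    return max(scores, key=scores.get)

# Made-up daily river levels; downstream repeats upstream two days later.
upstream = [1, 1, 5, 9, 4, 2, 1, 1, 6, 3, 1, 1]
downstream = [1, 1] + upstream[:-2]
delay = best_lag(upstream, downstream, max_lag=4)
```

Real river series are noisier, so in practice the correlation peak would be broader and the estimated delay approximate.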
20
System Conditions
                          WO/ Mixed-Initiative Recommendation   W/ Mixed-Initiative Recommendation
WO/ Visual Workspace      Configuration 1                       Configuration 2
W/ Visual Workspace       Configuration 3                       Configuration 4
21
System Conditions
22
Task Procedure (120 minutes)
• Participants learning with a user manual + 10-minute video clip
• 5-minute trial of how to use PerCon
• Tasks (Tasks 1, 2, and 3)
Group    Subgroup  Tasks with dataset 1  Tasks with dataset 2
Group 1  A         Configuration 1       Configuration 2
Group 1  B         Configuration 2       Configuration 1
Group 2  A         Configuration 1       Configuration 3
Group 2  B         Configuration 3       Configuration 1
Group 3  A         Configuration 1       Configuration 4
Group 3  B         Configuration 4       Configuration 1
Group 4  A         Configuration 2       Configuration 3
Group 4  B         Configuration 3       Configuration 2
Group 5  A         Configuration 2       Configuration 4
Group 5  B         Configuration 4       Configuration 2
Group 6  A         Configuration 3       Configuration 4
Group 6  B         Configuration 4       Configuration 3
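The group assignment above is a counterbalanced design: every pair of the four configurations appears once, and each pair is run in both orders. A short sketch that regenerates the same schedule is shown below; the function name `counterbalanced_schedule` is illustrative only.

```python
from itertools import combinations

CONFIGS = [1, 2, 3, 4]

def counterbalanced_schedule(configs):
    """Pair every two configurations and run each pair in both orders."""
    rows = []
    for group, (a, b) in enumerate(combinations(configs, 2), start=1):
        rows.append((group, "A", a, b))  # dataset 1 with a, dataset 2 with b
        rows.append((group, "B", b, a))  # same pair, order reversed
    return rows

schedule = counterbalanced_schedule(CONFIGS)
for group, subgroup, first, second in schedule:
    print(f"Group {group}{subgroup}: dataset 1 -> Configuration {first}, "
          f"dataset 2 -> Configuration {second}")
```

Four configurations give six pairs and twelve subgroups. With 24 participants that would be two per subgroup if they were assigned evenly, although the slides do not state the assignment.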
23
Responses to Questions Related to Workspace
[Figure: bar chart “Visual Workspace”. Scores (axis 0 to 7) for statements Q1 to Q4 under configurations 1 to 4.]
Statements
Q1 I had enough support to understand data content in the workspace
Q2 I had enough support to express relationships in the way I wanted
Q3 It was easy to interpret and characterize given/created objects in the workspace
Q4 I had enough support to effortlessly / quickly browse and select data
24
Responses to Questions Related to Recommendations
[Figure: bar chart “Mixed-initiative interaction”. Scores (axis 0 to 7) for statements Q5 to Q8 under configurations 2 and 4.]
Statements
Q5 I was satisfied with the data suggested
Q6 I was satisfied with the suggestion request
Q7 I had enough support to find and interpret data I was interested in
Q8 I had enough support to find correlations within the dataset
25
Participant Work Practices
[Figure: bar chart of the average number of data objects classified/analyzed under configurations 1 to 4 (axis 0 to 50).]
26
Distribution of Activities
• Ordering of user events in repository browser and workspace shows distinct patterns of work
[Figure: event sequences for Users 1 to 4. Panels: Config. 1 (without visual workspace) and Config. 3 (with visual workspace).]
27
A Sequence of Recommendation Events
[Figure: sequences of recommendation events for Users 1 to 12, distinguishing user-requested from system-triggered recommendations.]
28
History in PerCon
Storing and processing data
Visualizing data
Managing and analyzing data
Human activities of locating, annotating, and interpreting data / Data platform
29
Conclusion
• Workspace has a large effect on data (analysis) practice
• Recommendation overcomes the difficulty of locating data for users
• Visual workspace
  • Facilitates information representation
  • Aids in identification and interpretation of relationships between datasets
  • Helps users learn, solve problems, and make decisions
• Mixed-initiative interaction (recommendation)
  • Encourages users to explore data
  • Leads users to identify more evidence of correlation among data streams
  • Is valuable for data analysis
30
Future Work
• Improvement of workspace interactions
  • More dynamic and tailorable data visualization
• Expansion of recommendation subsystem
  • Cross-domain/cross-data-type similarities in the workspace
  • Various similarity metrics
  • Recommendation algorithms
• Exploration of the use of PerCon
  • In new domains
  • With new user communities
31
Questions?