1
Mashroom: End-User Mashup Programming
Using Nested Tables
Guiling Wang, Shaohua Yang, Yanbo Han
Institute of Computing Technology (ICT)
Chinese Academy of Sciences
http://sigsit.ict.ac.cn
Email: [email protected]
INSTITUTE OF COMPUTING TECHNOLOGY, CAS
2
Outline
• Introduction
• Fundamentals
• Programming Model
  – Application Structure
  – Data Model
  – Script & Operators
• Trial Application
• Evaluation
• Related Works
• Conclusion
3
Motivation
• HTML, RSS/Atom, Open API
• Mashup
4
Motivation
• Seldom can a single website provide an integrated view of movies on demand
  – title, director, showtime, reviews, map, …
• Showtimes websites: http://www.google.com/movies, http://www.imdb.com, …
• Movie review websites: http://www.mrqe.com, http://www.douban.com, ...
5
Motivation
• End-User Mashup Programming
  – Enable non-professional users to build Web applications by combining functionality offered by more than one website, to deal with situational and ad-hoc problems
• Popular mashup editors such as Yahoo! Pipes and MS Popfly adopt flow-chart-like formalisms
  – Some studies show that the concept of data flow is the main barrier for end users [Wong,07]
    • Explicit control operators: branches, loops, …
    • Explicit alignment of inputs and outputs at each step
6
Problem Definition
• Spreadsheet programming can also be adopted in end-user mashup programming, as in C3W, SpreadMash, …
  – An excellent end-user programming paradigm, but it also faces some challenges
• Goal
  – Enable non-professional users to build mashups
  – Adopt the spreadsheet programming paradigm
  – Balance usability and expressiveness
8
Fundamentals of Mashroom Programming
• Learn from the nested relation model and the nested table
  – The nested relation model is one of the most widely adopted data models for representing web data
  – Strong foundation in both query algebra and query optimization
  – The nested table is understandable by end users
• Challenges
  – Existing graphical query languages based on the nested table are low-level
    • SQL-like, QSByE [Filha,01] and GXQL [Qin,04]
    • Data query operators are not enough
9
Fundamentals of Mashroom Programming
• Learn from the spreadsheet
  – Loop execution is described intuitively by selecting a range of cells
  – Users can manipulate data by flexibly editing a formula
• Challenges
  – Make the spreadsheet suitable for displaying nested tables
  – Enable direct manipulation of the nested relational data model
10
Fundamentals of Mashroom Programming: Learn from Programming by Example
[Figure: build-time: Mashroom GUI (worksheets) → record user instructions → Mashup Script; run-time: Mashup Script → Mashup Engine → Nested Relational Data]
Example worksheet schema: Theaters(Theatername, Address, Movies(Label, Director, On Show Time))
Example instance: (Wudaokou MEGABox, Chengfu Road Zhongguan Cun, {(Waiting In Beijing, Lun Zhang, Oct 24th), (Wanted, Timur Bekmambetov, Oct 9th), …}), …
12
Application Structure
[Figure: layered application structure. Labels: Web Sources (REST, RSS, SOAP, HTML); wrap; Data Services/Composite Data Services; process/compose; associate; Presentation Views; config; Widget/Mashup Application; Mashup View/IDE]
13
Data Model: MashSheet
• MashSheet = {worksheet}
• A worksheet is a nested table
• The "column" is the first-class object
• A Mashroom formula is built from column references, operators, and constant values
  – σ(Theaters(MoviesLabel="Waiting in Beijing"))
[Figure: nested table schema Theaters(Theatername, Address, Movies(Label, Director, On Show Time)) above nested table instances such as (Wudaokou MEGABox, Chengfu Road Zhongguan Cun, {(Waiting In Beijing, Lun Zhang, Oct 24th), (Wanted, Timur Bekmambetov, Oct 9th), …})]
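One way to picture the data model is as plain Python lists of dictionaries. This is only a sketch under stated assumptions: the representation, the helper name `select_nested`, and the reading of σ as "filter inside the sub-relation and drop outer rows left empty" are illustrative choices, not Mashroom's actual implementation.

```python
# Hypothetical in-memory form of the Theaters worksheet: a list of rows,
# where the "Movies" column holds a nested table (another list of rows).
theaters = [
    {
        "Theatername": "Wudaokou MEGABox",
        "Address": "Chengfu Road Zhongguan Cun",
        "Movies": [
            {"Label": "Waiting In Beijing", "Director": "Lun Zhang", "On Show Time": "Oct 24th"},
            {"Label": "Wanted", "Director": "Timur Bekmambetov", "On Show Time": "Oct 9th"},
        ],
    },
]


def select_nested(rows, sub_column, attribute, value):
    """Assumed selection semantics: keep matching tuples inside each
    sub-relation and drop outer rows whose sub-relation becomes empty."""
    result = []
    for row in rows:
        kept = [t for t in row[sub_column] if t[attribute] == value]
        if kept:
            new_row = dict(row)          # shallow copy, input stays untouched
            new_row[sub_column] = kept
            result.append(new_row)
    return result


selected = select_nested(theaters, "Movies", "Label", "Waiting In Beijing")
```

The column path (here `Movies` then `Label`) plays the role of the column reference in the formula; the constant is the value compared against.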
14
Data Mashup Script and Operators
• script = <vars, ops>
  – vars: global input parameters
• op = <actor, actorIn>
  – actor: a computation over the nested tables given by actorIn
  – actorIn: a set of input variables
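The two tuples above map naturally onto small record types. The sketch below is an assumption about shape only: the class names `Script` and `Op`, the field `actor_in`, and the sample values are hypothetical and chosen to mirror the slide's notation.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    actor: str       # name of the computation, e.g. "Import" or "Sink"
    actor_in: list   # input variables handed to the actor


@dataclass
class Script:
    vars: dict = field(default_factory=dict)  # global input parameters
    ops: list = field(default_factory=list)   # operator applications, in order


# Illustrative script: import a data service, then export the result.
script = Script(
    vars={"city": "Beijing"},
    ops=[
        Op("Import", ["GoogleMovieDS", "city"]),
        Op("Sink", ["listview"]),
    ],
)


def actor_names(s):
    """List the actors of a script in execution order."""
    return [op.actor for op in s.ops]
```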
15
Operators in Mashroom
Category | Operators | Operations on Nested Relation Model
Worksheet Creation | Import, CreateSheet | New, Insert
Worksheet Data Manipulation | Filter, Sort, HeaderTruncate, TailTruncate | Selection(σ)
Worksheet Schema Manipulation | DeleteColumn, RenameColumn, Nest/Unnest | Delete, Update, Nest(η), Unnest(μ)
Worksheet Cleaning | AddFunction, Replace, MergeInstance | Insert, Selection(σ)
Worksheet Composition | Merge, Fuse, LinkService | Union(∪), Join(⋈)
Worksheet Export | Sink | -
16
Worksheet Creation Operators (Import, CreateSheet)
createsheet(sheet0:Theaters/Movies)
<createSheet ref-datasheet="sheet0" ref-column="sheet0:Theaters/Movies"/>
[Figure: CreateSheet applied to sheet0, Theaters(Theatername, Address, Movies(Label, Director, On Show Time)), yields sheet1, Movies(Label, Director, On Show Time), holding the movie tuples (Waiting In Beijing, Lun Zhang, Oct 24th), (Wanted, Timur Bekmambetov, Oct 9th), …]
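As a rough illustration of what CreateSheet does in the figure, the sketch below lifts a sub-relation out of every row into a new flat sheet. The function name `create_sheet` and the list-of-dicts representation are assumptions for illustration.

```python
# Hypothetical sheet0: one theater row with a nested Movies table.
sheet0 = [
    {
        "Theatername": "Wudaokou MEGABox",
        "Address": "Chengfu Road Zhongguan Cun",
        "Movies": [
            {"Label": "Waiting In Beijing", "Director": "Lun Zhang", "On Show Time": "Oct 24th"},
            {"Label": "Wanted", "Director": "Timur Bekmambetov", "On Show Time": "Oct 9th"},
        ],
    },
]


def create_sheet(rows, sub_column):
    """Collect the tuples of a nested column from all rows into a new sheet."""
    new_rows = []
    for row in rows:
        new_rows.extend(dict(t) for t in row[sub_column])  # copy each tuple
    return new_rows


sheet1 = create_sheet(sheet0, "Movies")
```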
17
Worksheet Data Manipulation Operators (Filter, Sort, HeadTruncate, TailTruncate)
filter(sheet0:Book/Price, (sheet0:Book/Price < 30) or (sheet0:Book/Price > 80))
<filter ref-column="sheet0:Book">
  <or>
    <atom rop="LESS">
      <left style="COLUMN" value="sheet0:Book/price" />
      <right style="CONSTANT" value="30" />
    </atom>
    <atom rop="GREATER">
      <left style="COLUMN" value="sheet0:Book/price" />
      <right style="CONSTANT" value="80" />
    </atom>
  </or>
</filter>
sheet0 before Filter, Book(Title, Author, Price):
Title | Author | Price
Head First Java | Kathy Sierra | $29.67
Effective Java | Joshua Bloch | $38.99
Core Java | Cay S. Horstmann | $37.79
Java How to Program | Harvey M. Deitel | $106.20
sheet0 after Filter:
Title | Author | Price
Head First Java | Kathy Sierra | $29.67
Java How to Program | Harvey M. Deitel | $106.20
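The filter condition above can be mimicked with an ordinary predicate over rows. This is a minimal sketch: `filter_rows` is a hypothetical helper, and prices are stored as plain numbers rather than the "$"-prefixed strings shown on the slide.

```python
books = [
    {"Title": "Head First Java", "Author": "Kathy Sierra", "Price": 29.67},
    {"Title": "Effective Java", "Author": "Joshua Bloch", "Price": 38.99},
    {"Title": "Core Java", "Author": "Cay S. Horstmann", "Price": 37.79},
    {"Title": "Java How to Program", "Author": "Harvey M. Deitel", "Price": 106.20},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


# (Price < 30) or (Price > 80), mirroring the <or> of two <atom> conditions.
kept = filter_rows(books, lambda r: r["Price"] < 30 or r["Price"] > 80)
titles = [r["Title"] for r in kept]
```

The loop over rows is implicit in the operator, which is the point the slides make about avoiding explicit control operators.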
18
Worksheet Schema Manipulation Operators (DeleteColumn, RenameColumn, Nest/Unnest)
<unnest ref-column="sheet0:Theaters/Movies" />
<nest name="Theaters">
  <column ref-column="sheet0:Theaters/name" />
  <column ref-column="sheet0:Theaters/address" />
</nest>
<renameColumn ref-column="sheet0:Theaters" new-name="Movies" />
[Figure: four states of sheet0. (1) Theaters(Theatername, Address, Movies(Label, Director, On Show Time)). (2) After unNest: flat Theaters(Theatername, Address, Label, Director, On Show Time), with the theater name and address (MEGABox, Zhongguan Cun) repeated for each movie. (3) After nest: Theatername and Address grouped into a sub-relation Theaters next to Label, Director, On Show Time. (4) After rename: the outer relation is named Movies.]
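A small sketch of Unnest and Nest on list-of-dict tables may help in following the figure. It uses the textbook nested-relational reading (nest groups rows that agree on the remaining columns); whether Mashroom's operators behave exactly this way is an assumption, and the function names are illustrative.

```python
theaters = [
    {
        "Theatername": "Wudaokou MEGABox",
        "Address": "Chengfu Road Zhongguan Cun",
        "Movies": [
            {"Label": "Waiting In Beijing", "Director": "Lun Zhang", "On Show Time": "Oct 24th"},
            {"Label": "Wanted", "Director": "Timur Bekmambetov", "On Show Time": "Oct 9th"},
        ],
    },
]


def unnest(rows, sub_column):
    """Flatten a nested column: one output row per inner tuple."""
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != sub_column}
        for inner in row[sub_column]:
            flat.append({**outer, **inner})
    return flat


def nest(rows, columns, name):
    """Group rows that agree on all other columns and collect the given
    columns into a sub-relation called `name`."""
    groups = {}
    for row in rows:
        rest = tuple((k, v) for k, v in row.items() if k not in columns)
        inner = {c: row[c] for c in columns}
        bucket = groups.setdefault(rest, [])
        if inner not in bucket:          # sub-relations hold no duplicates
            bucket.append(inner)
    return [{**dict(rest), name: inner_rows} for rest, inner_rows in groups.items()]


flat = unnest(theaters, "Movies")
by_movie = nest(flat, ["Theatername", "Address"], "Theaters")
```

Nesting the movie columns of `flat` back under the name `Movies` recovers the original sheet, which is a quick sanity check on both helpers.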
19
Worksheet Cleaning (mergeInstance, addFunction, replace)
<replace ref-column="sheet0:Book/Price2">
  <funcEx func="STRCAT">
    <atomEx style="CONSTANT" value="$" />
    <atomEx style="COLUMN" value="sheet0:Book/price2" />
  </funcEx>
</replace>
[Figure: Replace applied to sheet0, Book(Title, Author, Price1, Price2), with the books Head First Java (Kathy Sierra, $29.67), Effective Java (Joshua Bloch, $38.99), Core Java (Cay S. Horstmann, $37.79) and Java How to Program (Harvey M. Deitel, $106.20). The Price2 values 29.00, 40.90, 99.20, 38.00 become $29.00, $40.90, $99.20, $38.00.]
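The Replace step with STRCAT amounts to recomputing one column from a function of the row. The sketch below assumes that reading; `strcat` and `replace_column` are hypothetical helpers, and only the Price2 column is modelled.

```python
def strcat(*parts):
    """Concatenate the string forms of all arguments."""
    return "".join(str(p) for p in parts)


def replace_column(rows, column, func):
    """Return new rows in which `column` holds func(row)."""
    return [{**row, column: func(row)} for row in rows]


# Price2 values as shown on the slide, before the currency sign is added.
prices = [{"Price2": "29.00"}, {"Price2": "40.90"}, {"Price2": "99.20"}, {"Price2": "38.00"}]

with_currency = replace_column(prices, "Price2", lambda r: strcat("$", r["Price2"]))
```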
20
Worksheet Composition (merge, fuse, linkService)
linkservice(sid, mapping, …)
<linkService ref-column="Sheet0:Book" ref-service="BookReviewService">
  <mapping param-name="booktitle" style="COLUMN" value="Sheet0:Book/title"/>
</linkService>
[Figure: linkService applied to sheet0, Movies(Label, Director, On Show Time), adds a nested Reviews(Author, Score) column, e.g. (alice, 5 star), (bob, 4 star).]
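LinkService can be read as "call a service once per row, feeding it column values, and attach the result as a nested column". The sketch below follows that reading; the stand-in `review_service`, its canned data, and the choice of which title has reviews are made up for illustration and do not come from any real service.

```python
def review_service(title):
    # Hypothetical stand-in for a remote review service. The canned reviews
    # and the title they are attached to are arbitrary illustration data.
    canned = {
        "Wanted": [
            {"Author": "alice", "Score": "5 star"},
            {"Author": "bob", "Score": "4 star"},
        ],
    }
    return canned.get(title, [])


def link_service(rows, service, mapping, result_column):
    """For each row, map columns to service parameters, call the service,
    and store its result table in a new nested column."""
    linked = []
    for row in rows:
        params = {param: row[column] for param, column in mapping.items()}
        linked.append({**row, result_column: service(**params)})
    return linked


movies = [
    {"Label": "Waiting In Beijing", "Director": "Lun Zhang"},
    {"Label": "Wanted", "Director": "Timur Bekmambetov"},
]

# The mapping {"title": "Label"} corresponds to a <mapping> element:
# service parameter name on the left, source column on the right.
linked = link_service(movies, review_service, {"title": "Label"}, "Reviews")
```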
21
Operator Visualization
• Nested table
[Figure: a nested table with Schema and Content areas, annotated with: pop-up menu for relation or sub-relation; pop-up menu for atomic attribute; pop-up menu for new column; Drag and Drop; Add a New Column; atomic attribute; relation]
22
Operator Visualization: Drag and Drop
• create a new worksheet
• merge
[Figure: Google Search Results (sheet0) and Yahoo Search Results (sheet1), each showing Items (10 items); annotations: drag and drop; drag and drop to blank space]
23
Operator Visualization
• Implicit control operators: loops, …
• Avoid alignment of inputs and outputs at each step
[Figure: panels (a) and (b)]
25
Trial Application
• IMDBDS (from imdb.com)
  – Output: (label, director, actor)
• GooglemovieDS (from shenghuo.google.cn)
  – Input: city
  – Output: (title, director, actor)
• MovieReviewsDS (from douban.com)
  – Input: subjectID
  – Output: reviewLink, summary
• How to get the integrated movie list with reviews?
26
Trial Application (1)
[Figure: two imported sheets, labelled GoogleMovieDS and IMDBDS. sheet0: Movies(Label, Director, actor) with (Waiting In Beijing, Lun Zhang, Steph Song), (Wanted, Timur Bekmambetov, …), …; sheet1: Movies(Title, Director, actor) with (State of Play, Kevin Macdonald, Russell Crowe), (Wanted, Timur Bekmambetov, …), …. Applying Merge, MergeInstance, DeleteColumn, ... yields sheet0: Movies(Label, Director) with (Waiting In Beijing, Lun Zhang), (Wanted, Timur Bekmambetov), (State of Play, Kevin Macdonald), …]
Trial Application (2)
[Figure: linkService calls MovieSearchDS with title = "State of Play" and returns sheet1: MovieSearchResult(Title, subjectID) with (State of Play, Http://douban.com/1239), (Play, Http://douban.com/3432), …. The result is attached to sheet0: Movies(Label, Director) as a nested MovieSearchResult column. Filter then keeps only the matching entry, e.g. (State of Play, Kevin Macdonald, Http://douban.com/1239).]
Trial Application (3)
[Figure: linkService calls MovieReviewDS with subjectid = http://douban.com/1239 and returns sheet2: MovieReviews(Author, Score) with (Alice, 5 Star), (Bob, ...), …. The reviews are attached to sheet0: Movies(Label, Director, MovieSearchResult(subjectID)) as a nested Reviews(Author, Score) column, e.g. (alice, 5 star), (bob, 4 star). Sink then exports the sheet.]
29
1) actor: Import; actorIn: GoogleMovieDS<city,"Beijing",MASHUP_PARAM>
   Sheet1: G(title,director,actor)
2) actor: Import; actorIn: IMDBDS
   Sheet2: I(label,director,actor)
3) actor: Merge; actorIn: sheet1.Items,sheet2.Items,<sheet1.Items/title, sheet2.Items/label>
   Google ∪ Imdb (label=title)
   Sheet1: G(title,director,actor)
4) actor: LinkService; actorIn: MovieSearchDS,<$title, items/title>
   MovieSearchResult(subjectID,title), abbreviated as M(subjectID,title)
   (Sheet1, M)
   Sheet1: G(title,director,actor,M(subjectID,title))
5) actor: Filter; actorIn: Items/Items/title,contains,Items/Title
   σ (Sheet1)(Items/title == title)
   Sheet1: G(title,director,actor,(subjectID,title))
6) actor: LinkService; actorIn: MovieReviewsDS<$subjectID, Items/Items/subjectID>
   ReviewsSearchResult(subjectID,reviewLink,summary), abbreviated as R(subjectID,reviewLink,summary)
   (Sheet1(M), R)
   Sheet1: G(title,director,actor,M(subjectID,title,R(reviewLink,summary)))
7) actor: Sink; actorIn: "listview"=>NewestMovieList.widget
Demo Video
30
Evaluation
• Goal
  – Expressivity and usability
• Method
  – 10 popular mashups from the Yahoo! Pipes community (those with a high "clone" number), two mashup directories (programmableWeb.com and mashupAwards.com), and a set of other typical mashups
  – "Mashup Patterns"
  – User study with 5 groups of people to measure the average time to build a mashup
31
Aggregation Patterns
• Aggregation for Collection (same kind of Web Sources)
  – e.g. "Aggregated News Alerts": search at Bloglines, Google Blog Search, Microsoft Live News, etc.
• Aggregation for Comparison (same kind of Web Sources)
  – e.g. "Book Price Comparison": compares the price of the same book from dangdang.com and amazon.cn
• Focus View or Data Analysis (single Web Source)
  – e.g. "Focus View of YouTube Video": lists YouTube videos of a certain category or tag
• Aggregation for Collection (different kinds of Web Sources with dependency)
  – e.g. "Aggregated Movie Reviews"
32
• "similarity aggregation without dependency" pattern
1) actor: Import; actorIn: DataService-A<paramA,"value",MASHUP_PARAM>
   Sheet1: A(C1,C2,…)
2) actor: Import; actorIn: DataService-B<paramB,paramA,MASHUP_PARAM>
   Sheet2: B(D1,D2,…)
3) actor: Merge or Fuse; actorIn: sheet1.A,sheet2.B,<sheet1.A/C1, sheet2.B/D1>…
4) actor: Worksheet Manipulation & Cleaning Operators (Filter, Sort, MergeInstance…)
5) actor: Sink
…
33
• "similarity aggregation with comparison" pattern
1) actor: Import; actorIn: DataService-A<paramA,"value",MASHUP_PARAM>
   Sheet1: A(C1,C2,…)
2) actor: AddFunction; actorIn: e.g. COPY($A/price) + " A"
   actor: Import; actorIn: DataService-B<paramB,paramA,MASHUP_PARAM>
   Sheet2: B(D1,D2,…)
   actor: AddFunction; actorIn: e.g. COPY($B/price) + " B"
3) actor: Merge or Fuse; actorIn: sheet1.A,sheet2.B
4) actor: Sort by String; actorIn: A.C1
5) actor: Sink
…
34
• "focus view or analysis" pattern
1) actor: Import; actorIn: DataService-A<param1,"value1",MASHUP_PARAM>
   Sheet1: A(C1,C2,…)
2) actor: Import; actorIn: DataService-A<param2,"value2",MASHUP_PARAM>
   Sheet2: A(C1,C2,…)
3) actor: Merge or Fuse; actorIn: sheet1.A,sheet2.A,
4) actor: Worksheet Manipulation & Cleaning Operators (Filter, Sort, MergeInstance…)
5) actor: Sink
…
35
• "aggregation with dependency" pattern
1) actor: Import; actorIn: DataService-A<paramA,"value",CONSTANT or MASHUP_PARAM>
   Sheet1: A(C1,C2,…)
2) actor: Import; actorIn: DataService-B<paramB,paramA,MASHUP_PARAM> or <paramB,"value",CONSTANT or MASHUP_PARAM>
   Sheet2: B(D1,D2,…)
3) actor: Merge or Fuse; actorIn: sheet1.A,sheet2.B,<sheet1.A/C1, sheet2.B/D1>…
4) actor: Filtering Operators (Filter, Truncate)
5) actor: LinkService; actorIn: DataService-C, e.g. <$P, C1>
6) actor: LinkService; actorIn: DataService-D, e.g. <$P, D1>
6) actor: Sink
…
• "search subjectID first" pattern
1) actor: Import; actorIn: SearchSubjectID
   Sheet1: A(Title,…)
2) actor: LinkService; actorIn: SearchSubjectID<$P, sheet1.A/Title>
3) actor: Filter; actorIn: Sheet1.A,<A/SearchResults/Title, contains, A/Title>
4) actor: LinkService; actorIn: SearchContents, e.g. <$P, A/SearchResults/subjectID>
5) actor: Sink
36
Selected Mashups, Patterns and Experiment Results
37
Related Works
• Flow-chart-like programming
  – Microsoft Popfly, IBM Damia, Marmite, Yahoo! Pipes
  – Provide a set of flow-chart-based graphical operators and adopt a flow-style orchestration specification
• Spreadsheet-like programming
  – SpreadMash and C3W
  – Data services are processed in a manner similar to spreadsheets
• Tree-based programming
  – The old version of Intel MashMaker
  – Data services are combined based on a tree structure
• Browser-centric programming
  – Ubiquity and d.mix
  – Provide mechanisms for users to trigger mashup operations in the context of browsing
38
Related Works
Criterion | Flow-chart-like programming (Yahoo! Pipes) | Spreadsheet-like programming (C3W) | Tree-based programming (old version of MashMaker) | Mashroom
Knowledge required, learning curve | Data flow modeling knowledge | Spreadsheet usage knowledge | Specific tree-based programming knowledge | A little spreadsheet knowledge
Service dependency | Alignment of inputs and outputs in data flow | Cell reference in spreadsheet | Parameter configuration | Contextual menu and dialog
Complex control operator | Explicit control module | Control expressions on cell reference | Hide complex control operator | Hide complex control operator
39
Conclusion
• A programming model for building mashups by end users
  – The design philosophy
  – The abstraction of mashup applications, and the implementation
  – Combine the nested table with spreadsheet-like programming
  – Evaluation