Utilitarian Aggregation of Open Data
Srinath Srinivasa, Sweety Agrawal, Chinmay Jog, Jayati Deshmukh
IIIT Bangalore
OSL@IIITB
● Started in 2002
● Current strength: 3 PhD students, 5 MS (by Research) students, 24 MTech project students
● Research Graduates: 4 PhDs, 1 MS
● Part of the Intel PlanetLab grid between 2003 and 2006
● Broad research areas: data, systems and cognition
● Specific research areas over the years
● Data models for graph databases
● Distributed query processing
● Data management in ad hoc networks
● Semantics mining from text data
● Analytics of online social spaces
● Community knowledge management
● Linked Open Data
● Computational Cognition
Current focus areas
OSL Releases
Agama: A graph database for storing large undirected graphs, built for efficient traversal (not structure-based retrieval)
Agama currently powers a co-occurrence graph of all noun phrases from Wikipedia articles, hosted at OSL, with tens of millions of nodes and hundreds of millions of edges
OSL Releases
Topical Anchors: Given a list of noun phrases, identifies a semantic topic for these terms.
Powered by the Wikipedia co-occurrence graph hosted by Agama
Web APIs enable use of Topical Anchors in third party applications
OSL Releases
Topic Expansion: Given a term, expands it into semantically relevant topical clusters corresponding to its different senses.
Uses co-occurrence datasets from Wikipedia 2006 or 2011.
Web APIs enable use by third party applications
OSL Releases
Silverfish: A social space for managing and discussing research papers
Supports automatic indexing, recommendations and social networking features
Utilitarian Aggregation of Open Data
Open Data
Data hosted publicly for use and republication with a free or open license
Usually comprises structured datasets in tabular form
Major government, NGO and corporate players in the open data space
Open Data in India: A Summary [Agrawal et al. 2013]
Sandesh
A “semantic data mesh” over Indian Open Data
Connecting elements from different datasets under an overarching semantic structure
Challenges
Open data is about no single topic in particular and fits into no single ontology
Contextual boundaries of open data assertions cannot be modeled using Linked Data standards
The problem of “open-ended” data
Challenges in Open Data Aggregation
Fragmentation
Challenges in Open Data Aggregation
Bounded validity of utilitarian data
Consider the following RDF statements:
(Einstein, HasWon, NobelPrize)
(Wheat, PricePerKilo, 50)
Encyclopedic knowledge
Valid everywhere without contextual boundaries
No immediate or specific utility
Utilitarian knowledge
Valid only within specific contextual boundaries (market, place, time, etc.)
Has immediate and/or specific utility
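The distinction above can be sketched in code. This is a minimal illustration, not the authors' model: the `Assertion` class, its `context` field and the `holds_in` method are hypothetical names, and the market and date values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Assertion:
    subject: str
    predicate: str
    obj: object
    # Hypothetical context bounds; an empty dict means "valid everywhere".
    context: Dict[str, str] = field(default_factory=dict)

    def holds_in(self, situation: Dict[str, str]) -> bool:
        """True if the situation matches every bound of this assertion."""
        return all(situation.get(k) == v for k, v in self.context.items())


# Encyclopedic: no contextual boundaries.
encyclopedic = Assertion("Einstein", "HasWon", "NobelPrize")

# Utilitarian: valid only for an (invented) market and date.
utilitarian = Assertion(
    "Wheat", "PricePerKilo", 50,
    context={"market": "ExampleMarket", "date": "2014-01-01"},
)
```

A plain triple has no place to carry the `context` bounds, which is the modeling gap the slide points at.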
Challenges in Open Data Aggregation
The “divergent” nature of utilitarian aggregation
The “convergent” nature of encyclopedic aggregation, as in Wikipedia articles
Challenges in Open Data Aggregation
The “divergent” nature of utilitarian aggregation
Utilitarian aggregation involves creating several “utility worlds”, each of which combines a given dataset with different other datasets for a different utilitarian goal.
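As a rough sketch of this divergence, the same price dataset can be joined with two different datasets to serve two different goals. All records, field names and the `join_on` helper below are hypothetical and exist only for illustration.

```python
def join_on(key, left, right):
    """Inner-join two lists of dict records on a shared key."""
    index = {record[key]: record for record in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# One given dataset (invented values).
prices = [{"market": "M1", "wheat_price_per_kilo": 50}]

# Two different other datasets (invented values).
transport = [{"market": "M1", "transport_cost_per_kilo": 2}]
subsidies = [{"market": "M1", "subsidised_price_per_kilo": 20}]

# Two "utility worlds" built from the same base data for different goals.
seller_world = join_on("market", prices, transport)
buyer_world = join_on("market", prices, subsidies)
```

The base dataset is reused unchanged, while each world diverges in what it is combined with.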
Challenges in Open Data Aggregation
Open Data and Credibility
Open data portals hosting utilitarian data (e.g. Data.gov.in) require credential checks on data sources to establish trust; this is less critical for open data portals hosting encyclopedic data (e.g. Wikipedia).
Challenges in Open Data Aggregation
The problem of “open-ended” data
Data containing private information about an entity p, which may nevertheless need to be (legitimately) disseminated to and used by several entities unrelated to p
The owner (p) of the data may have no knowledge of or control over its consumers, but trusts the system to disseminate the data only to legitimate consumers
Example case studies:
● ICSE marks data
● BPL data
Many Worlds on a Frame (MWF)
A trusted, distributed middleware for utilitarian aggregation and dissemination of open data
[Figure labels: Datasets; MWF; Aggregated knowledge in utilitarian “worlds”; Users]
Formal model of MWF developed independently, but representable as a superposition of two Frames in Kripke Semantics
Users as knowledge elements
MWF: Conceptual World
[Figure: example conceptual worlds: Person, Place, Institution, Crop]
Conceptual World: A semantic context to host data about something
MWF: Frame
Every conceptual world has a “type” and a “location”, specified by an “is-a” parent and an “is-in” parent respectively. The data structure formed by is-a and is-in connections is called the Frame.
[Figure: worlds Place, State and City connected by is-a and is-in edges]
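A minimal sketch of the Frame as a data structure: each world holds at most one is-a parent and one is-in parent. The `World` class and `chain` helper are hypothetical names, and the wiring of Place, State and City below is an assumed reading of the figure, with Karnataka and Bangalore added as illustrative instances.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class World:
    name: str
    is_a: Optional["World"] = None   # "type" parent
    is_in: Optional["World"] = None  # "location" parent


def chain(world: World, link: str) -> List[str]:
    """Follow one kind of parent link ('is_a' or 'is_in') up to the root."""
    names = []
    parent = getattr(world, link)
    while parent is not None:
        names.append(parent.name)
        parent = getattr(parent, link)
    return names


# Assumed wiring: State and City are both kinds of Place.
place = World("Place")
state = World("State", is_a=place)
city = World("City", is_a=place)

# Illustrative instances.
karnataka = World("Karnataka", is_a=state)
bangalore = World("Bangalore", is_a=city, is_in=karnataka)
```

Here `chain(bangalore, "is_a")` walks the type hierarchy and `chain(bangalore, "is_in")` walks the containment hierarchy; the two are independent of each other.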
MWF: World Structure and Participation
[Figure: an Institution world. Components: Member, Office, Location (participating world types shown: Person, Place). Associations: Heads, ReportsTo (between Member and Member).]
MWF: Privileges
[Figure: Institution world; User :: Person]
Credentials of a Person (User) are defined by the roles played by the Person in different worlds
Credentials determine the privilege level in a target world
Privilege levels:
● Administrator
● Schema Manager
● Data owner
● Casual user
● Public
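One way to picture credentials determining a privilege level is sketched below. It assumes the five levels are ordered from Public (lowest) to Administrator (highest) and that each target world keeps a policy mapping (world, role) credentials to levels; both the ordering and the `privilege_in` function are assumptions for illustration, not the MWF implementation.

```python
from enum import IntEnum


class Privilege(IntEnum):
    # Assumed ordering, lowest to highest.
    PUBLIC = 0
    CASUAL_USER = 1
    DATA_OWNER = 2
    SCHEMA_MANAGER = 3
    ADMINISTRATOR = 4


def privilege_in(target_policy, credentials):
    """Highest level the target world's policy grants to any held credential.

    credentials: set of (world_name, role_name) pairs held by the user.
    target_policy: dict mapping such pairs to a Privilege.
    """
    granted = [target_policy[c] for c in credentials if c in target_policy]
    return max(granted, default=Privilege.PUBLIC)


# Hypothetical user who is a Member of, and Heads, an example institution.
credentials = {("ExampleInstitute", "Member"), ("ExampleInstitute", "Heads")}
policy = {
    ("ExampleInstitute", "Member"): Privilege.CASUAL_USER,
    ("ExampleInstitute", "Heads"): Privilege.ADMINISTRATOR,
}
```

A user with no matching credential falls back to the Public level in this sketch.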
MWF: Inheritances
[Figure: worlds Place, State and City connected by is-a and is-in edges]
The is-a hierarchy inherits:
● World structure
● Attributes
● Participations
● Constraints
The is-in hierarchy inherits:
● Privilege levels
● Visibility
● Construction
● Destruction
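The two inheritance paths can be sketched as two separate lookups: attributes are resolved up the is-a chain, visibility up the is-in chain. The class, the method names, the example attributes and the "public" default are all assumptions made for this illustration.

```python
class World:
    def __init__(self, name, is_a=None, is_in=None, attributes=(), visibility=None):
        self.name = name
        self.is_a = is_a              # type parent
        self.is_in = is_in            # location parent
        self.attributes = list(attributes)
        self.visibility = visibility  # None means "not set here"

    def all_attributes(self):
        """Own attributes plus those inherited along the is-a chain."""
        inherited = self.is_a.all_attributes() if self.is_a else []
        return inherited + [a for a in self.attributes if a not in inherited]

    def effective_visibility(self):
        """Nearest visibility setting along the is-in chain."""
        world = self
        while world is not None:
            if world.visibility is not None:
                return world.visibility
            world = world.is_in
        return "public"  # assumed default when nothing is set


# Hypothetical worlds and attribute names.
place = World("Place", attributes=["name", "area"])
city = World("City", is_a=place, attributes=["mayor"])
region = World("RegionX", is_a=place, visibility="restricted")
city_a = World("CityA", is_a=city, is_in=region)
```

`city_a` sets neither attributes nor visibility itself, yet gets the former from its type and the latter from its location.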
MWF: Other Features (ongoing work)
Constraints
Uniqueness constraints
Dual Associations
Bulk loading of data
Cognitive gap-fillers
Query semantics
● Select-in: Answer a query by matching the query condition inside a world and its contained worlds
● Select-on: Answer a query by matching the query condition on a set of worlds of a given type
● Select-world: Answer a query about the participation of a given world in other worlds
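The three query forms can be approximated over a toy in-memory frame as follows. This is a sketch under stated assumptions: the function signatures, the `World` fields and all example worlds are hypothetical, and the wheat price of 60 is an invented value alongside the slide's 50.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class World:
    name: str
    is_a: Optional["World"] = None
    is_in: Optional["World"] = None
    data: Dict[str, object] = field(default_factory=dict)
    # role name -> names of the worlds playing that role in this world
    participants: Dict[str, List[str]] = field(default_factory=dict)


def _has_ancestor(world, target, link):
    """True if `target` is reachable from `world` via the given parent link."""
    parent = getattr(world, link)
    while parent is not None:
        if parent is target:
            return True
        parent = getattr(parent, link)
    return False


def select_in(worlds, container, cond):
    """Match cond inside `container` and the worlds contained in it (is-in)."""
    return [w.name for w in worlds
            if (w is container or _has_ancestor(w, container, "is_in")) and cond(w)]


def select_on(worlds, world_type, cond):
    """Match cond on every world of the given type (is-a)."""
    return [w.name for w in worlds
            if _has_ancestor(w, world_type, "is_a") and cond(w)]


def select_world(worlds, subject):
    """List (world, role) pairs in which `subject` participates."""
    return [(w.name, role) for w in worlds
            for role, players in w.participants.items()
            if subject.name in players]


# Hypothetical frame.
place, institution, person = World("Place"), World("Institution"), World("Person")
state, city = World("State", is_a=place), World("City", is_a=place)
state_x = World("StateX", is_a=state)
city_a = World("CityA", is_a=city, is_in=state_x, data={"wheat_price_per_kilo": 50})
city_b = World("CityB", is_a=city, data={"wheat_price_per_kilo": 60})
alice = World("Alice", is_a=person)
institute = World("InstituteA", is_a=institution, is_in=city_a,
                  participants={"Member": ["Alice"]})
worlds = [place, institution, person, state, city,
          state_x, city_a, city_b, alice, institute]


def has_price(world):
    return "wheat_price_per_kilo" in world.data
```

In this toy frame, `select_in` scopes a query by location, `select_on` scopes it by type, and `select_world` inverts the participation relation.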
MWF: Future Work
Distributed MWF with proxy worlds
From privileges and constraints to an integrity management subsystem
Some Screenshots
Thank You!
References
[Agrawal et al. 2013] Sweety Agrawal, Jayati Deshmukh, Srinath Srinivasa, Chinmay Jog, Sri Sayi Bhavani Kakarla, Rahul Dhek, Sneha Deshpande, Sana Javed and Vikas Mohandoss. A Survey of Indian Open Data. Proceedings of IBM ICARE 2013. ACM Press. New Delhi, India. Oct 2013.
[Srinivasa et al. 2014] Srinath Srinivasa, Sweety Agrawal, Chinmay Jog, Jayati Deshmukh. Characterizing Open Utilitarian Knowledge. Proceedings of CoDS 2014. New Delhi, India. March 2014.