Mosc2010 Mashups With Government Datasets

Open Source Tools for Creating Mashups with Government Datasets

Mohammed Firdaus, Muhd Sharuzzamal Bakri

June 29, 2010

Mohammed Firdaus, Muhd Sharuzzamal Bakri Open Source Tools for Creating Mashups with Government Data

About the Speakers

Mohammed Firdaus bin Mohammed Ab Halim (@firdaus_halim) and Muhd Sharuzzamal Bakri (@amai)

Founders of Persada Terbilang Sdn Bhd. We have no relationship whatsoever to any fertilizer supplier.

Mashups

A mashup is a web page or application that uses and combines data, presentation or functionality from two or more sources to create new services.

(Source: Wikipedia)

Data mashups combine similar types of media and information from multiple sources into a single representation.

(Source: Wikipedia)

Data Sets are Not Available in Machine Readable Form

Searches like these turn up nothing useful:

filetype:csv site:.gov.my
filetype:xml site:.gov.my
filetype:rdf site:.gov.my

We have to resort to web scraping.
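A minimal scraping sketch using only Python's standard library (html.parser). The markup below is a made-up fragment, not the real layout of any government site, so the structure the class relies on (one record per <tr>, one field per <td> or <th>) is an assumption to check against the actual page.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, as a browser would when rendering
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Made-up fragment; a real listing page may be structured differently.
SAMPLE = """
<table>
  <tr><th>Nombor Tender</th><th>Kementerian</th></tr>
  <tr><td>8/2009</td><td>KEMENTERIAN KESIHATAN</td></tr>
  <tr><td> 128/2009 </td><td>KEMENTERIAN
      PELAJARAN</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE)
header, *records = scraper.rows
print(header)
print(records)
```

The first row is treated as the header; everything after it becomes one list of cell strings per record, ready to be written out as CSV.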

No Data Dictionaries

Since the available data sets were meant for humans rather than machines to consume, they are usually published without any kind of data dictionary.

This means that an application developer has to make assumptions about the structure of each field, e.g. whether it is unique, whether it is a multi-value field, and which fields are mandatory or optional.

These assumptions may or may not turn out to be correct as you see more and more data in the data set.
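One way to keep these assumptions honest is to re-check them every time new data arrives. The hypothetical profile_fields helper below reports, for each field, how many records leave it empty, whether its non-empty values are unique, and whether any value contains a separator that would suggest a multi-value field. The separator and the sample rows are illustrative assumptions.

```python
from collections import Counter


def profile_fields(records, multi_value_sep=None):
    """Summarise each field of a list of dict records (e.g. csv.DictReader rows)."""
    fields = sorted({key for rec in records for key in rec})
    report = {}
    for field in fields:
        values = [(rec.get(field) or "").strip() for rec in records]
        present = [v for v in values if v]
        counts = Counter(present)
        report[field] = {
            # how many records leave this field empty or omit it
            "missing": len(values) - len(present),
            # True if no non-empty value occurs more than once
            "unique": all(n == 1 for n in counts.values()),
            # True if some value contains the separator (checked only if one is given)
            "multi_value": multi_value_sep is not None
            and any(multi_value_sep in v for v in present),
        }
    return report


# Hypothetical records; field names and values are illustrative only.
rows = [
    {"no": "8/2009", "category": "Supplies", "winner": "A SDN BHD"},
    {"no": "8/2009", "category": "", "winner": "B SDN BHD; C SDN BHD"},
    {"no": "128/2009", "category": "Services", "winner": "A SDN BHD"},
]
for field, stats in profile_fields(rows, multi_value_sep=";").items():
    print(field, stats)
```

Running it again after each scrape shows which assumptions (for example "the tender number is unique") still hold.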

New Data Sets Constantly Become Available

This is not a bad thing.

However, our code, database and schema must be flexible enough to deal with future data sets that we might want to use in our applications.

Lack of Standards Across Agencies

Different agencies use different identifiers to refer to the same entity.

The lack of common identifiers makes it tedious to combine data sets that may be describing the same entity.

MyCoID and MyID are steps in the right direction.
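Until common identifiers are available, a fallback is to join records on a normalised company name. The sketch below (match_key is a hypothetical helper) folds case, punctuation and a few spellings of the SDN BHD / BERHAD suffix into one key. It is a heuristic: two different companies can collide, and spellings it does not anticipate will still fail to match.

```python
import re

# Variants of the company suffix, e.g. "SDN BHD", "SDN. BHD." and "BERHAD".
# Treating them as equal is a matching assumption, not a legal rule.
_SUFFIXES = [
    (re.compile(r"\bSDN\.?\s*BHD\.?"), "SDN BHD"),
    (re.compile(r"\bBERHAD\b"), "BHD"),
]


def match_key(name):
    """Normalise a company name into a key for joining data sets without a shared ID."""
    key = name.upper()
    for pattern, replacement in _SUFFIXES:
        key = pattern.sub(replacement, key)
    key = re.sub(r"[^A-Z0-9 ]", " ", key)  # punctuation becomes a space
    return " ".join(key.split())           # collapse runs of whitespace


print(match_key("KESUMA TECHNOLOGY SDN. BHD") == match_key("Kesuma Technology Sdn Bhd"))
```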

In Summary

Because of these challenges, we need an agile method for modeling, storing and processing these government data sets in our application.

The purpose of this presentation is to show how representing your data as a graph both helps you deal with these challenges and, at the same time, helps you make compelling data mashups.

What is a Graph?

A data structure that consists of a collection of vertices and the connections between those vertices, called edges.

Vertices are sometimes called nodes or dots.

Edges are sometimes called relationships, links or arcs.

The terminology differs between software packages.
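To make the definition concrete, here is a minimal undirected graph in Python stored as an adjacency list, where each vertex maps to the set of vertices it shares an edge with. The Graph class is illustrative only; graph databases store and index this differently.

```python
from collections import defaultdict


class Graph:
    """Undirected graph stored as an adjacency list: vertex -> set of neighbours."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # An undirected edge is recorded in both directions
        self.adj[u].add(v)
        self.adj[v].add(u)

    def vertices(self):
        return set(self.adj)

    def edges(self):
        # Each undirected edge appears once, as a frozenset of its two ends
        return {frozenset((u, v)) for u in self.adj for v in self.adj[u]}


g = Graph()
g.add_edge("Alice", "Bob")
g.add_edge("Bob", "Carol")
print(sorted(g.vertices()))  # ['Alice', 'Bob', 'Carol']
print(len(g.edges()))        # 2
```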

Types of Graphs

A directed graph (or digraph) is one where the edges have a direction (i.e. each edge goes out of one vertex and into another).

A multigraph is one where multiple edges can exist between two vertices.

An edge-labeled graph is a graph where edges have labels. Similarly, a vertex-labeled graph is one in which the vertices have labels.

An attributed graph is one in which the vertices and edges can have attributes (key-value pairs).

A graph can have more than one of these properties, e.g. a multi-digraph is one in which multiple directed edges can exist between two vertices.
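A sketch of how several of these properties combine: a directed, edge-labeled multigraph that keeps parallel edges by storing edge lists rather than sets. The MultiDigraph class and the sample labels are hypothetical.

```python
from collections import defaultdict


class MultiDigraph:
    """Directed, edge-labeled multigraph: parallel edges between a pair are kept."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # tail -> [(label, head), ...]
        self.in_edges = defaultdict(list)   # head -> [(label, tail), ...]

    def add_edge(self, tail, head, label):
        # Lists rather than sets, so the same pair can be linked more than once
        self.out_edges[tail].append((label, head))
        self.in_edges[head].append((label, tail))

    def labels_between(self, tail, head):
        """Labels of all edges going from tail to head."""
        return [label for label, h in self.out_edges.get(tail, []) if h == head]


g = MultiDigraph()
g.add_edge("alice", "bob", "follows")
g.add_edge("alice", "bob", "mentions")   # a second, parallel edge
g.add_edge("bob", "alice", "follows")    # direction matters: a separate edge
print(g.labels_between("alice", "bob"))  # ['follows', 'mentions']
print(g.labels_between("bob", "alice"))  # ['follows']
```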

Types of Graphs - Simple/Undirected Graphs

Types of Graphs - Directed Graph

Types of Graphs - Edge and Node Labeled Graph

Types of Graphs - Multigraph

Types of Graphs - Attributed Multigraph

Examples - Social Graphs

Source: http://www.flickr.com/photos/greenem/11696663/

Undirected Graph - Vertices represent people and edges represent friendship.

Examples - Web Graph

http://en.wikipedia.org/wiki/File:WorldWideWebAroundWikipedia.png

Multi-digraph - Vertices represent web pages and directed edges represent links between pages.

Property Graphs

'Property graph' is another term for an attributed, labeled multi-digraph.

Property graphs are flexible enough to support most types of graph data. Other types of graphs (with the exception of hypergraphs) can be built on top of property graphs by removing features or by using features of the property graph in certain ways.

The tools that we are covering in this presentation deal primarily with property graphs.
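A minimal in-memory sketch of a property graph: nodes carry key-value properties, and each edge has a direction, a label and its own properties. The PropertyGraph class is hypothetical and scans the edge list linearly, which is fine for illustration but not how a graph database finds neighbours.

```python
import itertools


class PropertyGraph:
    """In-memory property graph: an attributed, labeled multi-digraph."""

    def __init__(self):
        self._next_id = itertools.count()
        self.nodes = {}  # node id -> dict of properties
        self.edges = []  # (tail id, label, head id, dict of properties)

    def add_node(self, **props):
        node_id = next(self._next_id)
        self.nodes[node_id] = dict(props)
        return node_id

    def add_edge(self, tail, label, head, **props):
        self.edges.append((tail, label, head, dict(props)))

    def out(self, node_id, label):
        """Heads of outgoing edges with this label (linear scan of all edges)."""
        return [h for t, lbl, h, _ in self.edges if t == node_id and lbl == label]

    def into(self, node_id, label):
        """Tails of incoming edges with this label (linear scan of all edges)."""
        return [t for t, lbl, h, _ in self.edges if h == node_id and lbl == label]


g = PropertyGraph()
alice = g.add_node(name="Alice")
bob = g.add_node(name="Bob")
g.add_edge(alice, "knows", bob, since=2009)
print([g.nodes[n]["name"] for n in g.out(alice, "knows")])  # ['Bob']
print([g.nodes[n]["name"] for n in g.into(bob, "knows")])   # ['Alice']
print(g.edges[0][3])                                         # {'since': 2009}
```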

Property Graphs

Source: http://wiki.github.com/tinkerpop/gremlin/defining-a-property-graph

Treasury - Tenders Awarded

Source: http://myprocurement.treasury.gov.my/index.php/en/list-keputusan-tender

Fields

Tajuk Tender (Title of Tender)

Nombor Tender (Tender Number)

Kategori Perolehan (Procurement Category)

Kementerian (Ministry)

Petender Berjaya (Winner of Tender)

No Pendaftaran Dengan ROB/ROS/ROC (Registration Number with ROB/ROS/ROC)

No Pendaftaran Dengan MOF/PKK (Registration Number with MOF/PKK)

Harga Setuju Terima (Agreed Upon Value)

Code and Data in Machine Readable Form

For this presentation, we are using data that we scraped from this site on 2010-04-26.

The source code for our scraper and the CSV dump from 2010-04-26 are available at http://mfirdaus.com/mosc-paper/

The dump contains 2615 records.

The Dump

Missing Fields

Out of the 2615 records in the dump:

510 records were missing a tender number

472 records were missing a category

1836 records were missing a ROB/ROS/ROC number

510 records were missing a MOF number
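Counts like these can be produced with the standard csv module. The sketch assumes a header row and uses hypothetical column names (tender_no, category, rob_no, mof_no); the actual column names in the published dump may differ, and the file name in the comment is also an assumption.

```python
import csv
import io


def count_missing(csv_file):
    """Return (number of records, {column: records where that column is empty})."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames  # reads the header row
    missing = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Short rows give None; whitespace-only cells also count as missing
            if not (row.get(name) or "").strip():
                missing[name] += 1
    return total, missing


# Hypothetical header and rows. With a real file you would pass
# open("dump.csv", newline="") instead (file name assumed).
sample = io.StringIO(
    "tender_no,category,rob_no,mof_no\n"
    "8/2009,Supplies,,M-001\n"
    ",Services,,\n"
)
total, missing = count_missing(sample)
print(total, missing)
```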

Tender Numbers are Not Unique

32 records have the same tender number and title as another record.

23 records have the same tender number as another record.

In some cases these appear to be duplicate records, since all the fields match up.

In other cases, one or two fields are slightly different, indicating that there was probably a typo (and the erroneous record was not deleted).

In some cases, the other fields are completely different, which leads us to think that it is possible for a tender to have multiple winners (we need government officials to verify this for us).
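The three cases above can be told apart mechanically by grouping records on the tender number and counting how many fields differ. In the sketch below, the field names, the sample rows and the max_diff threshold separating a likely typo from a likely second winner are all assumptions; borderline groups still need a human to decide.

```python
from collections import defaultdict


def field_diffs(a, b):
    """Number of fields whose values differ between two records."""
    return sum(a.get(k) != b.get(k) for k in set(a) | set(b))


def classify_duplicates(records, key="tender_no", max_diff=2):
    """Label each tender number shared by several records.

    'duplicate' - every field matches the first record
    'near'      - at most max_diff fields differ (perhaps a typo)
    'different' - more fields differ (perhaps several winners)
    """
    groups = defaultdict(list)
    for rec in records:
        if rec.get(key):  # records without a tender number are skipped
            groups[rec[key]].append(rec)
    labels = {}
    for value, group in groups.items():
        if len(group) < 2:
            continue
        worst = max(field_diffs(group[0], other) for other in group[1:])
        if worst == 0:
            labels[value] = "duplicate"
        elif worst <= max_diff:
            labels[value] = "near"
        else:
            labels[value] = "different"
    return labels


# Hypothetical records; only the patterns, not the values, mirror the dump.
rows = [
    {"tender_no": "8/2009", "title": "Medicine", "winner": "A SDN BHD", "value": "100"},
    {"tender_no": "8/2009", "title": "Medicine", "winner": "A SDN BHD", "value": "100"},
    {"tender_no": "9/2009", "title": "Computers", "winner": "B SDN BHD", "value": "200"},
    {"tender_no": "9/2009", "title": "Computers", "winner": "B SDN BHD", "value": "2000"},
    {"tender_no": "10/2009", "title": "Vehicles", "winner": "C SDN BHD", "value": "300"},
    {"tender_no": "10/2009", "title": "Buildings", "winner": "D SDN BHD", "value": "900"},
    {"tender_no": "11/2009", "title": "Food", "winner": "E SDN BHD", "value": "50"},
]
print(classify_duplicates(rows))
```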

Format of Tender Numbers

Examples of tender numbers:

8/2009

PL.(T).08.2009(JKP)

X0141110101090021

128/2009

KBS.S.4-14/69 (T.26/2009)

It is probably not a good idea to write code that attempts to parse the tender number.

Format of the "Petender Berjaya" Field

SYARIKAT PROSPECTRUM SDN BHD

TELEKOM SMART SCHOOL SDN BHD NO.45-8, LEVEL 3, BLOCK C, PLAZA DAMANSARA, JALAN MEDAN SETIA 1, BUKIT DAMANSARA 50490 KUALA LUMPUR

1. GLOBAL AEROSPACE SDN BHD (A002) 2. SYSTEMALLIANCE TECHNOLOGY SDN. BHD.(A003) 3. KARISMAWIRA SDN. BHD. (A004) 4. KESUMA TECHNOLOGY SDN. BHD (A005)

A QUALITY REPUTATION SDN BHD B PRIMABUMI SDN BHD
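For the numbered multi-winner format, a regular expression can split the field into one name per company and drop trailing codes such as (A002). This is a heuristic sketch (split_winners is hypothetical): it leaves addresses like the Telekom Smart School example in place and does not handle the lettered "A ... B ..." variant, which is ambiguous without more context.

```python
import re

# A number followed by ". " at the start or after whitespace starts a new item.
# Heuristic: a company name that itself contained "2. " would be split wrongly.
_ITEM = re.compile(r"(?:^|\s)\d+\.\s+")
# A trailing code such as "(A002)" after a name
_TRAILING_CODE = re.compile(r"\s*\([A-Z]\d+\)\s*$")


def split_winners(field):
    """Split a numbered "Petender Berjaya" value into one name per company."""
    parts = [p.strip() for p in _ITEM.split(field) if p.strip()]
    return [_TRAILING_CODE.sub("", p) for p in parts]


sample = ("1. GLOBAL AEROSPACE SDN BHD (A002) 2. SYSTEMALLIANCE TECHNOLOGY SDN. BHD.(A003) "
          "3. KARISMAWIRA SDN. BHD. (A004) 4. KESUMA TECHNOLOGY SDN. BHD (A005)")
for name in split_winners(sample):
    print(name)
```

A single-company value such as "SYARIKAT PROSPECTRUM SDN BHD" contains no numbered items and comes back unchanged as a one-element list.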

Modeling this Data Set as a Property Graph

One way to model this data as a graph is to use:

Vertices to represent tenders, ministries and companies/businesses.

An "awarded_by" labeled edge to associate a tender with a ministry.

An "awarded_to" labeled edge to associate a tender with the winner of the tender (the company/business).

Attributes on tender vertices for the tender title, number, value and category.

Attributes on company/business vertices for the company/business name, ROB/ROC/ROS registration number and MOF registration number.

Attributes on ministry vertices for the name of the ministry.
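The model above can be sketched without a database by building plain node and edge lists from each record. The record keys and sample values are hypothetical. Ministries and companies are looked up by name so each is created once, while every record gets its own tender node, since tender numbers turned out not to be unique.

```python
def build_graph(records):
    """Turn procurement records into (nodes, edges).

    nodes: list of property dicts (list index = node id)
    edges: list of (tail id, label, head id)
    """
    nodes, edges = [], []
    ministries, companies = {}, {}  # name -> node id, so each is created once

    def add_node(**props):
        nodes.append(props)
        return len(nodes) - 1

    for rec in records:
        # One tender node per record: tender numbers are not unique
        tender = add_node(type="tender", title=rec["title"], no=rec["no"],
                          value=rec["value"], category=rec["category"])
        if rec["ministry"] not in ministries:
            ministries[rec["ministry"]] = add_node(type="ministry", name=rec["ministry"])
        if rec["winner"] not in companies:
            companies[rec["winner"]] = add_node(
                type="business entity", name=rec["winner"],
                no=rec["rob_no"], mof_no=rec["mof_no"])
        edges.append((tender, "awarded_by", ministries[rec["ministry"]]))
        edges.append((tender, "awarded_to", companies[rec["winner"]]))
    return nodes, edges


# Hypothetical records with placeholder values.
records = [
    {"title": "Tender A", "no": "1/2009", "value": "1000.00", "category": "Supplies",
     "ministry": "KEMENTERIAN KESIHATAN", "winner": "COMPANY A SDN BHD",
     "rob_no": "", "mof_no": ""},
    {"title": "Tender B", "no": "2/2009", "value": "2000.00", "category": "Services",
     "ministry": "KEMENTERIAN KESIHATAN", "winner": "COMPANY B SDN BHD",
     "rob_no": "", "mof_no": ""},
]
nodes, edges = build_graph(records)
print(len(nodes), len(edges))  # 5 4
```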

Example

Neo4j

Neo4j is a graph database: it persists data in graph form.

It uses the property graph data model, with the exception of vertex labels.

In Neo4j terms, vertices are nodes, edges are relationships and attributes are properties.

Property values can be a String or any Java primitive (arrays of these types are supported as well).

Licensed under the AGPLv3, which basically means that you don't need a commercial license if your application is released under a compatible free software license. For other uses, you need a commercial license from the vendor.

Neo4j

Written in Java.

Bindings available for Python, Ruby, Clojure, Erlang, Groovy, Scala and PHP.

We will be using the Python bindings in this talk.

An embedded database, meaning that it runs in the same process space as the application.

There’s a standalone REST server for those who prefer it.

Initializing the Database

import neo4j

# Open (or create) the embedded database stored in the "db" directory
db = neo4j.GraphDatabase("db")

Creating the Nodes

# ministry, entity_name, tender_no, etc. hold the values of one scraped record
ministry_node = db.node(name=ministry, type="ministry")

entity_node = db.node(name=entity_name, no=entity_no,
                      mof_no=entity_mof_no, type="business entity")

tender_node = db.node(no=tender_no, title=tender_title,
                      category=tender_category, value=tender_value,
                      type="tender")

Creating the Relationships

# node.<relationship_type>(other) creates a relationship of that type
# going from node to other
tender_node.awarded_by(ministry_node)
tender_node.awarded_to(entity_node)

Indexing Nodes

ministries = db.index("ministries", create=True)
business_entities = db.index("business entities", create=True)
tenders_by_no = db.index("tenders by no", create=True)
tenders_by_title = db.index("tenders by title", create=True)

tenders_by_no[tender_no] = tender_node
tenders_by_title[tender_title] = tender_node
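Because tender numbers are not unique, it matters what an exact-match index does with a repeated key. We have not checked how the Neo4j index above behaves in that case, but a plain Python dict would simply keep the last tender stored under a number. A hypothetical multi-valued index that keeps every node is sketched below.

```python
from collections import defaultdict


class MultiIndex:
    """Exact-match index that keeps every node stored under a key."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, key, node):
        self._entries[key].append(node)

    def get(self, key):
        # .get avoids creating an empty entry for keys that were never added
        return list(self._entries.get(key, []))


tenders_by_no = MultiIndex()
tenders_by_no.add("8/2009", {"title": "Tender A"})
tenders_by_no.add("8/2009", {"title": "Tender B"})
print([t["title"] for t in tenders_by_no.get("8/2009")])  # ['Tender A', 'Tender B']

plain_index = {}
plain_index["8/2009"] = {"title": "Tender A"}
plain_index["8/2009"] = {"title": "Tender B"}  # silently replaces Tender A
print(plain_index["8/2009"]["title"])           # Tender B
```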

The Result

Traversing the Graph

Traversing is the process of walking around the graph by following edges from one vertex to the next.
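As a minimal illustration, a depth-first traversal over an adjacency list visits each vertex once, using an explicit stack. The graph and the function name are illustrative.

```python
def depth_first(adjacency, start):
    """Yield vertices reachable from start, depth first, each visited once."""
    visited = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        yield vertex
        # Push neighbours in reverse so they are explored in listed order
        stack.extend(reversed(adjacency.get(vertex, [])))


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(list(depth_first(graph, "A")))  # ['A', 'B', 'D', 'C']
```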

Graph Traversal Options

Graph Traversal Framework

Gremlin

SPARQL

Manual traversal

Problem

Let's use graph traversal to find all the companies that have been awarded contracts by Kementerian Kesihatan (the Ministry of Health).

Graph Around Kementerian Kesihatan

Defining the Traversal

# Companies that have gotten contracts from a particular ministry.
# The start node is a ministry.
class Contractors(neo4j.Traversal):
    # Walk from the ministry back to its tenders, then on to the winners
    types = [neo4j.Incoming.awarded_by, neo4j.Outgoing.awarded_to]
    order = neo4j.DEPTH_FIRST
    stop = neo4j.STOP_AT_END_OF_GRAPH

    def isReturnable(self, position):
        # Only report the business entity (company) nodes
        return position["type"] == "business entity"
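For readers without the Neo4j Python bindings, the same walk can be approximated in plain Python. The traverse function below is a hypothetical stand-in, not the Neo4j traversal framework: it follows only the (label, direction) pairs listed in types, visits each node once, and returns the nodes accepted by is_returnable. The sample graph is made up.

```python
def traverse(nodes, edges, start, types, is_returnable):
    """Depth-first walk that follows only the given (label, direction) pairs.

    nodes: {node id: properties}; edges: [(tail, label, head)]
    Direction "out" follows tail -> head, "in" follows head -> tail.
    Each node is visited at most once, so each match is reported once.
    """
    visited, stack, found = set(), [start], []
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        if is_returnable(nodes[current]):
            found.append(current)
        for tail, label, head in edges:  # linear scan; fine for a sketch
            if (label, "out") in types and tail == current:
                stack.append(head)
            if (label, "in") in types and head == current:
                stack.append(tail)
    return found


nodes = {
    "moh": {"type": "ministry", "name": "KEMENTERIAN KESIHATAN"},
    "other": {"type": "ministry", "name": "ANOTHER MINISTRY"},
    "t1": {"type": "tender"}, "t2": {"type": "tender"}, "t3": {"type": "tender"},
    "c1": {"type": "business entity", "name": "COMPANY A SDN BHD"},
    "c2": {"type": "business entity", "name": "COMPANY B SDN BHD"},
}
edges = [
    ("t1", "awarded_by", "moh"), ("t1", "awarded_to", "c1"),
    ("t2", "awarded_by", "moh"), ("t2", "awarded_to", "c1"),
    ("t3", "awarded_by", "other"), ("t3", "awarded_to", "c2"),
]
contractors = traverse(nodes, edges, "moh",
                       types={("awarded_by", "in"), ("awarded_to", "out")},
                       is_returnable=lambda props: props["type"] == "business entity")
print([nodes[c]["name"] for c in contractors])  # ['COMPANY A SDN BHD']
```

COMPANY B is never reached because its only tender was awarded by a different ministry.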

Using the Traversal

with db.transaction:
    moh = ministries["KEMENTERIAN KESIHATAN"]
    contractors = Contractors(moh)
    for c in contractors:
        print(c["name"])

Output

RAF SYNERGY SDN BHD
PRIMABUMI SDN BHD
AVERROES PHARMACEUTICALS SDN BHD
QUALITY REPUTATION SDN BHD
UNISENDO SDN BHD
PRESTIGE PHARMA SDN BHD
PHARMANIAGA LOGISTICS SDN BHD
IDAMAN PHARMA SDN BHD
PHARMASERV ALLIANCES SDN BHD

Gremlin

Gremlin is a graph-based programming language.

It can express complex graph traversals concisely.

Available at http://wiki.github.com/tinkerpop/gremlin/

Traversing the Graph with Gremlin

$ ./gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> $_ := g:key("ministries", "KEMENTERIAN KESIHATAN")
==>v[66]
gremlin> ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
==>PHARMASERV ALLIANCES SDN BHD
==>IDAMAN PHARMA SDN BHD
==>PHARMANIAGA LOGISTICS SDN BHD
==>PRIMABUMI SDN BHD
==>PRESTIGE PHARMA SDN BHD
==>UNISENDO SDN BHD
==>PRIMABUMI SDN BHD
==>QUALITY REPUTATION SDN BHD
==>AVERROES PHARMACEUTICALS SDN BHD
==>PRIMABUMI SDN BHD
.....

Explanation

./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name

inE - incoming edges of a vertex

outV - the vertex an edge comes out of (its tail)

outE - outgoing edges of a vertex

inV - the vertex an edge goes into (its head)

Get the current object (.), i.e. the 'KEMENTERIAN KESIHATAN' node.

Get its incoming edges labeled "awarded_by" (inE[@label="awarded_by"]).

Get the vertex each of those edges comes out of (outV): the tender nodes.

Get the outgoing "awarded_to" edges of the tender nodes (outE[@label="awarded_to"]).

Get the vertex each of those edges goes into (inV): the business entity vertices.

Get the name attribute of those vertices (@name).
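The pipeline can be mimicked step by step in Python to see what each Gremlin step produces. The helper functions below are hypothetical and only approximate Gremlin's semantics. One thing the sketch does show: because a company is emitted once per path (that is, once per tender it won), the same name can appear several times, which would explain why PRIMABUMI SDN BHD shows up repeatedly in the Gremlin output but only once in the traversal framework output.

```python
def in_e(edges, vertices, label):
    """Incoming edges with this label, for each vertex in turn."""
    return [e for v in vertices for e in edges if e[2] == v and e[1] == label]


def out_e(edges, vertices, label):
    """Outgoing edges with this label, for each vertex in turn."""
    return [e for v in vertices for e in edges if e[0] == v and e[1] == label]


def out_v(edge_list):
    """The vertex each edge comes out of (its tail)."""
    return [tail for tail, _, _ in edge_list]


def in_v(edge_list):
    """The vertex each edge goes into (its head)."""
    return [head for _, _, head in edge_list]


names = {"c1": "COMPANY A SDN BHD", "c2": "COMPANY B SDN BHD"}
edges = [  # (tail, label, head); a made-up graph around one ministry
    ("t1", "awarded_by", "moh"), ("t1", "awarded_to", "c1"),
    ("t2", "awarded_by", "moh"), ("t2", "awarded_to", "c1"),
    ("t3", "awarded_by", "moh"), ("t3", "awarded_to", "c2"),
]
step = ["moh"]                           # .
step = in_e(edges, step, "awarded_by")   # inE[@label="awarded_by"]
step = out_v(step)                       # outV: the tenders
step = out_e(edges, step, "awarded_to")  # outE[@label="awarded_to"]
step = in_v(step)                        # inV: the companies
result = [names[v] for v in step]        # @name
print(result)  # ['COMPANY A SDN BHD', 'COMPANY A SDN BHD', 'COMPANY B SDN BHD']
```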

Gephi

Photoshop for graphs.

Supports various graph layout algorithms.

Graph metrics supported: clustering coefficient, PageRank, diameter, betweenness centrality, closeness centrality.

File formats supported: CSV, GraphML, GEXF, etc.

http://www.gephi.org
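Since Gephi reads GraphML, one way to get a graph like the procurement graph into it is to write a GraphML file. The sketch below uses xml.etree.ElementTree; the to_graphml helper, the attribute keys and the sample graph are assumptions, and we have not verified how Gephi maps each GraphML attribute on import. To save the result, write the returned string to a file with a .graphml extension.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialise {id: {"name": ..., "type": ...}} nodes and (tail, label, head)
    edges as a GraphML document string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare the string attributes carried by nodes and edges
    for key_id, target in (("name", "node"), ("type", "node"), ("label", "edge")):
        ET.SubElement(root, "key", {"id": key_id, "for": target,
                                    "attr.name": key_id, "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node_id, props in nodes.items():
        node = ET.SubElement(graph, "node", {"id": str(node_id)})
        for key in ("name", "type"):
            if key in props:
                ET.SubElement(node, "data", {"key": key}).text = str(props[key])
    for i, (tail, label, head) in enumerate(edges):
        edge = ET.SubElement(graph, "edge", {"id": f"e{i}", "source": str(tail),
                                             "target": str(head)})
        ET.SubElement(edge, "data", {"key": "label"}).text = label
    return ET.tostring(root, encoding="unicode")


nodes = {
    "moh": {"name": "KEMENTERIAN KESIHATAN", "type": "ministry"},
    "t1": {"type": "tender"},
}
xml_text = to_graphml(nodes, [("t1", "awarded_by", "moh")])
print(xml_text)
```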

Mashing Up

Let's add shareholding data from Suruhanjaya Syarikat Malaysia (SSM, the Companies Commission of Malaysia) to the graph so that we can show the tenders that have been awarded to Telekom Malaysia Berhad and any of its subsidiaries or associate companies.

Connecting Telekom Malaysia Berhad and Telekom Smart School Sdn Bhd

telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
telekom_smart_school = business_entities["TELEKOM SMART SCHOOL SDN BHD"]

# Telekom Multi-Media is created as a new node rather than looked up in the index
telekom_multi_media = db.node(name="TELEKOM MULTI-MEDIA SDN BHD", no="345420-H",
                              text="TELEKOM MULTI-MEDIA SDN BHD",
                              type="business entity")

# Shareholdings from SSM; "units" is stored as a property on the relationship
telekom.shareholder_in(telekom_multi_media, units=1650000)
telekom_multi_media.shareholder_in(telekom_smart_school, units=7650000)

Graph Centered at Telekom Malaysia Berhad

Graph Centered at Telekom Smart School Sdn Bhd

The Traverser

class AllTendersDirectIndirect(neo4j.Traversal):
    # Follow shareholdings down to subsidiaries, and awards back to tenders
    types = [neo4j.Incoming.awarded_to,
             neo4j.Outgoing.shareholder_in]
    order = neo4j.DEPTH_FIRST
    stop = neo4j.STOP_AT_END_OF_GRAPH

    def isReturnable(self, position):
        # Only report tender nodes
        return position["type"] == "tender"
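The same direct-or-indirect search can be written as a small recursive function in plain Python. This hypothetical all_tenders sketch follows outgoing shareholder_in edges down the ownership chain and collects tenders whose awarded_to edge points at any company reached. The node names are made up, and the units property is left out because it plays no part in the search.

```python
def all_tenders(edges, company, seen=None):
    """Tenders awarded to company or to any company it holds shares in,
    directly or through a chain of shareholdings."""
    if seen is None:
        seen = set()
    if company in seen:  # guards against circular or diamond-shaped shareholdings
        return []
    seen.add(company)
    found = []
    for tail, label, head in edges:
        if label == "awarded_to" and head == company:
            found.append(tail)  # tender -> company edge, followed backwards
        elif label == "shareholder_in" and tail == company:
            found.extend(all_tenders(edges, head, seen))
    return found


edges = [  # (tail, label, head); made-up node names
    ("parent", "shareholder_in", "subsidiary"),
    ("subsidiary", "shareholder_in", "grandchild"),
    ("tender_1", "awarded_to", "grandchild"),
    ("tender_2", "awarded_to", "parent"),
    ("tender_3", "awarded_to", "unrelated"),
]
print(all_tenders(edges, "parent"))  # ['tender_1', 'tender_2']
```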

Executing the Traverser and the Output

Executing the Traversal Definition

telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
tenders = AllTendersDirectIndirect(telekom)
for tender in tenders:
    print(tender["no"])

Output

30/2009
35/2009
8/2009
162/2009
JASA/OP/1/2009

Making this Easier
