Inferring Web Citations using Social Data and SPARQL Rules

Inferring Web Citations using SPARQL Rules and Social Data – LUPAS2010

Matthew Rowe

Organisations, Information and Knowledge Group

University of Sheffield

Outline

• Problem Setting
  – Personal Information Dissemination
• SPARQL Rules: Identifying Web Citations
  – Generating Seed Data
  – Gathering Possible Web Citations
  – Inferring Web Citations
• Evaluation
• Conclusions
• Future Work

Personal Information on the Web

• Personal information on the Web is disseminated:
  – Voluntarily
  – Involuntarily
• Increase in personal information:
  – Identity Theft
  – Lateral Surveillance
• Web users must discover their identity web references
  – 2-stage process
    • Find possible references
    • Identify definite references

Ambiguity!

Matthew Rowe: Composer

Matthew Rowe: Cyclist

Matthew Rowe: Gardener

Matthew Rowe: Song Writer

Matthew Rowe: PhD Student

Problem Setting

• Performing identification manually:
  – Time consuming
  – Laborious
    • Handle masses of information
  – Repeated often
    • The Web keeps changing
• Solution = automated techniques
  – Alleviate the need for humans
  – Need background knowledge
    • Who am I searching for?
    • What makes them unique?

SPARQL Rules: Identifying Web Citations

Generating Seed Data

• Profiles on the Social Web are leveraged as seed data
• To generate seed data:
  1. Export Social Graphs
     • Interface with the platform’s API
     • Convert proprietary response into RDF
       – Biographical Information
       – Social Network Information
  2. Enrich Graphs with URIs
  3. Interlink graphs
     • Detect equivalent foaf:Person instances
     • Build a single social graph

http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html

Detecting equivalent foaf:Person instances:
1. Blocking Step
2. Compare values of Inverse Functional Properties
3. Compare Geo URIs
4. Compare Geo data
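
The first two interlinking steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the dictionary representation of a profile, the helper names (`block_key`, `shares_ifp_value`, `find_equivalent`), the choice of the lower-cased name as blocking key, and the set of properties treated as inverse functional are all hypothetical. The Geo comparison steps are left out.

```python
from collections import defaultdict

# Properties treated as inverse functional in this sketch (an assumption).
IFPS = ("foaf:mbox", "foaf:homepage")


def block_key(person):
    """Blocking step: a cheap key so only plausible pairs get compared."""
    return person.get("foaf:name", "").strip().lower()


def shares_ifp_value(a, b):
    """True if both records carry the same value for some inverse functional property."""
    return any(a.get(p) is not None and a.get(p) == b.get(p) for p in IFPS)


def find_equivalent(people_a, people_b):
    """Return (id_a, id_b) pairs judged to describe the same person."""
    blocks = defaultdict(list)
    for pid, person in people_b.items():
        blocks[block_key(person)].append(pid)

    matches = []
    for pid_a, person_a in people_a.items():
        # Only compare against records that fall in the same block.
        for pid_b in blocks.get(block_key(person_a), []):
            if shares_ifp_value(person_a, people_b[pid_b]):
                matches.append((pid_a, pid_b))
    return matches


# Hypothetical exports from two platforms.
facebook = {
    "fb:1": {"foaf:name": "Alice Example", "foaf:homepage": "http://example.org/alice"},
    "fb:2": {"foaf:name": "Bob Example", "foaf:mbox": "mailto:bob@example.org"},
}
twitter = {
    "tw:9": {"foaf:name": "alice example", "foaf:homepage": "http://example.org/alice"},
    "tw:8": {"foaf:name": "Bob Example", "foaf:mbox": "mailto:other@example.org"},
}
matches = find_equivalent(facebook, twitter)
print(matches)
```

Blocking keeps the number of pairwise comparisons small; the inverse functional property check then decides equivalence, since two resources sharing such a value must denote the same individual.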

Gathering Possible Web Citations

• Search WWW and Semantic Web for possible citations
• Web resources come in many flavours:
  – Data Models, HTML documents, XHTML documents
• Convert into RDF
  – XHTML Documents:
    • Use GRDDL
    • Automated RDF model lifting
  – HTML Documents:
    • Apply person name gazetteer: identify person information
    • Apply Hidden Markov Model to extract information
    • Build RDF model from information

M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, WWW 2010. Raleigh, USA. (2010)

Inferring Web Citations using SPARQL Rules

• Seed data = solitary example to build rules
  – State of the art rule induction strategies are limited
    • E.g. FOIL and C4.5
  – Build rules from RDF instances!
1. Extract instances from Seed Data
2. For each instance, build a rule:
   – Build a skeleton rule
   – Add triples to the rule
   – Create a new rule if a triple’s predicate is Inverse Functional
3. Apply the rules to the web resources

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:name ?m .
  ?url foaf:topic ?r .
  ?r foaf:name ?m
}
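
A minimal Python sketch of how rules like the two above might be generated from seed data. The tuple representation of triples, the helper names (`contact_predicates`, `build_rule`, `build_rules`) and the example contact URI are hypothetical, and emitting one rule per contact predicate simplifies the "add triples / new rule for an Inverse Functional predicate" step; it is an illustration, not the author's implementation.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me"


def contact_predicates(seed, me):
    """Collect the predicates used to describe the people that `me` knows."""
    contacts = {o for s, p, o in seed if s == me and p == "foaf:knows"}
    return sorted({p for s, p, _ in seed if s in contacts})


def build_rule(me, extra_patterns=()):
    """Skeleton rule (name match) plus any extra triple patterns."""
    patterns = [
        f"<{me}> foaf:name ?n .",
        "?url foaf:topic ?p .",
        "?p foaf:name ?n .",
    ] + list(extra_patterns)
    lines = [
        f"PREFIX foaf: <{FOAF}>",
        f"CONSTRUCT {{ <{me}> foaf:page ?url }}",
        "WHERE {",
    ]
    lines += ["  " + pattern for pattern in patterns]
    lines.append("}")
    return "\n".join(lines)


def build_rules(seed, me):
    """Return the skeleton rule followed by one extended rule per contact predicate."""
    rules = [build_rule(me)]
    for i, predicate in enumerate(contact_predicates(seed, me)):
        extra = [
            f"<{me}> foaf:knows ?q .",
            f"?q {predicate} ?v{i} .",
            "?url foaf:topic ?r .",
            f"?r {predicate} ?v{i} .",
        ]
        rules.append(build_rule(me, extra))
    return rules


CONTACT = "http://example.org/contact#me"  # hypothetical acquaintance
seed = [
    (ME, "foaf:name", "Matthew Rowe"),
    (ME, "foaf:knows", CONTACT),
    (CONTACT, "foaf:name", "A. Contact"),
    (CONTACT, "foaf:homepage", "http://example.org/contact"),
]
rules = build_rules(seed, ME)
print(rules[-1])
```

Each extended rule requires the page to mention both the seed person's name and a matching attribute of someone they know, which is what makes the rules stricter than plain name matching.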

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:homepage ?h .
  ?url foaf:topic ?r .
  ?r foaf:homepage ?h
}
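
To illustrate what applying a rule to the web resources involves, the sketch below evaluates the triple patterns of the skeleton rule with a naive backtracking matcher in plain Python. It assumes the seed data and the RDF models of the web resources are merged into one list of triples; the function names (`match_pattern`, `solve`, `infer_citations`) and the example pages are hypothetical. In practice a SPARQL engine would do this work.

```python
def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Unify one triple pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for pat, val in zip(pattern, triple):
        if is_var(pat):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(pat, val) != val:
                return None
        elif pat != val:
            return None
    return new


def solve(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from solve(rest, graph, extended)


ME = "http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me"
SKELETON = [
    (ME, "foaf:name", "?n"),
    ("?url", "foaf:topic", "?p"),
    ("?p", "foaf:name", "?n"),
]


def infer_citations(seed, resources, patterns=SKELETON):
    """Return the URLs for which the rule would construct a foaf:page triple."""
    graph = list(seed) + list(resources)
    return sorted({b["?url"] for b in solve(patterns, graph)})


seed = [(ME, "foaf:name", "Matthew Rowe")]
resources = [  # hypothetical RDF models lifted from two web pages
    ("http://example.org/page1", "foaf:topic", "_:a"),
    ("_:a", "foaf:name", "Matthew Rowe"),
    ("http://example.org/page2", "foaf:topic", "_:b"),
    ("_:b", "foaf:name", "Someone Else"),
]
citations = infer_citations(seed, resources)
print(citations)
```

Every pattern must match for a page to be returned, so adding patterns to a rule can only shrink the set of inferred citations.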

Evaluation

• Measures:
  – Precision, Recall, F-Measure
• Dataset
  – 50 participants from the Semantic Web and Web 2.0 communities
  – Seed data collected from Facebook and Twitter
  – ~17300 web resources: 346 web resources for each participant
• Baselines
  – Baseline 1: Person name as positive classification
    • Skeleton SPARQL Rule
  – Baseline 2: Human Processing
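
For reference, the three measures can be computed for a single participant as in the sketch below. The function name and the example page identifiers are hypothetical, and the balanced F-measure (harmonic mean of precision and recall) is assumed.

```python
def precision_recall_f(predicted, actual):
    """Precision, recall and balanced F-measure for one participant.

    predicted: web resources classified as citing the person
    actual:    web resources that really cite the person
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical classification of seven pages.
predicted = {"page1", "page2", "page3", "page4"}
actual = {"page1", "page2", "page3", "page5", "page6", "page7"}
p, r, f = precision_recall_f(predicted, actual)
print(p, r, f)
```

If scores are averaged over participants, the mean F-measure generally differs from the harmonic mean of the mean precision and mean recall; whether the evaluation here averaged per participant is an assumption.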

Results

                 Precision  Recall  F-Measure
Inference Rules  0.955      0.436   0.553
Baseline 1       0.191      0.998   0.294
Baseline 2       0.765      0.725   0.719

• High precision
  – Better than humans
  – Triple Patterns
• Low recall
  – Rules are strict
    • No room for variability
  – Hard to generalise
    • No learning from disambiguation decisions

Conclusions

• SPARQL Rules are precise
  – Poor generalisation however
  – Outperform humans at low web presence levels
    • “Needle in a haystack problem”
• User profiles provide seed data
  – Inexpensively
  – Capturing:
    • Biographical information
    • Social networking information
• Inability to learn from identifications
  – Plan for future work
  – Overcome poor seed data feature coverage

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]

For more information:

M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In Press for special issue on "Web 2.0" in the Journal of Web Semantics. (2010)