Automatic hypothesis checking using eScience research infrastructures, ontologies, and linked data: a case study in climate change research

Jaakko Lappalainen, Computer Science Department, University of Alcalá, Spain

KREAM@ICCS2013


DESCRIPTION

Presentation made at ICCS2013 Barcelona in June 2013.


  • 1. Jaakko Lappalainen, Computer Science Department, University of Alcalá, Spain

  • 2. Overview: The problem; Proposed approach; The method; Results; Conclusions; Strengths and weaknesses; Future work; Questions.

  • 3. The problem: Researchers focus on a particular time frame and scope when testing their hypotheses, but the conclusions of the research are projected into the future. Paradox: work that predicts things for tomorrow becomes a snapshot of what happened until today.

  • 4. Proposed approach: New data relevant to some hypotheses is continuously aggregated as time passes. With common semantics, it can be combined with or related to other datasets. Represent hypotheses as programs that are executed repeatedly.

  • 5. The method: The case study is Lenten, L. J., & Moosa, I. A. (2003). An empirical investigation into long-term climate change in Australia. Environmental Modelling & Software, 18(1), 59-70. The authors claim that the temperature series has a trend feature.

  • 6. The method (II): Let's find some data sources. ACORN-SAT, from the Australian Bureau of Meteorology, uses linked data (LD)! NOAA weather data is not in LD, but it is easy to parse. Periodically ingest data (e.g., into a relational database). An R script checks whether the trend in the data has changed. Ingested data is semantically tagged.

  • 7. Results: We are checking Lenten & Moosa's hypothesis every week. More extensive time scope. Wider geographical scope: all data available for Australia. The snapshot becomes a movie. Executable paper.

  • 8. Conclusions: The tools we already have allow us to use large-scale computation infrastructures easily to support science. The agINFRA project: massive data ingestion; data integration and interlinking; user-tailored service execution.

  • 9. Strengths: Data availability: the data is ingested (from LD sources, but not only) and published. Data interoperability: the data is not stored by itself. Actionable data: ready to be addressed, used, and to generate new actionable data.

  • 10. Weaknesses: Representing scientific inquiry as a data model is not trivial. CPU-consuming tasks are even more consuming.

  • 11. Future work: Further dataset interlinking. More plural value for physical parameters. Dataset value error detection. Advances in hypothesis representation. Machine-readable research processes.

  • 12. Questions?

  • 13. Thank you very much! [email protected]
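
The idea from slides 4 to 6, a hypothesis represented as a program that is re-executed as new data is ingested, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenter's implementation: the talk mentions an R script and a relational database, while this sketch uses Python and an in-memory list. The names `Observation`, `ols_slope`, `Hypothesis`, and `HypothesisRunner` are hypothetical, the trend check is a plain least-squares slope swapped in for whatever test the R script performs, and the numbers are made up rather than taken from ACORN-SAT or NOAA.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical record type: (year, mean temperature in degrees Celsius).
Observation = Tuple[float, float]


def ols_slope(observations: Sequence[Observation]) -> float:
    """Least-squares slope of temperature against time (degrees per year)."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(t for t, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in observations)
    if sxx == 0:
        raise ValueError("all observations share the same time stamp")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in observations)
    return sxy / sxx


@dataclass
class Hypothesis:
    """A named, executable claim about a dataset."""
    name: str
    holds: Callable[[Sequence[Observation]], bool]


@dataclass
class HypothesisRunner:
    """Accumulates ingested data and re-evaluates the hypothesis on demand."""
    hypothesis: Hypothesis
    store: List[Observation] = field(default_factory=list)
    history: List[Tuple[int, bool]] = field(default_factory=list)

    def ingest(self, batch: Sequence[Observation]) -> None:
        # Stand-in for the periodic ingestion step (assumed in-memory here).
        self.store.extend(batch)

    def check(self) -> bool:
        # Re-run the hypothesis on everything ingested so far and log the outcome.
        outcome = self.hypothesis.holds(self.store)
        self.history.append((len(self.store), outcome))
        return outcome


# Made-up numbers, for illustration only (not ACORN-SAT or NOAA data).
warming = Hypothesis("positive temperature trend", lambda obs: ols_slope(obs) > 0.0)
runner = HypothesisRunner(warming)

runner.ingest([(2000, 20.0), (2001, 20.1), (2002, 20.2)])
first = runner.check()

runner.ingest([(2003, 19.0), (2004, 18.5)])
second = runner.check()
```

In a setup like the one described in the talk, `check()` would presumably be triggered by a weekly scheduler after each ingestion run. The `history` list is what turns the one-off "snapshot" into a record of how the outcome evolves as data accumulates.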