ReDReSS: Case Study in e-Social Science

Building Collaborative e-Research Environments JISC Consultation Workshops, 23/2/04 and 5/3/04

Rob Allan (CCLRC Daresbury Laboratory)

Rob Crouchley (University of Lancaster)

Specific Social Scientists Problems

1. They have much less experience and expertise in the use of the Grid than those typically from other research council areas;

2. There is a significant intellectual gap between such disciplines and computer science;

3. Distributed systems are also inherently complex and associated middleware products are not easy to use;

4. The Open Middleware Infrastructure Institute (OMII) is likely to provide generic (open-source) middleware and associated services.

E-Science middleware is currently not specifically targeted at the social science community.

Social Scientists Need

1. Help to develop a more computer-literate collaborative culture;

2. Help to develop component-based software, visual composition tools and scripting languages which are easy to use;

3. To exploit state-of-the-art software development technologies such as aspect-oriented programming to enhance flexibility.

Middleware could be the catalyst for re-use and sharing in the e-Social Sciences. Some examples and ideas follow.

Some Features of Social Science Research

• Research motivated by a desire to determine causality

• Involves

1. identifying the various factors which influence the behaviour or outcome of interest and quantifying their effects;

2. controlling for all the different confounding factors which would otherwise result in spurious relationships and misleading results.

• Randomised experiments are not feasible: we cannot randomly allocate individuals to different levels of training in order to evaluate programs.

• We rely on observational data, i.e. data that have been obtained from surveys and censuses.

This is different to “exact sciences” like physics and chemistry where repeatable experiments can be performed.
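
As a hedged illustration of why controlling for confounding factors matters, the sketch below uses invented counts (not real survey data) for a hypothetical training programme. The variable names and numbers are assumptions chosen for the example: prior qualification influences both who gets trained and who is employed, so a naive comparison shows a training effect that disappears within each qualification level.

```python
# Invented counts for illustration only:
# (qualification, trained) -> (number employed, group size)
counts = {
    ("high", True): (72, 90),
    ("high", False): (8, 10),
    ("low", True): (4, 10),
    ("low", False): (36, 90),
}


def employment_rate(groups):
    """Pooled employment rate over a list of (employed, size) pairs."""
    employed = sum(e for e, _ in groups)
    total = sum(n for _, n in groups)
    return employed / total


def naive_effect(counts):
    """Trained minus untrained employment rate, ignoring qualification."""
    trained = [v for (_, t), v in counts.items() if t]
    untrained = [v for (_, t), v in counts.items() if not t]
    return employment_rate(trained) - employment_rate(untrained)


def stratified_effects(counts):
    """Trained minus untrained employment rate within each qualification level."""
    levels = sorted({q for q, _ in counts})
    return {
        q: employment_rate([counts[(q, True)]]) - employment_rate([counts[(q, False)]])
        for q in levels
    }


print("naive difference:", round(naive_effect(counts), 3))
print("within-level differences:", stratified_effects(counts))
```

With these invented counts the naive difference is 0.32, while the difference within each qualification level is zero: the apparent effect is entirely due to the confounder.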

3 Related Aspects of Soc. Sci. Research

Observational Data, usually full of holes

– missing data

– measurement error

– dropout

Substantive Theory

– what determines what

– not comprehensive

– often contradictory

Methodology

– only partially developed

Soc. Sci. Needs Comprehensive Models

• Interdependent sub-models: we need joint models for the data complexities and the core processes we want to understand.

• Models are not linear in the parameters, so they require special procedures and are highly computationally intensive because of the high dimensionality and the interdependent sub-models.

• Simple analyses are usually very misleading about the role of the controls (ethnicity, sex, etc.).

Soc. Sci. research is complex - large parameter space, many interpretations and models which need to be tested. Cannot be done in isolation…

Increasing need to link components and to access large computers and data sets from the desktop.
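
To make "not linear in the parameters" concrete, here is a minimal sketch of the simplest such case: a two-parameter logistic regression fitted by Newton-Raphson iteration. It is a textbook example under stated assumptions, far simpler than the joint models described above and not the authors' own method; the function name and the toy data are invented for illustration. Even here no closed-form solution exists, so the estimates must be found iteratively.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson fit of P(y=1 | x) = 1 / (1 + exp(-(a + b*x))).

    Assumes the data are not perfectly separable, so the maximum
    likelihood estimate is finite.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g) and information matrix (h).
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Newton step: solve the 2x2 system h * delta = g explicitly.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Toy data, invented for illustration (not separable).
toy_x = [0, 1, 2, 3, 4, 5]
toy_y = [0, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(toy_x, toy_y)
print("intercept:", intercept, "slope:", slope)
```

Each iteration passes over all the data; with many parameters and interdependent sub-models the cost per iteration grows quickly, which is one reason access to larger computers from the desktop becomes attractive.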

E-Science Technology can link Components!

[Diagram: Data Management A, B and C and Analysis A, B and C, linked through Middleware]

New Tools: The Analysis Cycle

[Diagram of the analysis cycle, with boxes: Main ESDS Data Sets; Select Data Set and Appropriate Variables; Contextual Data (TTWA Data, NOMIS); Merge Files: Add Variables; Working Data; Results]
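
One possible reading of the select and merge steps in this cycle can be sketched as follows. All record layouts, field names and values are hypothetical and invented for the example; no real ESDS or NOMIS interface is used. Individual records are cut down to the chosen variables, then area-level contextual variables are attached by a shared area code to give the working data.

```python
def select_variables(records, variables):
    """Keep only the chosen variables from each individual record."""
    return [{v: r[v] for v in variables} for r in records]


def merge_contextual(records, contextual, key):
    """Attach area-level variables to each record via a shared area code.

    Records whose area code has no contextual entry are kept unchanged.
    """
    merged = []
    for r in records:
        extra = contextual.get(r[key], {})
        merged.append({**r, **extra})
    return merged


# Hypothetical individual-level survey records.
survey = [
    {"id": 1, "ttwa": "A", "employed": 1, "age": 34},
    {"id": 2, "ttwa": "B", "employed": 0, "age": 51},
    {"id": 3, "ttwa": "C", "employed": 1, "age": 27},
]

# Hypothetical contextual data keyed by area code (values invented).
contextual = {
    "A": {"unemployment_rate": 4.2},
    "B": {"unemployment_rate": 7.9},
}

selected = select_variables(survey, ["id", "ttwa", "employed"])
working = merge_contextual(selected, contextual, key="ttwa")
for row in working:
    print(row)
```

Keeping unmatched records rather than dropping them is a design choice made here so that missing contextual data stays visible in the working data.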

New Tools: Simultaneous Analysis

Example: research in educational attainment

[Diagram: the National Pupils Database with simultaneous analyses by different disciplines: Psychologists Analysis, Geographers Analysis, Locational Analysis B, Economists Analysis, Educationalists Analysis]

E-Science can enhance Collaboration!

• Particularly important in qualitative research;

• Enable comparison of different markup/interpretation;

• Direct access to datasets for validation;

• Direct input of data from fieldwork involving questionnaires, photography etc.

• Delivery/input devices (some mobile) may include: portals, Access Grid, PC tablets, PDA, camera, phone etc.

New Tools: Collaboration in Video Markup

[Diagram: a Video Corpus shared by Researcher A, Researcher B and Researcher C]

VIDGRID: Multiple video streams can be delivered into an AG or portlet environment
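
As a hedged sketch of what comparing markup from different researchers might involve, the example below measures how many seconds of a video two researchers labelled with the same code. The data layout (start, end, code), the code labels and the function name are all assumptions made for illustration; this is not a description of how VIDGRID works.

```python
def shared_seconds(markup_a, markup_b):
    """Total time on which both researchers applied the same code.

    Each markup is a list of (start, end, code) tuples in seconds.
    Assumes that, within one researcher's markup, intervals carrying
    the same code do not overlap each other.
    """
    total = 0.0
    for start_a, end_a, code_a in markup_a:
        for start_b, end_b, code_b in markup_b:
            if code_a == code_b:
                overlap = min(end_a, end_b) - max(start_a, start_b)
                if overlap > 0:
                    total += overlap
    return total


# Invented markup of the same 30-second clip by two researchers.
researcher_a = [(0, 10, "greeting"), (10, 30, "argument")]
researcher_b = [(0, 8, "greeting"), (12, 25, "argument"), (25, 30, "laughter")]

print("seconds coded identically:", shared_seconds(researcher_a, researcher_b))
```

A measure like this only flags where interpretations agree or differ; deciding which interpretation is right still needs the researchers to look at the video together.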

Training and Awareness in e-Social Science!

Project ReDReSS: Resource Discovery for Researchers in e-Social Science

“to accelerate the development and awareness of a new kind of computing and data infrastructure for the Social Sciences, and to support the increasingly national and global collaborations emerging in many areas of Social Science”

– To help illustrate appropriate methodologies and software that admit the full complexity of substantive problems;

– To help articulate the middleware needs of social researchers;

– To help nurture and support a community of social researchers;

– To help to provide critical mass and improve the efficiency of interactions between the interested researchers, thus reducing the number of lost opportunities for social science.

We will use/contribute to existing technologies

• Resource discovery

• Sharing tools

• Personalised workspaces

• Flexible delivery

E-Science enabling a Virtual Research Environment!

“to make the use of e-Science technologies, methodologies and resources easier and more transparent than simply developing bespoke applications on an infrastructure toolkit (such as Globus GT2 or OGSI/WSRF).”

We need to:

• Bridge the gap between different types of technology (database management, computational methods, data collection, networks, Condor resources, visualization systems, collaborative working, Access Grid, etc.);

• Build on pilot projects and take input from other disciplines;

• Link to core JCSR clusters and resources at other e-Science Centres;

• Provide an environment to enhance the programmability and usability of such a Grid by integrating work from a number of ongoing projects and encouraging community input.

The Grid “Client Problem”

Many clients want to access a few Grid-enabled resources

[Diagram: consumer clients (PC, TV, video, AG), workplace desktop clients and portable clients (phones, laptop, PDA, data collection) connected through Middleware, e.g. Globus, to the Grid Core]

Some VRE Functions

• Authentication, Authorisation and Accounting – use Shibboleth and Permis in line with JISC proposals;

• Community development of content - Content Management and Editing tools:

– Access to middleware resources and documentation,

– Access to training materials and resources,

– Enable shared development of services/applications,

– Access to a consultancy/support service;

• Application Management Services - user access via pre-defined tools and applications to the UK e-Science Grid;

• Data Management Services – discovery, authorisation, transfer, replication, upload, validation, curation;

• Access to Broadcasts - on the Access Grid network;

• Management Functions - for experts to maintain the system and guide non-experts, e.g. via expert systems and workflow.

Functionality/Content of the VRE

[Diagram: JISC Portal, VLE Portal and VRE Portal, with Portal Management, Middleware/Software Library, Access GRID, Security (Authorisation, Authentication), Text Mining/Data services, UK GRID Services, Semantic GRID Services, Awareness Raising Resources and Workshops]

Sanity Check

However, a number of areas significant for a production Grid environment have hardly yet been tackled. Issues include:

• Grid information systems, service registration, discovery and definition of facilities;

• Security, in particular role-based authorisation;

• Portable parallel job specifications;

• Meta-scheduling, resource reservation and ‘on demand’ access;

• Dynamic linking and interacting with remote data sources;

• Wide-area computational/experimental steering;

• Workflow composition and optimisation for complex procedures;

• Distributed user and application management;

• Data management and replication services;

• Grid programming environments, PSEs and user interfaces;

• Auditing, advertising and billing in a Grid-based resource market;

• Semantic and autonomic tools;

• Usability issues, ethics, etc…

Human Factors

Customised delivery may be key to long-term uptake:

• Use an environment familiar to the researchers, e.g.:

– Web portals - training, awareness, search tools (search engines are popular)

– Libraries - e.g. C for programmers

– Programming environment – e.g. R for statistical analysis with well-known packages

– Sound, video for virtual collaboration (TV is a popular medium)

Bottom line:

There is a lot we can/need to do, but Social Science is already hard – the scientists need tools that do not make it harder!

UK E-Social Science Programme

There is currently a growing body of work and projects in this area:

• Pilot projects - ESRC

• ReDReSS: Resource Discovery for Researchers in e-Social Science – JISC

• UK National Grid Service + e-Science Grid - JCSR and DTI Core Programme

• NCeSS: National Centre for e-Social Science - ESRC

• CQeSSS: Centre for Quantitative e-Social Science Support - ESRC (+ future NCeSS nodes)

• …