SoBigData. European Research Infrastructure for Big Data and Social Mining

Page 1: SoBigData. European Research Infrastructure for Big Data and Social Mining

Social Mining & Big Data Analytics

H2020 - www.sobigdata.eu

September 2015 - August 2019

@SoBigData (https://twitter.com/SoBigData)

https://www.facebook.com/SoBigData

Page 2: SoBigData. European Research Infrastructure for Big Data and Social Mining

The Consortium

Page 3: SoBigData. European Research Infrastructure for Big Data and Social Mining

Delft 17 – 19 February 2016

Integrating national research infrastructures

Page 4: SoBigData. European Research Infrastructure for Big Data and Social Mining

SoBigData is…

A multidisciplinary European infrastructure on Big Data and social data mining, providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”.

Page 5: SoBigData. European Research Infrastructure for Big Data and Social Mining

SMARTCATs

The GOAL of the Research Infrastructure

• to integrate key national infrastructures and centres of excellence at European level in social mining and big data analytics

• to enable cutting-edge, multi-disciplinary social mining & responsible data science experiments leveraging the Research Infrastructure assets: big data sets, analytical tools and services, and data scientist skills

• to grant access (both online and on-site) to multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education.

Page 6: SoBigData. European Research Infrastructure for Big Data and Social Mining

SMARTCATs

The pillars for reaching the goal

• a distributed data ecosystem for procurement, access and curation of big social data

• a distributed platform of interoperable, social data mining methods and associated “data scientist” skills for mining, analysing, and visualising complex and massive datasets

• a community of multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education, brought together by extensive networking and innovation actions

Page 7: SoBigData. European Research Infrastructure for Big Data and Social Mining

What are our researchers doing?

• Any responsible data science experiment is composed of:
– data acquisition (open data, crowdsourcing, crowdsensing)
– model building (very complex validation phase)
– creation of an exploration scenario (what-if analysis) (different validation setting)
– … similar to many other data-driven science processes, but the data are produced by humans

Page 8: SoBigData. European Research Infrastructure for Big Data and Social Mining
Page 9: SoBigData. European Research Infrastructure for Big Data and Social Mining

Exploratories

Social mining research environments tailored to specific multidisciplinary domains
• Promote results sharing among scientists and communities
• Promote the use of the RI through Virtual and Transnational Access

Page 10: SoBigData. European Research Infrastructure for Big Data and Social Mining

Big Data for Societal Debates

Polarization, controversy and topic trends on societal debates through social media

Led by Aris Gionis and Dominic Rout

Page 11: SoBigData. European Research Infrastructure for Big Data and Social Mining

Polarized Political Debates

Monitoring Topics across Time and space

Page 12: SoBigData. European Research Infrastructure for Big Data and Social Mining

Exploratory: Big Data for City of Citizens

Led by Roberto Trasarti

Page 13: SoBigData. European Research Infrastructure for Big Data and Social Mining

Estimating traffic fluxes on the road network

[Figure: road-network diagram with labels A, B, C and HW]

Page 14: SoBigData. European Research Infrastructure for Big Data and Social Mining

Big Data for Well Being and Economic Performance

Deprivation Index (in France) predicted with mobile phone traces

Led by Peep Kungas

Page 15: SoBigData. European Research Infrastructure for Big Data and Social Mining

Well-being and Economic Performance

Systemic Risk and Gender Diversity

Page 16: SoBigData. European Research Infrastructure for Big Data and Social Mining

BigData & Migration Studies

Page 17: SoBigData. European Research Infrastructure for Big Data and Social Mining

Sentiment Analysis
• Internal and external perception by country
– Index ρ: the ratio between pro-refugee users and anti-refugee users
– Red means a higher predominance of positive sentiment (higher ρ)
– Yellow means a higher predominance of negative sentiment (lower ρ)

[Figure: maps of ρ by country. Panels: (a) Overall, (b) Internal perception, (c) External perception, each with a colour scale from - to +]
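
As a rough illustration of the index ρ described on this slide, the sketch below counts pro-refugee and anti-refugee users per country and takes their ratio. The input format, the stance labels "pro" and "against", and the function name rho_by_country are assumptions made for this example; the slides do not say how users were classified, nor how countries without any anti-refugee users were treated (here they are simply skipped).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rho_by_country(users: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute rho = (#pro-refugee users) / (#anti-refugee users) per country.

    `users` yields (country, stance) pairs, one per classified user,
    where stance is either "pro" or "against" (hypothetical labels).
    """
    pro: Counter = Counter()
    against: Counter = Counter()
    for country, stance in users:
        if stance == "pro":
            pro[country] += 1
        elif stance == "against":
            against[country] += 1
        else:
            raise ValueError(f"unknown stance: {stance!r}")

    # Countries with no "against" users are skipped to avoid dividing by zero.
    return {
        country: pro[country] / against[country]
        for country in set(pro) | set(against)
        if against[country] > 0
    }


if __name__ == "__main__":
    sample = [
        ("IT", "pro"), ("IT", "pro"), ("IT", "against"),
        ("DE", "pro"), ("DE", "against"), ("DE", "against"),
    ]
    for country, rho in sorted(rho_by_country(sample).items()):
        print(country, rho)
```

A value of ρ above 1 corresponds to a predominance of pro-refugee users (red on the maps), a value below 1 to a predominance of anti-refugee users (yellow).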

Page 18: SoBigData. European Research Infrastructure for Big Data and Social Mining

Firenze, 14 Nov 2016

SoBigData e-Infrastructure

• An Exploratory is also a Virtual Research Environment (VRE):
– VREs are web-based, community-oriented, comprehensive, flexible, and secure working environments
– VREs are tailored to satisfy the needs of a designated community
• Services for data and methods discovery and access
• Collaboration-oriented facilities enabling scientists
– DATA: different sharing policies; may be shared or not
– METHODS: web services (executed over a variety of data centers), or downloaded (packages to be executed on the DATA side)
– WORKFLOW: complex analytical process that may imply executions on different sites (currently only description, or on-site execution on some special analytical platforms)

Page 19: SoBigData. European Research Infrastructure for Big Data and Social Mining

e-infra design

• SoBigData.eu portal
• SoBigData.eu Catalogue: a set of functionalities to search, index and discover all resources (data, models and workflows) (powered by D4Science)
• Virtual Research Environments (Exploratories): functionalities to create, update and operate them (powered by D4Science)

Page 20: SoBigData. European Research Infrastructure for Big Data and Social Mining

Page 21: SoBigData. European Research Infrastructure for Big Data and Social Mining

Page 22: SoBigData. European Research Infrastructure for Big Data and Social Mining

SoBigData example: Resource Catalogue

Search for datasets and methods

Description

Recent Activities

Action Bar

Recent Products

Statistics

Page 23: SoBigData. European Research Infrastructure for Big Data and Social Mining

VRE example: SoBigData VRE

Application

Posting messages to other VRE users

VRE Abstract

VRE Managers

News Feed

Top Topics

Recent Files

Page 24: SoBigData. European Research Infrastructure for Big Data and Social Mining

The ethics of SoBigData

• Gathering large quantities of data may have serious consequences:
– consequences range from personal harm
– to issues of autonomy, injustice and inequality.
• Making Big Data accessible is a value for democracy
• SoBigData adheres to a value-sensitive design approach:
– design solutions to overcome ethical dilemmas, in this case those between the utility of the data gathered and the protection of the individuals subject to the research.

Page 25: SoBigData. European Research Infrastructure for Big Data and Social Mining

Ethics in practice

• SoBigData has an ethical framework which provides a broad overview of all the ethical concerns of big data.

• But, as per the VSD outlook, data protection is not only the concern of the ethicists. To make the ideals of SoBigData successful, scientific methods also need to be developed to embed moral principles in practice.

Page 26: SoBigData. European Research Infrastructure for Big Data and Social Mining

The ethics of SoBigData

• How do we create an infrastructure in which such methods can be disseminated and improved upon?

• The Data Management Plan plays a key role:
– Each dataset has its own privacy requirements, fact checks and responsibilities
• Anonymization techniques are part of the research
• Researchers will be trained in applying the necessary procedural safeguards

Page 27: SoBigData. European Research Infrastructure for Big Data and Social Mining

Anonymization

Service Provider

Mining and Analytical Engine

InfoMobility

Socio-economic indicators

Health services

Page 28: SoBigData. European Research Infrastructure for Big Data and Social Mining

Educating the responsible data scientists

Based on a cooperation between ethicists and computer scientists:

1. A Massive Open Online Course (MOOC) which instructs all prospective researchers about the legal and ethical dangers of big data research and the steps they can take to minimise these;

2. A set of workflows that outline the steps researchers can take when designing their approach;

3. Information pop-ups which redirect researchers to state-of-the-art ethical methods.

Page 29: SoBigData. European Research Infrastructure for Big Data and Social Mining

New challenges are coming

• One of the OECD ideals is algorithmic transparency, and the GDPR also says that decision-making algorithms should be explainable.

• But what is enough to constitute an explanation?

• We are developing a template that would satisfy most people's conceptions of what an explanation should be.

Page 30: SoBigData. European Research Infrastructure for Big Data and Social Mining

New challenges are coming

• What function should terms of use have? Currently, SoBigData is trying to defer legal responsibility to the final user through the ToS, but this is difficult.

• Example: How do we deal with Twitter's intellectual property rights? May the scraper violate Twitter's terms of use? (Of course, the point above also holds here.) How does SoBigData relate to data collectors' terms of service?

Page 31: SoBigData. European Research Infrastructure for Big Data and Social Mining

SoBigData metadata structure

• A highly structured and detailed metadata structure has been designed to provide information about:
– Description of the dataset (to make it findable)
– How the dataset has been produced
– Intellectual property
– Privacy issues
– Who can access the data and how (terms of use, NDA…)

• Mainly based on the DataCite standard
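
To make the five groups of information on this slide concrete, here is a minimal sketch of a metadata record with one field per group and a helper that reports which groups are still empty. The class name, field names and helper are hypothetical and chosen only for illustration; they are not the actual SoBigData catalogue schema, and they do not reproduce DataCite property names.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatasetMetadata:
    """Illustrative record; one free-text field per information group."""
    title: str
    description: str = ""            # what the dataset is (findability)
    production: str = ""             # how the dataset has been produced
    intellectual_property: str = ""  # copyright / license information
    privacy_issues: str = ""         # personal data, anonymization applied
    access_conditions: str = ""      # who can access and how (ToU, NDA...)


def missing_sections(record: DatasetMetadata) -> List[str]:
    """Return the names of fields that are still empty, in field order."""
    return [name for name, value in asdict(record).items() if not value.strip()]


if __name__ == "__main__":
    record = DatasetMetadata(
        title="Example mobility traces",
        description="Aggregated trips between city zones",
        production="Derived from GPS logs by spatial aggregation",
    )
    print(missing_sections(record))
```

A check of this kind could be run when a dataset is registered, so that records lacking privacy or access information are flagged before they become discoverable.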

Page 32: SoBigData. European Research Infrastructure for Big Data and Social Mining

Managing data: disambiguating terms

• Copyright
Copyright is a legal right that grants the creator of an original work exclusive rights for its use and distribution for a limited amount of time. Copyright can exist on individual data as well as over a dataset or database as a whole. Factual data and metadata are not eligible for copyright protection.

• License
A license is a unilateral permission granted by the right holder (the licensor) to the licensee to use certain rights. Licenses differ from contracts in that a license does not require mutual agreement.

• Terms of Use
The terms of use are rules that one must obey in order to use the data or service. The terms of use agreement is mainly used for legal purposes by data providers and databases that store data. A legitimate terms of use agreement is legally binding and may be subject to change.

Page 33: SoBigData. European Research Infrastructure for Big Data and Social Mining

Managing data: types of data and their licenses

There are three broad types of data:
• Primary/raw data: data coming directly from the source;
• Derivative data: data processed from primary/raw data;
• Metadata: reference data describing either the primary/raw data or derivative data.

Primary/raw data and derivative data may be licensed under different conditions and by different stakeholders.

Page 34: SoBigData. European Research Infrastructure for Big Data and Social Mining

Managing data: dealing with complexity

The e-infrastructure offering discovery and access may be operated by a different actor and offered through specific terms of use.

In this case three types of licenses may be involved:

1. the one agreed between the system operator and the primary data owner,

2. the one selected for derivative products, which may differ from the one associated with the primary data, and

3. the one agreed between the system operator and the data consumer.

All these licenses have to be captured by the “terms of use” of the e-infrastructure, i.e., they are part of the rules a consumer must agree to accept when using the system.

Page 35: SoBigData. European Research Infrastructure for Big Data and Social Mining

Managing data: VRE as an instrument to manage the complexity

A re-use license (to be specified in the ToU of the e-infrastructure) concerns at least attribution, copyleft requirements, and control over commercial exploitation of the dataset. Moreover, some form of access control must be managed and applied.

Virtual Research Environments (VREs) offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. VRE terms of use are automatically composed according to the combined data and services selected at the time of VRE definition.

• Raw data are licensed according to the license expressed by the data owner/custodian at the time the data content is registered in the e-infrastructure.

• Derivative data products are instead licensed with a license compatible and legally interoperable with the one associated with the primary data.

• It remains the responsibility of the individual user, as expressed in the VRE terms of use, to confirm the license to associate with any produced derivative data.
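
The composition of VRE terms of use described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the Resource class, the clause strings, the license labels "attribution" and "share-alike", and the COMPATIBLE table are all hypothetical placeholders, not the D4Science implementation or a statement about any real license.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Resource:
    """A dataset or service selected when the VRE is defined."""
    name: str
    license: str              # license declared at registration time
    clauses: Tuple[str, ...]  # terms-of-use clauses attached to it


# Hypothetical table: for each primary-data license, the licenses a
# derivative product may carry while remaining compatible with it.
COMPATIBLE: Dict[str, FrozenSet[str]] = {
    "attribution": frozenset({"attribution", "share-alike"}),
    "share-alike": frozenset({"share-alike"}),
}


def compose_terms(resources: List[Resource]) -> List[str]:
    """Merge the clauses of all selected resources, dropping duplicates."""
    composed: List[str] = []
    for res in resources:
        for clause in res.clauses:
            if clause not in composed:
                composed.append(clause)
    return composed


def derivative_license_options(resources: List[Resource]) -> Set[str]:
    """Licenses compatible with every primary resource used."""
    options: Set[str] = set()
    for i, res in enumerate(resources):
        allowed = COMPATIBLE.get(res.license, frozenset())
        options = set(allowed) if i == 0 else options & allowed
    return options


if __name__ == "__main__":
    tweets = Resource("tweets", "share-alike",
                      ("cite the source", "no redistribution of raw data"))
    trips = Resource("trips", "attribution", ("cite the source",))
    print(compose_terms([tweets, trips]))
    print(derivative_license_options([tweets, trips]))
```

The second function only narrows the candidates; in line with the last bullet, the final choice of license for a derivative product would still be confirmed by the individual user.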

Page 36: SoBigData. European Research Infrastructure for Big Data and Social Mining

Meta data definition: Ethics

Page 37: SoBigData. European Research Infrastructure for Big Data and Social Mining

Meta data definition: Intellectual Properties

Page 38: SoBigData. European Research Infrastructure for Big Data and Social Mining

The SoBigData.it research laboratory is organising the first Tuscan Big Data Challenge, a free opportunity for companies to improve their business. Advanced analysis of the data produced by companies or extracted from the internet can yield useful information on several fronts.