research-data-alliance
Social Mining & Big Data Analytics
H2020 - www.sobigdata.eu
September 2015 - August 2019
@SoBigData (https://twitter.com/SoBigData)
https://www.facebook.com/SoBigData
The Consortium
Delft 17 – 19 February 2016
Integrating national research
Infrastructures
SoBigData is…
A Multidisciplinary European Infrastructure on Big Data and Social
Data Mining providing an integrated ecosystem for ethic-sensitive
scientific discoveries and advanced applications of social data
mining on the various dimensions of social life, as recorded by “big
data”.
SMARTCATs
The GOAL of the Research Infrastructure
• to integrate key national infrastructures and centres of excellence at European level in social mining and big data analytics
• to enable cutting-edge, multi-disciplinary social mining & responsible data science experiments leveraging the Research Infrastructure assets: big data sets, analytical tools and services, and data scientist skills
• to grant access (both online and on-site) to multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education.
The pillars for reaching the goal
• a distributed data ecosystem for procurement, access and curation of big social data
• a distributed platform of interoperable, social data mining methods and associated “data scientist” skills for mining, analysing, and visualising complex and massive datasets
• a community of multidisciplinary scientists, innovators, public bodies, citizen organizations, SMEs, as well as data science students at any level of education, brought together by extensive networking and innovation actions
What are our researchers doing?
• Any responsible data science experiment is composed of:
– data acquisition (open data, crowdsourcing, crowdsensing),
– model building (very complex validation phase),
– creation of an exploration scenario (what-if analysis) (different validation setting),
– …
• This is similar to many other data-driven science processes, but the data are produced by humans.
Exploratories
Social Mining Research Environments tailored to specific multidisciplinary domains
• Promote results sharing among scientists and communities
• Promote the use of the RI through Virtual and Transnational Access
Big Data for Societal Debates
Polarization, controversy and topic trends on societal debates through social media
Led by Aris Gionis and Dominic Rout
Polarized Political Debates
Monitoring Topics across Time and space
Exploratory: Big Data for City of Citizens
Led by Roberto Trasarti
Estimating traffic fluxes on road network
Big Data for Well Being and Economic Performance
Deprivation Index (in France) predicted with mobile phone traces
Led by Peep Kungas
Well-being and Economic Performance
Systemic Risk and Gender Diversity
BigData & Migration Studies
Sentiment Analysis
• Internal and external perception by country
– Index ρ: the ratio between pro-refugee users and anti-refugee users
– Red means a higher predominance of positive sentiment, higher ρ
– Yellow means a higher predominance of negative sentiment, lower ρ
[Figure panels: (a) Overall. (b) Internal perception. (c) External perception.]
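As a minimal sketch of the index ρ described above, assuming each user has already been classified as pro or against refugees (the slides do not describe that step), the ratio per country could be computed as follows. The function name, the input format, the stance labels and the handling of a zero denominator are illustrative assumptions, not the project's actual method.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def rho_by_country(labels: Iterable[Tuple[str, str]]) -> Dict[str, Optional[float]]:
    """Return rho = (#pro users) / (#against users) for each country.

    `labels` holds one (country, stance) pair per user, where stance is
    "pro" or "against". Both the format and the label names are hypothetical.
    """
    pro: Counter = Counter()
    against: Counter = Counter()
    for country, stance in labels:
        if stance == "pro":
            pro[country] += 1
        elif stance == "against":
            against[country] += 1
        else:
            raise ValueError(f"unknown stance: {stance!r}")

    result: Dict[str, Optional[float]] = {}
    for country in set(pro) | set(against):
        # rho is undefined when no user is against; report None in that case.
        if against[country]:
            result[country] = pro[country] / against[country]
        else:
            result[country] = None
    return result


# Toy input, invented for illustration only.
users = [
    ("IT", "pro"), ("IT", "pro"), ("IT", "against"),
    ("FR", "pro"), ("FR", "against"),
    ("DE", "pro"),
]
rho = rho_by_country(users)
```

A higher ρ corresponds to the red end of the map colouring and a lower ρ to the yellow end.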
Firenze, 14 Nov 2016
SoBigData e-Infrastructure
• An Exploratory is also a Virtual Research Environment (VRE):
– VREs are web-based, community-oriented, comprehensive, flexible, and secure working environments
– VREs are tailored to satisfy the needs of a designated community.
• services for data and methods discovery and access
• collaboration-oriented facilities enabling scientists
– DATA: different sharing policy, may be shared or not
– METHODS: web services (executed over a variety of data centers), or downloaded (packages to be executed on DATA side)
– WORKFLOW: complex analytical process that may imply executions on different sites (currently only description or on-site execution on some special analytical platforms)
e-infra design
• SoBigData.eu portal
• SoBigData.eu Catalogue: a set of functionalities to search, index and discover all resources (data, models and workflows) (powered by D4Science)
• Virtual Research Environments (Exploratories): functionalities to create, update and operate them (powered by D4Science)
SoBigData example: Resource Catalogue
[Screenshot of the catalogue page, with labelled areas: Search for datasets and methods; Description; Recent Activities; Action Bar; Recent Products; Statistics]
VRE example: SoBigData VRE
[Screenshot of the VRE page, with labelled areas: Application; Posting messages to other VRE users; VRE Abstract; VRE Managers; News Feed; Top Topics; Recent Files]
The ethics of SoBigData
• Gathering large quantities of data may have serious consequences:
– consequences range from personal harm,
– to issues of autonomy, injustice and inequality.
• Making Big Data accessible is a value for democracy
• SoBigData adheres to a value-sensitive design approach:
– design solutions to overcome ethical dilemmas, in this case those between the utility of the data gathered vs. the protection of the individuals subject to the research.
Ethics in practice
• SoBigData has an ethical framework which provides a broad overview of all the ethical concerns of big data.
• But, as per the value-sensitive design (VSD) outlook, data protection is not only the concern of the ethicists. To make the ideals of SoBigData successful, scientific methods also need to be developed in order to embed moral principles in practice.
The ethics of SoBigData
• How do we create an infrastructure in which such methods can be disseminated and improved upon?
• The Data Management Plan plays a key role:
– Each dataset has its own privacy requirements, fact checks and responsibilities
• Anonymization techniques are part of the research
• Researchers will be trained in applying the necessary procedural safeguards
Anonymization
[Diagram labels: Service Provider; Mining and Analytical Engine; InfoMobility; Socio-economic indicators; Health services]
Educating the responsible data scientists
Based on a cooperation between ethicists and computer scientists:
1. A Massive Open Online Course (MOOC) which instructs all prospective researchers about the legal and ethical dangers of big data research and the steps they can take to minimise these;
2. A set of workflows that outline the steps researchers can take when designing their approach;
3. Information pop-ups which redirect researchers to state-of-the-art ethical methods.
New challenges are coming
• One of the OECD ideals is algorithmic transparency, and the GDPR also says that decision-making algorithms should be explainable.
• But what is enough to constitute an explanation?
• We're working on developing a template that would satisfy most people's conceptions of what an explanation should be.
• What function should terms of use have? Currently, SoBigData is trying to defer legal responsibility to the Final User through the ToS, but this is difficult.
• Example: How do we deal with Twitter's intellectual property rights? May a scraper violate Twitter's terms of use? The question of legal responsibility above also applies here. How does SoBigData relate to data collectors' terms of service?
New challenges are coming
SoBigData metadata structure
• A highly structured and detailed metadata structure has been designed in order to provide information about:
– Description of the dataset (to make it Findable)
– How the dataset has been produced
– Intellectual Property
– Privacy issues
– Who can access the data and how (terms of use, NDA…)
• Mainly based on the DataCite standard
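The five information groups listed above can be pictured as a single record per dataset. The sketch below is only an illustration: the class name, field names and the completeness check are hypothetical, and they are neither the actual SoBigData schema nor DataCite property names.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DatasetMetadata:
    # Description of the dataset (to make it Findable)
    title: str
    description: str
    # How the dataset has been produced
    provenance: str
    # Intellectual property
    license: str
    # Privacy issues
    contains_personal_data: bool
    # Who can access the data and how (terms of use, NDA, ...)
    access_conditions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of text fields left empty, to flag incomplete records."""
        return [
            name
            for name, value in asdict(self).items()
            if isinstance(value, str) and not value.strip()
        ]


# Placeholder values, invented for illustration only.
record = DatasetMetadata(
    title="Example dataset",
    description="Placeholder description",
    provenance="",
    license="LICENSE-A",
    contains_personal_data=True,
    access_conditions=["terms of use", "NDA"],
)
incomplete = record.missing_fields()
```

Here `incomplete` lists the text fields a curator would still need to fill in before the record is published in a catalogue.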
• Copyright: Copyright is a legal right that grants the creator of an original work exclusive rights for its use and distribution for a limited amount of time. Copyright can exist on individual data as well as over a dataset or database as a whole. Factual data and metadata are not eligible for copyright protection.
• License: A license is a unilateral permission granted by the right holder (the licensor) to the licensee to use certain rights. Licenses distinguish themselves from contracts since the implementation of a license does not require mutual agreement.
• Terms of Use: The terms of use are rules that one must obey in order to use the data or service. The terms of use agreement is mainly used for legal purposes by data providers and databases that store data. A legitimate terms of use agreement is legally binding and may be subject to change.
Managing data: disambiguating terms
There are three broad types of data:
• Primary/raw data: data coming directly from the source;
• Derivative data: data processed from primary/raw data;
• Metadata: reference data describing either the primary/raw data or derivative data.
Primary/raw data and derivative data may be licensed under different conditions and by different stakeholders
Managing data: type of data and their licenses
The e-infrastructure offering discovery and access may be operated by a different actor and offered through specific terms of use.
In this case three types of licenses may be involved:
1. the one agreed between the system operator and the primary data owner,
2. the one selected for derivative product that may differ from the one associated with primary data, and
3. the one agreed between the system operator and the data consumer.
All these licenses have to be captured by the “terms of use” of the e-infrastructure, i.e., they are part of the rules a consumer must agree to accept when using the system.
Managing data: dealing with complexity
A re-use license (to be specified in the ToU of the e-infrastructure) concerns at least attribution, copyleft requirements, and control over commercial exploitation of the dataset. It is also necessary to manage and apply some form of access control.
Virtual Research Environments (VREs) offer flexible and secure web-based, community-centric platforms, so researchers can work together on common challenges. VRE terms of use are automatically composed according to the combined data and services selected at the time of VRE definition.
• Raw data are then licensed according to the license expressed by the data owner/custodian at the time the data content is registered with the e-infrastructure.
• Derivative data products instead are licensed with a license compatible and legally interoperable with the one associated with the primary data.
• It remains under the responsibility of a single user, as expressed in the VRE terms of use, to confirm the license to associate with any produced derivative data.
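The composition described above can be sketched roughly as follows, assuming that each selected resource carries its own list of terms-of-use clauses and that a compatibility table for licenses is supplied from outside. All names (`Resource`, `compose_vre_terms`, `derivative_license_ok`), the placeholder license identifiers and the table itself are hypothetical; this is not the D4Science implementation, and real license compatibility is a legal question that a lookup cannot settle.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Resource:
    name: str
    license: str            # license registered by the owner/custodian
    terms: Tuple[str, ...]  # terms-of-use clauses attached to the resource


def compose_vre_terms(selected: List[Resource]) -> List[str]:
    """Union of the clauses of all selected resources, in first-seen order."""
    seen: Set[str] = set()
    composed: List[str] = []
    for resource in selected:
        for clause in resource.terms:
            if clause not in seen:
                seen.add(clause)
                composed.append(clause)
    return composed


def derivative_license_ok(primary: str, derivative: str,
                          compatible: Dict[str, Set[str]]) -> bool:
    """True if `derivative` is listed as compatible with `primary`."""
    return derivative in compatible.get(primary, set())


# Placeholder resources and licenses, invented for illustration only.
ds1 = Resource("dataset-1", "LICENSE-A", ("attribution", "no commercial use"))
ds2 = Resource("dataset-2", "LICENSE-B", ("attribution", "access under NDA"))
vre_terms = compose_vre_terms([ds1, ds2])

table = {"LICENSE-A": {"LICENSE-A"}, "LICENSE-B": {"LICENSE-B", "LICENSE-C"}}
ok = derivative_license_ok("LICENSE-B", "LICENSE-C", table)
not_ok = derivative_license_ok("LICENSE-A", "LICENSE-C", table)
```

In this sketch the check only supports the user's decision: as stated above, confirming the license of a derivative product remains the responsibility of the single user.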
Managing data: VRE as an instrument to manage the complexity
Meta data definition: Ethics
Meta data definition: Intellectual Properties
The SoBigData.it research laboratory is organizing the first Tuscan Big Data Challenge, a free opportunity for companies to improve their business. Through advanced analysis of the data produced by companies or extracted from the internet, useful information can be obtained on several fronts.