Algorithmic Systems Transparency and Accountability in Big Data & Cognitive Era

Algorithmic Systems Transparency and Accountability in the Big Data Era

Nozha Boujemaa - Director of Research, Advisor to the CEO of Inria in Big Data, Member of the Boards of DTL* & BDVA*

December 2013 [email protected] – Data Driven Paris - January 2016

* DTL: Data Transparency Lab; BDVA: Big Data Value Association

Introduction

N. Boujemaa - Data Driven Paris - January 2017

Focus of data analytics is changing – From description of past to decision support

[Figure: types of analytics ordered by increasing value and complexity; stages: Inform, Analyze, Act]

Examples by type of analytics:

Descriptive: What happened?
–  Plant operation report
–  Fault report

Diagnostic: Why did it happen?
–  Alarm management
–  Root cause identification

Predictive: What will happen?
–  Power consumption prediction
–  Fault prediction

Prescriptive: What shall we do?
–  Operation point optimization
–  Load balancing

Gartner 2013 - N. Gauss/Siemens - 2015


Big Data Technologies are enablers for AI capabilities

From « Data Analytics » to « Cognitive Systems »



Drivers & Barriers


Drivers/opportunities:
•  Data Science performance and capabilities growth
•  New Business Models for Value Creation: CRM, Industry 4.0, ...

=> Prerequisite: Data-Algorithm duality

Barriers:
•  Trust & appropriation: Data & Algorithmic transparency & accountability
   §  Data veracity (malicious attack, noise, etc.)
   §  Quality and usage conditions of Algorithms
•  Interdisciplinary skills in Data Science are KEY


DATA SCIENCE PILLARS & CHALLENGES


5 Pillars for Data Science*

1- Data Management: unstructured and semi-structured
§  Semantic interoperability of heterogeneous sources and representations, Data quality, Data provenance

2- Data Processing Architecture:
§  Scalability, Decentralization (Cloud/Fog etc.), Low-energy consumption

3- Data Analytics:
§  Semantic Analysis, Content Validation, Predictive/Prescriptive Analytics

4- Data Protection:
§  Privacy-enhancing models and techniques, Robustness against reversibility

5- Data Visualization:
§  Interactive visual analytics, Collaborative, Cross-platform data frameworks

* Inspired by BDVA SRIA technical priorities



Main R&D Challenges

1.  Progressive user-centric analytics:

§  On-line Learning (real-time learning from few examples)

§  Correlation vs Causality

§  Seamless co-evolution between Humans & Machines

2.  Energy efficient & optimized architecture

3.  Scientific and Technical Methods for Transparency, Accountability and Explainability

Applications: Responsive communities, Energy optimization, Food security, Industry 4.0 etc.



Challenges for Data Science: Responsible and Ethical Data Management and Analytics

It is often assumed that big data techniques are unbiased:

§  because of the scale of the data

§  because the techniques are implemented through algorithmic systems.

⇒ it is a mistake to assume they are objective simply because they are data-driven * (“Data fundamentalism”)

A consensus is emerging on the need to develop methods and tools that build trust through transparency and accountability for data and algorithms
⇒ Implementing the “Responsible-by-design” principle (fairness/equity, loyalty, neutrality, etc.)

* White House – OSTP Report « Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights », May 2016
* Federal Trade Commission Report: “Big Data: A Tool for Inclusion or Exclusion?”, January 2016



Mastering Big Data Technologies: bias problems could affect both the accuracy of data technologies and people’s lives

Challenge 1: Data Inputs to an Algorithm
§  Poorly selected data
§  Incomplete, incorrect, or outdated data
§  Data sets that lack information on, or disproportionately represent, certain populations
§  Malicious attacks
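One way to make the "disproportionate representation" item concrete is to compare each group's share of a data set with its share of a reference population. The sketch below is illustrative only: the group labels, counts, and reference shares are invented, and `representation_ratio` is a hypothetical helper rather than a function from any library.

```python
from collections import Counter


def representation_ratio(samples, reference_shares):
    """Return (observed share / reference share) for each group.

    A ratio near 1.0 means the group appears in proportion to the
    reference population; a ratio well below 1.0 means the group is
    under-represented in the data set.
    """
    counts = Counter(samples)
    total = len(samples)
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in reference_shares.items()
    }


# Hypothetical group labels attached to the records of a training set.
training_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
# Hypothetical reference population shares (assumed, they sum to 1.0).
population = {"A": 0.5, "B": 0.3, "C": 0.2}

ratios = representation_ratio(training_groups, population)
for group, ratio in sorted(ratios.items()):
    print(f"group {group}: representation ratio {ratio:.2f}")
```

In this invented example group A is over-represented and groups B and C are under-represented. Such a check only flags a possible input problem; it says nothing about whether a model trained on the data is biased.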

Challenge 2: The Design of Algorithmic Systems and Machine Learning
§  Poorly designed matching systems
§  Unintentional perpetuation and promotion of historical biases
§  Decision-making systems that assume correlation implies causation
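The last point, correlation taken for causation, can be illustrated with synthetic data. In the sketch below a hidden variable `z` drives both `x` and `y`, while `x` has no effect on `y`. Everything here is an assumption made for illustration: the data are simulated, the noise levels are arbitrary, and the adjustment step subtracts `z` directly because the simulation uses a known coefficient of 1, which would not be available with real data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden confounder z drives both x and y; x does not influence y.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [zi + rng.gauss(0, 0.3) for zi in z]

raw = pearson(x, y)

# Remove the confounder's contribution (known here by construction).
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
adjusted = pearson(x_res, y_res)

print(f"correlation of x and y:            {raw:.2f}")
print(f"correlation after removing z:      {adjusted:.2f}")
```

The raw correlation is strong even though neither variable causes the other, and it largely disappears once the confounder is accounted for. A decision system that acted on `x` to change `y` would be acting on a relationship that does not exist.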

Data Science Challenges: Responsible and Ethical Data Management and Analytics



§  Trust and transparency of computer-aided decision-making processes (decision responsibility): which criteria, data, and settings led to a specific decision, and what was the overall path of the reasoning?

§  “How can I trust a Machine Learning prediction?” A model may end up learning the context of the object rather than the object itself

§  Decision explanation and traceability

§  Robustness to bias/diversion/corruption

§  Careful software reuse
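For the simplest class of models, the question "which criteria led to this decision?" has a direct answer. The sketch below assumes an additive (linear) scoring model, where each feature's contribution is its weight times its value. The feature names, weights, bias, and threshold are invented for illustration, and `explain_decision` is a hypothetical helper; more complex models need dedicated explanation methods that this sketch does not cover.

```python
def explain_decision(weights, bias, features, threshold=0.0):
    """Score one case with a linear model and list what drove the result.

    Returns the decision, the score, and the per-feature contributions
    sorted from most to least influential (by absolute value).
    """
    contributions = {
        name: weights[name] * value for name, value in features.items()
    }
    score = bias + sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    decision = "accept" if score >= threshold else "reject"
    return {"decision": decision, "score": score, "contributions": ranked}


# Hypothetical model parameters and one hypothetical applicant.
weights = {"income": 0.5, "debt": -0.8, "years_employed": 0.2}
applicant = {"income": 4.0, "debt": 2.0, "years_employed": 3.0}

report = explain_decision(weights, bias=-0.5, features=applicant)
print(report["decision"], round(report["score"], 2))
for name, contribution in report["contributions"]:
    print(f"  {name}: {contribution:+.2f}")
```

The ranked contributions give a trace of the decision that a user or regulator can inspect: here income pushes the score up the most and debt pulls it down.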




Potential Synergies for International Collaborations

1.  Opening the black box of Deep Learning

2.  Trustworthiness of Machine Learning Algorithms (bias typology, software reuse, etc.)

3.  Algorithmic explainability approaches

4.  Cross views on fairness definitions and related measuring methods

5.  Interdisciplinary Training for Data Scientists (in addition to Maths-Computer Science)
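Why fairness definitions call for "cross views" can be seen by computing two common measures on the same predictions. The sketch below uses demographic parity difference (gap in positive-prediction rates) and equal opportunity difference (gap in true-positive rates). These two measures are chosen here for illustration and are not named in the slides; the labels, predictions, and groups are invented.

```python
def rate(values):
    """Mean of a list of 0/1 values, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_diff(y_pred, groups, a, b):
    """Difference in positive-prediction rates between groups a and b."""
    rate_a = rate([p for p, g in zip(y_pred, groups) if g == a])
    rate_b = rate([p for p, g in zip(y_pred, groups) if g == b])
    return rate_a - rate_b


def equal_opportunity_diff(y_true, y_pred, groups, a, b):
    """Difference in true-positive rates between groups a and b."""
    tpr_a = rate([p for t, p, g in zip(y_true, y_pred, groups)
                  if g == a and t == 1])
    tpr_b = rate([p for t, p, g in zip(y_true, y_pred, groups)
                  if g == b and t == 1])
    return tpr_a - tpr_b


# Hypothetical outcomes (1 = positive) for two groups of four people.
groups = ["a"] * 4 + ["b"] * 4
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

dp = demographic_parity_diff(y_pred, groups, "a", "b")
eo = equal_opportunity_diff(y_true, y_pred, groups, "a", "b")
print(f"demographic parity difference: {dp:.2f}")
print(f"equal opportunity difference:  {eo:.2f}")
```

In this invented data both groups receive positive predictions at the same rate, so the first measure reports no gap, while the second shows that qualified members of group b are selected far less often. The same system can therefore look fair under one definition and unfair under another.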



DARPA initiative: Explainable AI (August 2016)

The Explainable AI (XAI) program aims to create a suite of machine learning techniques that:

•  Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy);

•  Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.


Relevant French Programs

TransAlgo (Inria) – Dec 2016, Minister of Digital Economy

§  National Scientific Platform for Transparency & Accountability Tools and Methods for Data and Algorithms (Fairness, Neutrality, Loyalty); b2b & b2c

§  Support of the new “Law for Digital Republic” after CGE report

§  Contributors: CNNum, DGCCRF (French FTC) besides academia and associations

§  3 Objectives:
   * Resource center (reports, publications, software, initiatives)
   * Research & Dev. programs, with DTL
   * Best practices & MOOCs

I2-DRIVE Program: Interdisciplinary Institute for Data Research: Intelligence, Value and Ethics (ongoing)

§  4 Overarching Challenges: From Data to Knowledge, from Data to Decision, Deep learning toward Artificial Intelligence, Digital Trust and Appropriation, Data economy and regulation

§  Scientific and disciplinary foundations: Data Science, Management and Economy, Social Sciences, Legal Sciences

§  Roadmap for 10 years, 200 M€ Budget, 12 academic institutions

CONCLUSION


§  Algorithmic systems transparency is essential for digital trust and appropriation

§  Transparency and Responsible-by-Design approaches are economic competitiveness factors

§  Tools for consumer empowerment (b2b & b2c)

§  Tools for regulators (Loi pour la République Numérique)


Thanks for your attention


