AI Tools Lab SS2020 Introduction Todor Ivanov [email protected] http://www.bigdata.uni-frankfurt.de/ 1

Introduction Todor Ivanov [email protected] · 2020. 4. 28. · Module 1 - Importing Datasets Learning Objectives Understanding the Domain Understanding the Dataset Python


AI Tools Lab SS2020

Introduction

Todor Ivanov [email protected]

http://www.bigdata.uni-frankfurt.de/


About Me

● Dr. Todor Ivanov
  – Senior Researcher
  – Big Data Benchmarking
  – Complex distributed software systems (Hadoop & Spark)
  – Storage and processing of data-intensive applications


What about You? ☺


Frankfurt Big Data Lab - http://www.bigdata.uni-frankfurt.de

Our lab is currently active in the following research areas:

• Big Data Management Technologies

• Data Analytics / Data Science

• Graph Databases / Linked Open Data (LOD)


This Hands-on Course will look at

● Python & Python Libraries

● Machine Learning & AI Tools:
  • Google What-If Tool
  • IBM AI Fairness 360
  • IBM AI Explainability 360
  • Microsoft InterpretML


Course Organization (1)

● Course start/end: Tuesday, 28.04.2020 to Tuesday, 16.06.2020

● Time: Tuesday 10:00 – 12:00

● Location: remote via Zoom

● Languages: English and German

● Credit Points: Students can receive 8 CPs

● Important: Attendance is mandatory!

● Work in teams of 2 students! The final project will be graded (more info follows).

● Communication: [email protected]


Course Organization (2)

● Python ML courses

● Project Topic

● Course page: http://www.bigdata.uni-frankfurt.de/big-data-technologies-ss-2020/

● Communication: [email protected]

Date        Topic
28.04.2020  Course Organization & Introduction → Python for Data Science Course
05.05.2020  → Data Analysis with Python Course
12.05.2020  → Data Visualization with Python Course
19.05.2020  → Submit certificates and start Project
26.05.2020
02.06.2020
09.06.2020
16.06.2020  Project Submission


Python for Data Science (1)
https://cognitiveclass.ai/courses/python-for-data-science

COURSE SYLLABUS

Module 1 - Python Basics
  ○ Your first program
  ○ Types
  ○ Expressions and Variables
  ○ String Operations

Module 2 - Python Data Structures
  ○ Lists and Tuples
  ○ Sets
  ○ Dictionaries

Module 3 - Python Programming Fundamentals
  ○ Conditions and Branching
  ○ Loops
  ○ Functions
  ○ Objects and Classes

Module 4 - Working with Data in Python
  ○ Reading files with open
  ○ Writing files with open
  ○ Loading data with Pandas
  ○ Working with and Saving data with Pandas


Data Analysis with Python (2)
https://cognitiveclass.ai/courses/data-analysis-python

COURSE SYLLABUS

Module 1 - Importing Datasets
  ○ Learning Objectives
  ○ Understanding the Domain
  ○ Understanding the Dataset
  ○ Python package for data science
  ○ Importing and Exporting Data in Python
  ○ Basic Insights from Datasets

Module 2 - Cleaning and Preparing the Data
  ○ Identify and Handle Missing Values
  ○ Data Formatting
  ○ Data Normalization Sets
  ○ Binning
  ○ Indicator variables

Module 3 - Summarizing the Data Frame
  ○ Descriptive Statistics
  ○ Basic of Grouping
  ○ ANOVA
  ○ Correlation
  ○ More on Correlation

Module 4 - Model Development
  ○ Simple and Multiple Linear Regression
  ○ Model Evaluation Using Visualization
  ○ Polynomial Regression and Pipelines
  ○ R-squared and MSE for In-Sample Evaluation
  ○ Prediction and Decision Making

Module 5 - Model Evaluation
  ○ Model Evaluation
  ○ Over-fitting, Under-fitting and Model Selection
  ○ Ridge Regression
  ○ Grid Search
  ○ Model Refinement


Data Visualization with Python (3)
https://cognitiveclass.ai/courses/data-visualization-with-python

COURSE SYLLABUS

Module 1 - Introduction to Visualization Tools
  ○ Introduction to Data Visualization
  ○ Introduction to Matplotlib
  ○ Basic Plotting with Matplotlib
  ○ Dataset on Immigration to Canada
  ○ Line Plots

Module 2 - Basic Visualization Tools
  ○ Area Plots
  ○ Histograms
  ○ Bar Charts

Module 3 - Specialized Visualization Tools
  ○ Pie Charts
  ○ Box Plots
  ○ Scatter Plots
  ○ Bubble Plots

Module 4 - Advanced Visualization Tools
  ○ Waffle Charts
  ○ Word Clouds
  ○ Seaborn and Regression Plots

Module 5 - Creating Maps and Visualizing Geospatial Data
  ○ Introduction to Folium
  ○ Maps with Markers
  ○ Choropleth Maps


Grading Info

● (10%) - Regular participation in the virtual meetings
  • every Tuesday 10:00-12:00

● (30%) - Successfully complete the 3 Python courses
  • Send me your certificates + link to badges by May 19th, 2020

● (60%) - Successfully complete the Project Topic
  • Implement the project task
  • Submit 15 slides presenting your project
  • Submit project solution as Jupyter Notebook on GitHub by June 16th, 2020
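The three weights above can be read as a weighted sum. The sketch below is only an illustration of that arithmetic: the slide does not say how each component is scored, so the 0-100 scale, the component names and the example scores are assumptions.

```python
# Weights in percent, taken from the grading slide.
WEIGHTS = {"participation": 10, "python_courses": 30, "project": 60}

def final_score(scores):
    """Combine per-component scores (assumed 0-100 each) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Integer percent weights keep the arithmetic exact until the final division.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical student: full marks for participation and courses, 80 on the project.
example_scores = {"participation": 100, "python_courses": 100, "project": 80}
example_total = final_score(example_scores)
print(example_total)
```

Because the project carries 60% of the weight, a change in the project score moves the total six times as much as the same change in participation.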


What is Big Data?


Big Data

Growing amount of data
• petabytes/exabytes of user data (text, audio, video, images)

Variety of data sources:
• Mobile devices
• Social platforms
• Sensors (RFID)
• Web platforms

Processing speed
• How fast is the result available?


Big Data Definition (sort of)

• Big Data refers to datasets and data flows large enough to have outpaced our capability to store, process, analyze, and understand them.

• Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. (McKinsey Global Institute)

– This definition is not stated in terms of a fixed data size (data sets will keep growing)
– The threshold varies by sector (ranging from a few dozen terabytes to multiple petabytes)


Source: https://www.domo.com/learn/data-never-sleeps-6

Big Data Characteristics


[1] D. Laney, “3D data management: Controlling data volume, velocity and variety,” Appl. Deliv. Strateg. File, vol. 949, 2001.


● Volume represents the ever-growing amount of data in petabytes, exabytes, zettabytes and yottabytes, which is generated by applications like Facebook, Twitter, IoT, etc. and challenges the current state of storage systems.

● Velocity describes how quickly the data is retrieved, stored and processed.

● Variety describes the multitude of data sources like sensors, smart devices and social media, producing data in different formats: structured, semi-structured or unstructured, with unstructured data being the most common in Big Data use cases.

● Value defines the business value derived from the extracted data insights. It varies depending on the application domain.

● Veracity defines the data accuracy, or how truthful it is. If the data is corrupted, imprecise or uncertain, this has a direct impact on the quality of the final results.

● Variability defines the different interpretations that the same data can have when put in different contexts. It focuses on the meaning of the data instead of its variety in terms of structure or representation.


Structured Data

Employee

EmpNo  Ename  DeptNo  DeptName
100    Bob    10      Marketing
200    Bob    20      Purchasing
150    Peter  10      Marketing
170    Doug   20      Purchasing
105    John   10      Marketing
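Structured data like the Employee table has a fixed schema, so every row can be parsed the same way. As a minimal sketch using only the Python standard library (the CSV encoding and the helper names `load_employees` and `names_by_department` are illustrative assumptions, not part of the slides), the table can be loaded and grouped like this:

```python
import csv
import io
from collections import defaultdict

# The Employee table from the slide, written out as CSV text.
EMPLOYEE_CSV = """EmpNo,Ename,DeptNo,DeptName
100,Bob,10,Marketing
200,Bob,20,Purchasing
150,Peter,10,Marketing
170,Doug,20,Purchasing
105,John,10,Marketing
"""

def load_employees(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["EmpNo"] = int(row["EmpNo"])
        row["DeptNo"] = int(row["DeptNo"])
        rows.append(row)
    return rows

def names_by_department(rows):
    """Group employee names by department name, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["DeptName"]].append(row["Ename"])
    return dict(groups)

employees = load_employees(EMPLOYEE_CSV)
by_dept = names_by_department(employees)
print(by_dept)
```

The same data could be loaded with Pandas, which the Python courses in this lab cover; the standard-library version is used here only to keep the example dependency-free.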


Clickstream data is the information trail a user leaves behind while visiting a website. It is typically captured in semi-structured website logs (sources: http://www.jafsoft.com/searchengines/log_sample.html and http://hortonworks.com).

fcrawler.looksmart.com - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595 "-" "FAST-WebCrawler/2.1-pre2 ([email protected])"

fcrawler.looksmart.com - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716 "-" "FAST-WebCrawler/2.1-pre2 ([email protected])"

ppp931.on.bellglobal.com - - [26/Apr/2000:00:16:12 -0400] "GET /download/windows/asctab31.zip HTTP/1.0" 200 1540096 "http://www.htmlgoodies.com/downloads/freeware/webdevelopment/15.html" "Mozilla/4.7 [en]C-SYMPA (Win95; U)"

123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"

123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
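The log lines above are semi-structured: there is no declared schema, but each line follows a regular pattern (host, timestamp, request, status, size, referrer, user agent). The sketch below is one possible way to turn such a line into a record with a regular expression. The pattern and the field names are assumptions chosen for this sample, not an official parser.

```python
import re
from datetime import datetime

# Pattern for the log layout shown above; field names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Example timestamp: 26/Apr/2000:00:23:48 -0400
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

sample = (
    '123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] '
    '"GET /pics/wpaper.gif HTTP/1.0" 200 6248 '
    '"http://www.jafsoft.com/asctortf/" '
    '"Mozilla/4.05 (Macintosh; I; PPC)"'
)
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Lines that do not fit the pattern return `None` instead of raising, which is a common choice for log data where some malformed entries are expected.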


Unstructured Data

Sensor data logged to a text file and imported into Excel (source: Memos From the Cube)


Big Data: Challenges

1. Data (Volume, Variety, Velocity … )

2. Processing (Batch, Near-real time, Real-time …)

3. Management (Meta-data(Schema), Security … )

4. Data Science (Machine Learning, Deep Learning, AI …)


Big Data & Hadoop Ecosystem


Big Data Ecosystem

Source: https://mattturck.com/data2019/

What is Data Science/Data Scientist?


Big Data vs. Data Science

Big Data = Big Data Systems + Data Science


Image source: http://www.kevinschmidt.biz/2015/03/22/data-engineer-vs-data-scientist-vs-business-analyst/


Data Science - Definitions

Data Science aims to derive knowledge from data, efficiently and intelligently. Data Science encompasses the set of activities, tools, and methods that enable data-driven activities in science, business, medicine, and government.


What is Artificial Intelligence?


Arthur Samuel described machine learning as: "the field of study that gives computers the ability to learn without being explicitly programmed."

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Topics shown on the slide: Regression, Classification, Clustering, Decision Trees, Image Processing, Speech Processing, Natural Language Processing, Recommender Systems, Adversarial Networks, Reinforcement Learning

Source: https://blog.quantinsti.com/machine-learning-basics/


Unsupervised Learning


Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.


K-Means

Source: https://en.wikipedia.org/wiki/Cluster_analysis
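K-Means, named on this slide, is a standard example of deriving structure from unlabeled data. The following is a minimal sketch of the usual two-step loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The 2-D sample points and the starting centroids are made up for illustration; in the lab, scikit-learn's implementation would be the more practical choice.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if no point was assigned to it
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels

# Hypothetical data: two visually separate groups, with no labels given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

No correct answers are supplied anywhere in this code; the grouping emerges only from the distances between points, which is what distinguishes it from supervised learning.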


Supervised Learning


In supervised learning, we are given a data set for which we already know the correct output, on the assumption that there is a relationship between the input and the output. (The data includes the correct answers.)


Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We can turn this example into a classification problem by instead predicting whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
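To make the difference between the two problem types concrete, here is a small sketch in plain Python. All house sizes, prices and asking prices are invented for illustration. The first part fits a straight line by ordinary least squares (regression, continuous output); the second part only shows how the same data yields a two-class target, without training a classifier.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: size in square metres, sale price in thousands.
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]

# Regression: the output is a continuous value (a price).
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 110 + intercept
print(predicted_price)

# Classification: the output is one of two discrete categories.
# Hypothetical asking prices for the same houses, in thousands.
asking = [160, 230, 310, 350, 440]
targets = ["more" if sold > asked else "less" for sold, asked in zip(prices, asking)]
print(targets)
```

A classifier would then be trained to predict the `targets` labels from the inputs, in the same way the line above is fitted to predict `prices`.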


Supervised Learning


Source: Introduction to artificial intelligence using Intel® hardware platform


Problems?


PredPol is an algorithm designed to predict when and where crimes will take place, with the aim of helping to reduce human bias in policing. But in 2016, the Human Rights Data Analysis Group found that the software could lead police to unfairly target certain neighbourhoods. When researchers applied a simulation of PredPol’s algorithm to drug offences in Oakland, California, it repeatedly sent officers to neighbourhoods with a high proportion of people from racial minorities, regardless of the true crime rate in those areas.
Source: Lum, Kristian, and William Isaac. "To predict and serve?." Significance 13.5 (2016): 14-19.

“Volvo admits its self-driving cars are confused by kangaroos[…]The company’s “Large Animal Detection system” can identify and avoid deer, elk and caribou, but early testing in Australia shows it cannot adjust to the kangaroo’s unique method of movement.” Source: https://www.theguardian.com/technology/2017/jul/01/volvo-admits-its-self-driving-cars-are-confused-by-kangaroos


Bias / Fairness

“Algorithmic bias describes systematic and repeatable errors in a computer system that create unfair outcomes, such as privileging one arbitrary group of users over others. Bias can emerge due to many factors, including but not limited to the design of the algorithm or the unintended or unanticipated use or decisions relating to the way data is coded, collected, selected or used to train the algorithm.”
Source: https://en.wikipedia.org/wiki/Algorithmic_bias

Finding suitable definitions of fairness in an algorithmic context is a subject of much debate:

“Equality of opportunity defines an important welfare criterion in political philosophy and policy analysis. Philosophers define equality of opportunity as the requirement that an individual’s well being be independent of his or her irrelevant characteristics. The difference among philosophers is mainly about which characteristics should be considered irrelevant. Policymakers, however, are often called upon to address more specific questions: How should admissions policies be designed so as to provide equal opportunities for college? Or how should tax schemes be designed so as to equalize opportunities for income? These are called local distributive justice problems, because each policymaker is in charge of achieving equality of opportunity to a specific issue.”
Source: Catarina Calsamiglia. Decentralizing equality of opportunity and issues concerning the equality of educational opportunity, 2005. Doctoral Dissertation, Yale University.


Course Ethical Implications of AI

Topics:
● The Ethics of Artificial Intelligence (AI)
● Ethics, Moral Values, Humankind, Technology, AI Examples
● On the ethics of algorithmic decision-making in healthcare
● Fairness, Bias and Discrimination in AI
● AI and Trust: Explainability, Transparency
● AI Privacy, Responsibility, Accountability, Safety, Human-in-the-loop
● AI and Trust
● Introduction to Z-inspection. A framework to assess Ethical AI
● Legal relevance of AI Ethics
● AI Fairness and AI Explainability software tools
● Design of Ethics Tools for AI Developers
● Assessing AI use cases. Ethical tensions, Trade offs

→ http://www.bigdata.uni-frankfurt.de/data-challenge-ss-2020/

→ All course slides and videos available on the website!


AI/Machine Learning Glossary


● Library: Hardware-optimized mathematical and other primitive functions that are commonly used in machine & deep learning algorithms, topologies & frameworks

● Framework: Open-source software environments that facilitate deep learning model development & deployment through built-in components and the ability to customize code

● Topology: Wide variety of algorithms modeled loosely after the human brain that use neural networks to recognize complex patterns in data that are otherwise difficult to reverse engineer
  • Examples: Yolo, Inception-ResNetV2, SSD-MobileNet, Resnet-50, Faster-RCNN, Wavenet, …


Popular Python Libraries

● Computing:

• Pandas (data structures & tools) - https://pandas.pydata.org/

• Numpy (arrays & matrices) - https://numpy.org/

• SciPy (integrals & optimizations) - https://www.scipy.org/

● Algorithmic libs:

• scikit-learn (ML, regressions, ...) - https://scikit-learn.org/stable/

• statsmodels (statistics, estimations, etc..) - https://www.statsmodels.org/stable/index.html

● Data Visualization:

• matplotlib (plots & graphs) - https://matplotlib.org/

• seaborn (heat maps, time series, ..) - https://seaborn.pydata.org/


ML & AI Tools in this Lab

● Google What-If
● IBM AI Fairness 360
● IBM AI Explainability 360
● MS InterpretML

Python Environments (Cloud-based)

● Google Colaboratory Tools - https://colab.research.google.com/notebooks/intro.ipynb

● IBM Notebooks - https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/notebooks-parent.html


What-If Tool

What-If Tool is an interactive visual interface designed to probe your models better.
https://pair-code.github.io/what-if-tool/

● Compare multiple models within the same workflow

● Visualize inference results

● Visualize feature attributions

● Arrange datapoints by similarity

● Edit a datapoint and see how your model performs

● Compare counterfactuals to datapoints

● Use feature values as lenses into model performance

● Experiment using confusion matrices and ROC curves

● Test algorithmic fairness constraints
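The confusion matrices and ROC curves mentioned in the list above are built from four counts: true positives, false positives, false negatives and true negatives. The sketch below computes them in plain Python to show what the tool visualizes. It does not use the What-If Tool API, and the labels and scores are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }

def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    tpr = c["tp"] / (c["tp"] + c["fn"])
    fpr = c["fp"] / (c["fp"] + c["tn"])
    return fpr, tpr

# Hypothetical ground truth and model scores for eight datapoints.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

counts = confusion_counts(y_true, [1 if s >= 0.5 else 0 for s in scores])
roc = [roc_point(y_true, scores, t) for t in (0.25, 0.5, 0.75)]
print(counts)
print(roc)
```

Each threshold gives one point on the ROC curve; sweeping the threshold from high to low traces the curve from (0, 0) towards (1, 1).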


AI Fairness & Explainability 360

AIF 360 is an open source toolkit that can help you examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle. Containing over 70 fairness metrics and 10 state-of-the-art bias mitigation algorithms developed by the research community, it is designed to translate algorithmic research from the lab into the actual practice of domains as wide-ranging as finance, human capital management, healthcare, and education. https://aif360.mybluemix.net/
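To give a feel for what a fairness metric measures, the sketch below computes two widely used group metrics, statistical parity difference and disparate impact, in plain Python. It deliberately does not call the AIF 360 API; the function names, group labels and predictions are assumptions made for this example.

```python
def favorable_rate(predictions, groups, group):
    """Share of members of `group` that received the favorable outcome (1)."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)

def statistical_parity_difference(predictions, groups, unprivileged, privileged):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (favorable_rate(predictions, groups, unprivileged)
            - favorable_rate(predictions, groups, privileged))

def disparate_impact(predictions, groups, unprivileged, privileged):
    """Ratio of the unprivileged favorable rate to the privileged favorable rate."""
    return (favorable_rate(predictions, groups, unprivileged)
            / favorable_rate(predictions, groups, privileged))

# Hypothetical model decisions (1 = favorable) for two groups of four people.
predictions = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["u", "u", "u", "u", "p", "p", "p", "p"]

spd = statistical_parity_difference(predictions, groups, "u", "p")
di = disparate_impact(predictions, groups, "u", "p")
print(spd, di)
```

A statistical parity difference of 0 and a disparate impact of 1 would mean both groups receive the favorable outcome at the same rate; here the unprivileged group is favored far less often.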

AIX 360 is an open source toolkit that can help you comprehend how machine learning models predict labels by various means throughout the AI application lifecycle. Containing eight state-of-the-art algorithms for interpretable machine learning as well as metrics for explainability, it is designed to translate algorithmic research from the lab into the actual practice of domains as wide-ranging as finance, human capital management, healthcare, and education. http://aix360.mybluemix.net/


InterpretML - https://github.com/interpretml/interpret

● InterpretML is an open-source Python package for training interpretable machine learning models and explaining black-box systems.

● Interpretability is essential for:
  • Model debugging - Why did my model make this mistake?
  • Detecting bias - Does my model discriminate?
  • Human-AI cooperation - How can I understand and trust the model's decisions?
  • Regulatory compliance - Does my model satisfy legal requirements?
  • High-risk applications - Healthcare, finance, judicial, ...


Next Meeting

● Project Topics and working teams (2 students)

● Zoom invitation via Email (on Monday)


External Resources

● Multiple Learning Paths in Big Data and Data Science: https://cognitiveclass.ai/learn/all/
  • Free courses
  • Obtain a certificate (badge) after completing each course

● Learning Paths:
  • Applied Data Science with Python
  • Big Data Analytics
  • Big Data Fundamentals
  • Data Science for Business
  • Data Science Fundamentals
  • Deep Learning
  • Many Hadoop and Spark courses: https://cognitiveclass.ai/learn/all/page/2/


Deep Learning & AI courses

● Deep Learning in Coursera

● Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning

● TensorFlow Specialization

● AI For Everyone in Coursera
