LECTURE 1 INTRODUCTION (DATA-DRIVEN MODELING) SCIENTIFIC DATA COMPUTING MTAT.08.042 Prepared by: Amnir Hadachi Institute of Computer Science, University of Tartu [email protected]

SCIENTIFIC DATA COMPUTING MTAT.08.042 LECTURE 1 INTRODUCTION (DATA-DRIVEN MODELING) · 2016. 4. 5. · MODELING data-driven modeling role: enable us to map causal factors and their



LECTURE 1: INTRODUCTION

OUTLINE

▸ Course Syllabus

▸ General overview

▸ Modelling

▸ Types of modelling

▸ data-driven modelling

▸ statistical models

▸ soft computing model

▸ Spatio-temporal complexity

▸ Type of data

▸ Probability distributions

1. COURSE SYLLABUS

▸ Rules and regulations

▸ Attendance: attending the lectures and labs is not mandatory.

▸ However, you are advised to attend the labs and to respect the deadlines for the lab tasks (one week per task).

▸ Lecture slides and video recordings will be made available during the week in which the lecture is scheduled.

▸ Course website: https://courses.cs.ut.ee/2016/SDC/spring/Main/HomePage


▸ Topics covered:

▸ Statistical methods and their applications

▸ Linear algebra and singular value decomposition

▸ Basic optimization

▸ Image processing and analysis

▸ Compressed sensing

▸ Text processing

▸ Time series analysis and wavelets


▸ Course instructor:

▸ Lecturer Amnir Hadachi

▸ Office Ulikooli 17, Room 327.

▸ Office hours:

▸ Friday from 10h till 14h.

▸ Tuesday from 9h till 12h.

2. GENERAL OVERVIEW

▸ General approach

DEFINITION: The term “modeling” most often refers to the process of representing a real-world object or phenomenon as a set of mathematical equations.

A model of a system serves one of two purposes: estimating or predicting the system's behavior and its response to changing factors.

[Figure: modelling approach. Blocks shown: input data X, process to be modeled, observed output variable, model, predicted output variable.]

3. MODELLING

▸ Types of modeling

[Figure: taxonomy of models. Top level: models. Second level: physical, mathematical. Third level: analytical, conceptual, data-driven.]


▸ Example “Electro-Dynamic Vibration Exciter”

[Figure: physical system.]

[Figure: physical model of the exciter; visible labels: flexure springs, spring.]

Observed effects (two electromechanical effects):

Motor effect: the passage of current through the coil causes it to experience a magnetic force proportional to the current.

Generator effect: the motion of the coil inside the magnetic field induces in the coil a voltage proportional to its velocity.


Mathematical model (System equation)
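The motor and generator effects of such an exciter, together with a system equation, are often written in the lumped-parameter form below. This is a generic sketch and not necessarily the equation shown on the slide. Every symbol is an assumed name: $i$ is the coil current, $x$ the coil displacement, $m$ the moving mass, $c$ the damping, $k$ the suspension stiffness, $R$ and $L$ the coil resistance and inductance, $e(t)$ the applied voltage, and $k_f$, $k_b$ the force and back-EMF constants.

```latex
\begin{align}
F_{\text{motor}} &= k_f\, i                      && \text{(motor effect)} \\
e_{\text{back}}  &= k_b\, \dot{x}                && \text{(generator effect)} \\
m\,\ddot{x} + c\,\dot{x} + k\,x &= k_f\, i       && \text{(mechanical side)} \\
L\,\frac{di}{dt} + R\,i + k_b\,\dot{x} &= e(t)   && \text{(electrical side)}
\end{align}
```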


▸ Data-driven modeling

▸ Role:

▸ It enables us to map causal factors to their consequent outcomes by observing patterns in experimental data, without having to understand the complex underlying physical process.

DEFINITION: A model that can simulate a system using experimental data from that system is known as a data-driven model.
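As a minimal illustration of the definition, the sketch below fits a straight line to noisy measurements without using any knowledge of the process that produced them. The data, the linear form, and the names `fit_line` and `predict` are assumptions made for this example only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical "experimental" data: an unknown process (here y = 2x + 1)
# observed with measurement noise. The model never sees this formula.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

a, b = fit_line(xs, ys)


def predict(x):
    """Predicted output of the fitted data-driven model."""
    return a * x + b
```

The fitted coefficients should land close to the hidden slope and intercept, which is the sense in which the model "simulates" the system from data alone.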


▸ Purposes of data-driven modelling:

▸ Data clustering and classification

▸ Estimating the outcome

▸ Forecasting or Predicting the outcome

▸ Optimisation


▸ Characteristics:

▸ inexpensive

▸ accurate

▸ precise

▸ flexible (compared to physical models or analytical models)


▸ There are two groups of data-driven models:

▸ Statistical modeling

▸ Soft computing (also known as artificial intelligence)


▸ A statistical model comprises:

▸ Deterministic variables

▸ defined by a mathematical model and synthesized data

▸ Random variables

▸ defined by a probabilistic model

▸ parametric (e.g. standard deviation, mean)

▸ non-parametric (based on assumptions)
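One common way to contrast the two descriptions of a random variable is sketched below: a parametric summary reduces a sample to a few parameters, while a non-parametric summary works from the data directly. The sample values and the choice of the empirical CDF as the non-parametric example are assumptions for illustration.

```python
import statistics
from bisect import bisect_right

# Hypothetical sample of measurements
sample = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9, 5.1, 4.7]

# Parametric description: summarize the sample by a few parameters
mean = statistics.mean(sample)
std = statistics.stdev(sample)  # sample standard deviation

# Non-parametric description: empirical CDF built directly from the data
sorted_sample = sorted(sample)


def ecdf(x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)
```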


▸ Soft computing model

▸ Soft computing is built principally on neuro-computing, fuzzy logic and genetic algorithms.

▸ It tolerates imprecision, partial truth, approximation and uncertainty.

EXAMPLE:

source: http://www.newyorker.com/tech/elements/is-your-thermostat-sexist

Illustration by Tomi Um

Soft computing can help answer the question: what room temperature makes people feel comfortable?
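The room-temperature question can be made concrete with fuzzy logic, where a temperature belongs to "cold", "comfortable" and "hot" to a degree between 0 and 1 rather than to exactly one class. The sketch below uses triangular membership functions; the set names and the breakpoints in degrees Celsius are assumed values, not taken from the lecture.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def comfort_memberships(temp_c):
    """Degree of membership of a temperature in each fuzzy set."""
    # Hypothetical fuzzy sets; breakpoints are assumptions
    return {
        "cold": triangular(temp_c, 10.0, 16.0, 21.0),
        "comfortable": triangular(temp_c, 18.0, 22.0, 26.0),
        "hot": triangular(temp_c, 23.0, 29.0, 35.0),
    }


memberships = comfort_memberships(20.0)
```

With these assumed sets, 20 °C is partly "cold" and partly "comfortable" at the same time, which is the partial truth that soft computing is meant to tolerate.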


▸ Spatio-temporal complexity

▸ The model complexity can be defined in space and time.

▸ the model is defined by two characteristics:

▸ space

▸ time

▸ Very important for studying natural phenomena or any event with a dynamic change in space and time.


▸ Example: Travel Time Estimation

Problem statement: estimate the travel time per section.

Knowing:

▸ Two GPS positions, plus the speed and the heading

▸ The positions of all detected crossroads

Steps to follow:

▸ Moment of passage

▸ Conclude the travel time estimation per section

[Figure: points A and B.]


The state equation:

First objective: estimate the moment of passage
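The slide's state equation did not survive extraction, so the sketch below is not the lecture's method. It shows one simple way to estimate a moment of passage: linear interpolation between two GPS fixes, assuming constant speed between them and ignoring the measured speed and heading. The function name, the along-road distances, and the crossroad positions are all hypothetical.

```python
def moment_of_passage(t0, s0, t1, s1, s_cross):
    """Estimate when a vehicle passed a crossroad located at s_cross.

    t0, t1: timestamps (seconds) of two consecutive GPS fixes
    s0, s1: distance along the road (metres) at those fixes
    Assumes constant speed between the fixes (linear interpolation).
    """
    if not (s0 <= s_cross <= s1) or s1 == s0:
        raise ValueError("crossroad must lie between two distinct fixes")
    fraction = (s_cross - s0) / (s1 - s0)
    return t0 + fraction * (t1 - t0)


# Hypothetical section between crossroads A (at 100 m) and B (at 400 m)
t_a = moment_of_passage(0.0, 50.0, 10.0, 150.0, 100.0)
t_b = moment_of_passage(30.0, 350.0, 40.0, 450.0, 400.0)

# Travel time for the section is the difference of the two passage moments
travel_time = t_b - t_a
```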


▸ Type of data

▸ discrete data

▸ continuous data

▸ spatial data

▸ temporal data

[Figure: example charts; category labels: April, May, June, July.]


▸ How to develop data-driven models

▸ General approach (flowchart boxes, in order):

▸ Define the problem

▸ Model formulation

▸ Solving the model

▸ Check and validation of the model solution

▸ Give feedback on the model

▸ Decision on the model: OK, or NOT OK (corrections and adjustment)

▸ Supporting activities: collect data, build assumptions, define variables, establish relationships, define functions and formulas

4. PROBABILITY DISTRIBUTIONS

DEFINITION: Let X be a continuous random variable. Then a probability distribution, or probability density function (pdf), of X is a function f(x) such that for any two numbers a and b with a ≤ b:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

source:http://philschatz.com/statistics-book/contents/m46965.html


REMARK: For f(x) to be a legitimate probability density function, it must satisfy the following conditions:

$$f(x) \ge 0 \ \text{for all } x, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$


QUESTION:

If you toss a die, what is the probability that you roll a 3 or less?

1. 1/6

2. 1/3

3. 1/2

4. 5/6

5. 1
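The question can be checked by direct enumeration, assuming a fair six-sided die. Three of the six equally likely faces are 3 or less, so the result is 1/2, which is option 3.

```python
from fractions import Fraction

faces = range(1, 7)  # fair six-sided die, each face equally likely
favourable = [face for face in faces if face <= 3]

# Probability = favourable outcomes / all outcomes
p_three_or_less = Fraction(len(favourable), len(faces))
```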


QUESTION:

Two dice are rolled and the sum of the face values is equal to 6. What is the probability that at least one of the dice came up with a 3?

1. 2/3

2. 1/3

3. 5/6

4. 1/5

5. 1

The outcomes with sum 6 are 1-5, 5-1, 2-4, 4-2 and 3-3. Only 3-3 contains a 3, so the probability is 1/5.
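The same conditional probability can be confirmed by enumerating all 36 ordered rolls of two fair dice and keeping only those whose sum is 6. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice
rolls = list(product(range(1, 7), repeat=2))

# Condition: the sum of the face values is 6
sum_is_6 = [roll for roll in rolls if sum(roll) == 6]

# Event: at least one die shows a 3
with_a_3 = [roll for roll in sum_is_6 if 3 in roll]

p_three_given_sum_6 = Fraction(len(with_a_3), len(sum_is_6))
```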


EXAMPLE:

Suppose we have a clock that shows the time. However, a malfunction makes the clock stop at a random moment during the day.

Let X be the time at which the clock stops. Can you define its pdf?


EXAMPLE: Suppose we have a computer that runs a computation and displays the results at the same time. However, a malfunction makes it stop at a random moment during the day.

Let X be the time at which the computer stops. With X measured in hours from midnight, the pdf of X is:

$$f(x) = \begin{cases} \dfrac{1}{24}, & 0 \le x \le 24 \\ 0, & \text{otherwise} \end{cases}$$

What is the probability that the computer stops between 9:00 am and 9:45 am?

SOLUTION:

$$P(9 \le X \le 9.75) = \int_{9}^{9.75} \frac{1}{24}\,dx = \frac{0.75}{24} = \frac{1}{32} \approx 0.031$$


DEFINITION:

A continuous random variable X is said to have a uniform distribution on the interval [A, B] if the pdf of X is:

$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B \\ 0, & \text{otherwise} \end{cases}$$
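A short sketch of the uniform distribution follows, applied to the stopping-time example. It assumes X is measured in hours on [0, 24]; the function names are illustrative.

```python
def uniform_pdf(x, a, b):
    """pdf of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_interval_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]."""
    # Clip the query interval to the support before measuring its length
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


# Stopping between 9:00 and 9:45, with time in hours from midnight (assumed)
p_stop = uniform_interval_prob(9.0, 9.75, 0.0, 24.0)
```

Because the density is constant, the probability depends only on the length of the interval (0.75 hours out of 24), not on where in the day it falls.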


▸ Random variable

▸ Discrete random variables have a countable number of outcomes

▸ Continuous random variables have an infinite continuum of possible values.


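▸ Discrete variable

▸ The probability distribution for a discrete rv. X consists of:

Possible values: $x_1, x_2, \dots, x_n$

Corresponding probabilities: $p_1, p_2, \dots, p_n$

with the interpretation that: $P(X = x_i) = p_i$

▸ Where,

▸ X is the variable

▸ $x_1, \dots, x_n$ are the values

▸ Each $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$

[Figure: bar chart; axis ticks 0, 12.5, 25, 37.5, 50.]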


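▸ Discrete variable

▸ Summary statistics for a rv.:

▸ Mean value or expected value: $\mu = E(X) = \sum_i x_i\, p_i$

▸ Variance: $\sigma^2 = \sum_i (x_i - \mu)^2\, p_i$

▸ Standard deviation: $\sigma = \sqrt{\sigma^2}$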
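The three summary statistics can be computed directly from a table of values and probabilities. The sketch below uses a fair six-sided die as an illustrative distribution; the choice of example is an assumption.

```python
from fractions import Fraction
from math import sqrt

# pmf of a fair six-sided die (illustrative choice): value -> probability
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: probability-weighted sum of the values
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted squared deviation from the mean
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance
std_dev = sqrt(variance)
```

Using `Fraction` keeps the mean and variance exact, so they can be compared against hand calculations.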


▸ Continuous variable

▸ A continuous rv. can take any value in some interval.

▸ Thus, the probability that X takes any single exact value is equal to zero.

▸ The probability for a continuous rv. X is computed over a range of values or interval:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
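The interval probability can be approximated numerically by integrating the pdf. The sketch below uses the trapezoidal rule with a standard normal pdf as an illustrative density; both the density and the function names are assumptions, not part of the lecture.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf of a normal distribution (illustrative choice of density)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def interval_probability(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, steps))
    return total * h


# Probability over an interval is positive
p_interval = interval_probability(normal_pdf, -1.0, 1.0)

# Probability of a single exact value: the interval has zero width
p_point = interval_probability(normal_pdf, 0.5, 0.5)
```

The zero-width case illustrates why a continuous rv. takes any single exact value with probability zero.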