Keywords and examples of machine learning

Machine learning: Keywords + Applications

What you will see in these slides:

1) Applications of machine learning
- wind power forecasting (important e.g. for PengHu island!)
- rainfall estimation

2) Some key words (you must know what they mean):
- black box / white box
- shrinking horizon
- objective function
- what you get is what you have
- model complexity
- cross-validation
- generative model
- quantile, value-at-risk

I want to produce electricity

I have:

- water for hydroelectricity

- a nuclear power plant

- wind farms

- gas turbines

I must ensure, for each time step:

Production of electricity = Demand of electricity

Demand(t0), Demand(t1), Demand(t2), Demand(t3) are known.

We get four equations:

Production(t0) = Demand(t0)
Production(t1) = Demand(t1)
Production(t2) = Demand(t2)
Production(t3) = Demand(t3)

Other equation:
Production = hydro production + nuclear production + wind farm production + gas production

With H = hydro, W = wind farms, N = nuclear and G = gas, the four equations become:

H(t0)+W(t0)+N(t0)+G(t0) = Demand(t0)
H(t1)+W(t1)+N(t1)+G(t1) = Demand(t1)
H(t2)+W(t2)+N(t2)+G(t2) = Demand(t2)
H(t3)+W(t3)+N(t3)+G(t3) = Demand(t3)

The stock level for hydro depends on production (H(i) is short for H(ti)):
x(1) = x(0)-H(0)
x(2) = x(1)-H(1)
x(3) = x(2)-H(2)
x(4) = x(3)-H(3)

The hydro stock also depends on inflows I(i).

Stock level for hydro: x(0) is given; constraint: x(i) >= 0
x(1) = x(0)+I(0)-H(0)
x(2) = x(1)+I(1)-H(1)
x(3) = x(2)+I(2)-H(2)
x(4) = x(3)+I(3)-H(3)

8 equations (yes, it increases...)

The four demand equations:
H(t0)+W(t0)+N(t0)+G(t0) = Demand(t0)
H(t1)+W(t1)+N(t1)+G(t1) = Demand(t1)
H(t2)+W(t2)+N(t2)+G(t2) = Demand(t2)
H(t3)+W(t3)+N(t3)+G(t3) = Demand(t3)

And the four stock constraints x(1) >= 0, ..., x(4) >= 0, written out:
x(0) + I(0) - H(0) >= 0
x(0) + I(0) - H(0) + I(1) - H(1) >= 0
x(0) + I(0) - H(0) + I(1) - H(1) + I(2) - H(2) >= 0
x(0) + I(0) - H(0) + I(1) - H(1) + I(2) - H(2) + I(3) - H(3) >= 0
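The stock recursion x(i+1) = x(i) + I(i) - H(i) and the constraint x(i) >= 0 can be checked with a few lines of Python. This is only a minimal sketch: the function names and all the numbers are made up for illustration.

```python
def stock_levels(x0, inflows, hydro):
    """Return [x(0), x(1), ..., x(T)] using x(i+1) = x(i) + I(i) - H(i)."""
    levels = [x0]
    for inflow, production in zip(inflows, hydro):
        levels.append(levels[-1] + inflow - production)
    return levels


def is_feasible(levels):
    """A hydro plan is feasible if the stock never becomes negative."""
    return all(level >= 0 for level in levels)


# Hypothetical numbers, chosen only for illustration.
x0 = 10.0
inflows = [2.0, 0.0, 1.0, 3.0]   # I(0), ..., I(3)
hydro = [5.0, 4.0, 3.0, 2.0]     # H(0), ..., H(3)

levels = stock_levels(x0, inflows, hydro)
print(levels)               # [10.0, 7.0, 3.0, 1.0, 2.0]
print(is_feasible(levels))  # True
```

Producing more hydro power early (for instance H(0) = 13) would drive x(1) below zero, and the plan would be rejected.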

Nuclear has constraints as well:

- N(1) in f(N(0))
- N(2) in f(N(1))
- N(3) in f(N(2))

(very simplified; in fact there are stocks, refills...)

Ok! Summary?

W(0), W(1), W(2), W(3), the wind farm production: cannot be chosen, and W(1), W(2), W(3) are unknown!

To be chosen:
- G(0), G(1), G(2), G(3): gas turbine production
- H(0), H(1), H(2), H(3): hydroelectric production (can in a sense be negative, e.g. with pumped storage)
- N(0), N(1), N(2), N(3): nuclear power

Constraints: production plans must satisfy constraints.

E.g.: if gas turbine production is unlimited, we might decide (with H = N = 0)
G(0) = Demand(0)-W(0), G(1) = Demand(1)-W(1), G(2) = Demand(2)-W(2), G(3) = Demand(3)-W(3)
==> it is a feasible solution
==> it is a bad feasible solution

Objective function: not all solutions are equivalent!

Production cost:

Hcost * (H0+H1+H2+H3) + Ncost * (N0+N1+N2+N3) + Gcost * (G0+G1+G2+G3) + Wcost * (W0+W1+W2+W3)

NB: cost does not only mean $. Cost means ecological and environmental costs as well.
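To see why a feasible plan can still be a bad one, here is a small sketch that evaluates the production cost formula for two plans that both meet demand. The unit costs, demands and wind values are invented for the example, and the hydro stock and nuclear constraints are ignored.

```python
def production_cost(plan, unit_cost):
    """Sum over sources of unit cost times total production of that source."""
    return sum(unit_cost[source] * sum(plan[source]) for source in plan)


def meets_demand(plan, demand):
    """Check H(t) + W(t) + N(t) + G(t) = Demand(t) at every time step."""
    return all(
        abs(sum(plan[source][t] for source in plan) - demand[t]) < 1e-9
        for t in range(len(demand))
    )


# Hypothetical data: none of these numbers come from a real system.
demand = [100.0, 120.0, 110.0, 90.0]
wind = [20.0, 30.0, 10.0, 25.0]   # assumed known here, for illustration only
unit_cost = {"H": 1.0, "N": 2.0, "G": 8.0, "W": 0.5}

# Plan A: gas covers everything that wind does not.
gas_only = {
    "H": [0.0] * 4,
    "N": [0.0] * 4,
    "G": [d - w for d, w in zip(demand, wind)],
    "W": wind,
}

# Plan B: some hydro and nuclear, gas only for the remainder.
mixed = {
    "H": [10.0] * 4,
    "N": [50.0] * 4,
    "G": [d - w - 60.0 for d, w in zip(demand, wind)],
    "W": wind,
}

for name, plan in (("gas only", gas_only), ("mixed", mixed)):
    print(name, meets_demand(plan, demand), production_cost(plan, unit_cost))
```

Both plans satisfy the demand equations, but with these made-up unit costs the gas-only plan is more than twice as expensive.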

Quiz!

So we have: x0, x1, x2, x3: states at times t0, t1, t2, t3. x0 is given; x1, x2, x3 depend on our decisions.

Some decisions are chosen at time t0. Some decisions are chosen at time t1. Some decisions are chosen at time t2. Some decisions are chosen at time t3.

The cost depends on all decisions.

- Is this a supervised learning problem?
- Is this a reinforcement learning problem?
- Is this a boring problem?

Ok! Summary?

So we have equations. If we know W(1), W(2), W(3), we can evaluate the production cost.

We want to:
- solve equations
- minimize production cost

Problem: we don't know W(1), W(2), W(3). How can we know them?

We want to know W(1), W(2), W(3).

Steps:

(1) Weather simulation: we predict the wind at time steps t1, t2, t3 (as in a classical weather forecast).

(2) From the wind forecast, predict the power (e.g. with a black box model), based on data and on an error criterion, e.g. the mean-square error.

Predicting W(1), W(2), W(3):

- Boring problem?
- Supervised learning problem?
- Reinforcement learning problem?
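Step (2) can be read as a supervised learning problem: learn a map from wind speed to power from past (wind, power) pairs, and judge it by its mean-square error on data it has not seen. The sketch below uses a k-nearest-neighbour average as the black box. The power curve `toy_power`, the noise level and the sample sizes are all assumptions made up for the example.

```python
import random


def toy_power(speed):
    """Hypothetical power curve: cubic in wind speed, capped at 1.0."""
    return min(speed ** 3 / 1000.0, 1.0)


def knn_predict(train, speed, k=3):
    """Black-box prediction: mean power of the k training points closest in wind speed."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - speed))[:k]
    return sum(power for _, power in nearest) / len(nearest)


def mean_square_error(predict, samples):
    """Average squared gap between predicted and observed power."""
    return sum((predict(s) - p) ** 2 for s, p in samples) / len(samples)


# Synthetic archive of (wind speed, observed power) pairs.
rng = random.Random(0)
data = []
for _ in range(200):
    speed = rng.uniform(0.0, 15.0)
    data.append((speed, toy_power(speed) + rng.gauss(0.0, 0.02)))
train, test = data[:150], data[150:]

# Compare the black box with a trivial predictor (the mean training power).
mean_power = sum(p for _, p in train) / len(train)
mse_knn = mean_square_error(lambda s: knn_predict(train, s), test)
mse_mean = mean_square_error(lambda s: mean_power, test)
print(mse_knn, mse_mean)
```

The model never uses the formula inside `toy_power`; it only sees data, which is what makes it a black box.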

What does black box mean?

Difficulties?

In many cases, you will see in your life as an engineer that:

- collecting data and models is a big part of the work

- solving the problem exactly is impossible

- what really matters in an application is to find where the current codes are not satisfactory, and not to spend time on other aspects

Typical questions for this application

- Mean square error in the supervised learning for W1, W2, W3? But...
- Many constraints / effects are missing! (For the real application, we must have far more constraints...)
- How many time steps in the future should we consider?
- In case of long term: should we consider climate change bias?
- We should penalize cases with W4 small!

Some of these points are important, some are negligible, depending on the system under analysis.

Another beautiful application

This is Paris. Beautiful town. With plenty of people (10 million in IDF, Île-de-France). Producing plenty of fecal matter ==> dirty water.

Our river in Paris is the Seine.

A French politician said he would soon swim across it.

In the end, he never did.

For your health, don't do it.

Nevertheless, we try to keep it as clean as possible.

Dirty water should be separated from the Seine. And usually it is. Something like this:

[Figure: dirty water kept separate from the Seine]

Problem: if big rainfalls reach dirty water, then dirty water might pollute the Seine.

No typhoon in France.

But we can have heavy rains/winds in Paris:
- 0.96 dm (96 mm) of rain in 24 hours happened in 1987
- gusts at 169 km/h in 1999 (very unusual in France)

(Yes, in Taiwan it is more impressive: sometimes it is 16.7 dm (1670 mm) in 24 hours, and gusts can reach 250 km/h...)

[Figure: dirty water reaching the Seine!]

Three water networks:

- dirty water: should go to cleaning stations

- clean water: can go to the Seine, but can't be drunk

- drinkable water (France: tap water = drinkable)

Big water network

[Figure: a large network of dirty-water and clean-water stocks]

Water vs dirty water

Challenge: summer storms. Not comparable to a Taiwanese typhoon. But a lot of water. Can make the volume of dirty water very big. Can invade clean water.

Your mission:
- Get rid of dirty water
- Protect clean water

State: level in each stock, valves' status (open or closed).

At each time step, rainfalls(i) liters of water reach stock i. You can open or close valves ==> you get a new state.

Typically, a state looks like:

(0, 1, 0, 0, 0, 1, 0, 1, 0.42, 0.2, 0.0, 0.8, 0.3)

where the first 8 numbers are the valves (0 or 1) and the last 5 numbers are the stock levels.

Plenty of rules:
- if valve 4 is open, then water from stock 1 goes to stock 2 at rate 0.02 m3/s
- if stock[2] > 0.3, then dirty water ==> Seine, at 0.1 m3/s

==> Minimize the quantity of dirty water in clean stocks at the end of the storm

Equations:

Stocks(t+1) = complicatedFunction(Stocks(t), rainfalls(t), valves(t))

- Stocks(t+1): D-dimensional vector (D = number of stocks)
- Stocks(t): D-dimensional vector
- rainfalls(t): D-dimensional vector
- valves(t): V-dimensional vector (V = number of valves)
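As an illustration of the shape of such a transition function, here is a deliberately tiny stand-in for complicatedFunction. The rule it implements (valve v, when open, moves water from stock v to stock v+1 at a fixed rate) is invented for this sketch; a real network has many more rules.

```python
def step(stocks, rainfalls, valves, rate=0.1):
    """One toy transition Stocks(t+1) = f(Stocks(t), rainfalls(t), valves(t)).

    stocks and rainfalls have D entries, valves has V = D - 1 entries.
    Hypothetical rule: if valve v is open, up to `rate` units of water
    move from stock v to stock v + 1 during the time step.
    """
    # Rain first reaches every stock.
    new_stocks = [level + rain for level, rain in zip(stocks, rainfalls)]
    # Then each open valve transfers water to the next stock.
    for v, is_open in enumerate(valves):
        if is_open:
            moved = min(rate, new_stocks[v])
            new_stocks[v] -= moved
            new_stocks[v + 1] += moved
    return new_stocks


stocks = [0.42, 0.2, 0.0]     # D = 3 stock levels
rainfalls = [0.05, 0.0, 0.1]  # water reaching each stock at this time step
valves = [1, 0]               # V = 2 valves: the first open, the second closed

print(step(stocks, rainfalls, valves))
```

In this toy version water is only moved around, so the total after the step equals the previous total plus the rainfall.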

To be decided: valves(t) for each t, a V-dimensional vector (V = number of valves).

If there are 240 time steps, we get 240 x V decision variables.

Criterion = objective function = quantity of dirty water reaching the clean network + quantity of dirty water in the river

Shrinking horizon

Too many time steps!

At each time step, make a decision using only 30 time steps.

Move this window of 30 time steps.

[Figure: moving window of 30 time steps]
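The moving window can be sketched as a loop: at each time step, plan over the next 30 steps only, apply the first decision, then move on. In this sketch `plan_window` is a hypothetical stand-in for the real optimizer (it simply opens a valve whenever rain is forecast somewhere in the window), and `forecast` is a made-up storm between steps 100 and 119.

```python
def plan_window(t, horizon, forecast):
    """Hypothetical planner: decide 1 (open) if rain is forecast within the window."""
    rain_ahead = max(forecast(s) for s in range(t, t + horizon))
    return [1 if rain_ahead > 0.5 else 0] * horizon


def run(total_steps, window, forecast):
    """Re-plan at every step on a limited window and keep only the first decision."""
    applied = []
    for t in range(total_steps):
        # Near the end of the storm the window is cut to the remaining steps.
        horizon = min(window, total_steps - t)
        plan = plan_window(t, horizon, forecast)
        applied.append(plan[0])
    return applied


def forecast(s):
    """Made-up forecast: a storm between time steps 100 and 119."""
    return 1.0 if 100 <= s < 120 else 0.0


applied = run(total_steps=240, window=30, forecast=forecast)
print(sum(applied))  # 49: the valve is open from step 71 to step 119
```

Each call to the planner handles at most 30 x V variables instead of 240 x V. In the literature a window that moves like this is often called a receding or rolling horizon.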

Summary? Is this:
- an optimization problem?
- a reinforcement learning problem?
- a supervised machine learning problem?

Problem: rainfalls are unknown.

How to predict rainfalls?

In fact, there are two distinct kinds of rainfalls:
- R1: a spatial distribution of rainfalls (one number per time step per point of the map)
- R2: the underground rainfall arrivals (inflows), one per stock (D-dimensional)

Input data:
- archives of weather forecasts (R1(t) for each t)
- archives of inflows R2(t)

If your life depended on it, what would you do?

We are at time t.

We need a forecaster:
- which takes available data as input
- and outputs R2(t') for t' >= t (why not for t' < t?)

(R2(t+1), R2(t+2), R2(t+3), ..., R2(t+30)) = ?

= f( R1(t) ) ?

= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50) )
(because there are delays)

= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t) )
(because what you get is what you have)
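Building the input of such a forecaster is mostly bookkeeping. The sketch below assembles the vector (R1(t), R1(t-1), ..., R1(t-50), R2(t)) from two archives. For simplicity it assumes one number per time step for both R1 and R2, whereas the real R1 is a map and the real R2 is D-dimensional; the function name is made up.

```python
def forecaster_input(r1, r2, t, n_lags=50):
    """Return [R1(t), R1(t-1), ..., R1(t-n_lags), R2(t)].

    r1 and r2 are lists indexed by time step (simplified to scalars here).
    """
    if t < n_lags:
        raise ValueError("not enough history before time t")
    lagged_r1 = [r1[t - lag] for lag in range(n_lags + 1)]
    return lagged_r1 + [r2[t]]


# Hypothetical archives: 100 time steps of made-up values.
r1_archive = [float(step) for step in range(100)]
r2_archive = [10.0 * step for step in range(100)]

features = forecaster_input(r1_archive, r2_archive, t=60)
print(len(features))  # 52 = 51 values of R1 plus R2(t)
```

Only values at times up to t are used, since later values are not yet available when the forecast is made.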

= f( R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t) )

and then aggregation:

= f( R1(t), R1(t-1)+R1(t-2), R1(t-3)+R1(t-4)+R1(t-5)+R1(t-6), ..., R2(t) )

Why?

Because there are fewer parameters.

Rule of thumb: the number of parameters should be less than the number of data points / 20.
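The aggregation can be sketched as summing the lagged values in blocks of growing size. The slides show blocks of size 1, 2 and 4; continuing with 8, 16, ... is an assumption made for this example, as are the function names.

```python
def aggregate_lags(lagged):
    """Sum lagged values in blocks of size 1, 2, 4, 8, ... (the last block may be shorter).

    lagged[0] is R1(t), lagged[1] is R1(t-1), and so on.
    """
    blocks = []
    start, size = 0, 1
    while start < len(lagged):
        blocks.append(sum(lagged[start:start + size]))
        start += size
        size *= 2  # assumed doubling, following the 1, 2, 4 pattern
    return blocks


def max_parameters(n_data_points):
    """Rule of thumb: number of parameters below (number of data points) / 20."""
    return n_data_points // 20


lagged = [1.0] * 51                 # R1(t), ..., R1(t-50), all set to 1 for readability
print(aggregate_lags(lagged))       # [1.0, 2.0, 4.0, 8.0, 16.0, 20.0]
print(max_parameters(1000))         # 50
```

With this scheme the 51 rainfall inputs shrink to 6, so a model with roughly one parameter per input fits the rule of thumb with far fewer data points.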