10
Introduction to Machine Learning NPFL 054 http://ufal.mff.cuni.cz/course/npfl054 Barbora Hladká Martin Holub {Hladka | Holub}@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics NPFL054, 2021 Hladká & Holub Lab 4, page 1/10

Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Introduction to Machine LearningNPFL 054

http://ufal.mff.cuni.cz/course/npfl054

Barbora Hladká Martin Holub

{Hladka | Holub}@ufal.mff.cuni.cz

Charles University,Faculty of Mathematics and Physics,

Institute of Formal and Applied Linguistics

NPFL054, 2021 Hladká & Holub Lab 4, page 1/10

Page 2: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Programming questions

• (Hierarchichal) clustering• Feature scaling• NLI data set (75 documents, 5 languages)

• Gradient descent algorithm• Find a minimum of a function using Gradient Descent Algorithm (simple

illustration)

• Auto data set• Compute Pearson’s correlation coeffcients for mpg, displacement, weight,

horsepower, acceleration in the Auto data set• Draw boxplots to visualize comparison mpg by origin, mpg by model year,

and weight by origin

• Linear regression• Auto data set, target attribute: mpg

NPFL054, 2021 Hladká & Holub Lab 4, page 2/10

Page 3: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Feature scaling

Different ranges and units of features• Is the engine displacement more significant than mpg/cylinders/acceleration?

NPFL054, 2021 Hladká & Holub Lab 4, page 3/10

Page 4: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Feature scaling

Scaling• normalization z = x−xmin

xmax−xminz ∈< 0, 1 >• standardization z = x−x

sdxz = 0, sdz = 1

Useful especially for• Gradient Descent Based Algorithms• Distance based algorithms

NPFL054, 2021 Hladká & Holub Lab 4, page 4/10

Page 5: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Native language identification task (NLI)

ARA CHI FRE GER HIN ITA

JPN KOR SPA TEL TUR

to be specialized in onespecific subject is also agood thing it will make you aproffional in a specificsubject , but when you itcomes to socity they will

Proficiency: highPrompt: P2

Changing your habbits is notan easy thing , but one isurged to do it for a numberof reasons.A successful teacher shouldrenew his lectures subjects .

Proficiency: highPrompt: P2

This indicates that he is upto date and also refreasheshis mind constantly .Also , this gives him a goodreputation among othercolleagues .

Proficiency: highPrompt: P1

university , I keep gettingnewer information fromdifferent books andresources and include it inmy lectures .

Proficiency: lowPrompt: P8

It is troublesome , as itseems , but it keeps mefreash in information andwith a good status amongmy colleagues .

Proficiency: highPrompt: P7

you are constantly trying tochange your ways ofmaking , or synthesizing ,chemical compunds , youmight come out with lessexpensive methods .

Proficiency: mediumPrompt: P2

The newer method isconsidered an invention in itsown , and it also savesmoney .

Proficiency: highPrompt: P2

to be specialized in onespecific subject is also agood thing it will make you aproffional in a specificsubject , but when you itcomes to socity they will

Proficiency: lowPrompt: P2 It is troublesome , as it

seems , but it keeps mefreash in information andwith a good status amongmy colleagues .

Proficiency: highPrompt: P2

to be specialized in onespecific subject is also agood thing it will make you aproffional in a specificsubject , but when you itcomes to socity they will

Proficiency: highPrompt: P2

Also , you might think ofchanging your view , clothesor even your hair cut .

Proficiency: mediumPrompt: P2

When I change my glassescolor , for example , thiswould be attractive to mystudents and colleagues andwill make me feel better .

Proficiency: lowPrompt: P7

Alternating work anddormancy in your life pacewith activities and exercise isof great benefit to the bodyand the mind .

Proficiency: mediumPrompt: P2

Lastly , change is a differentdecision in the human 's life ,but it is important for manygood reasons , and gets backon the human with goodbenefits .

Proficiency: highPrompt: P2

In contrast , many peoplebelieve that changing isreally important in people 'slives .

Proficiency: highPrompt: P2to be specialized in one

specific subject is also agood thing it will make you aproffional in a specificsubject , but when you itcomes to socity they will

Proficiency: mediumPrompt: P8

NPFL054, 2021 Hladká & Holub Lab 4, page 5/10

Page 6: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

NLI

Identifying the native language (L1) of a writer based on a sample of their writingin a second language (L2)

Our data

• L1s: Arabic (ARA), Chinese (ZHO), French(FRA), German (DEU) Hindi(HIN), Italian (ITA), Japanese (JPN), Korean (KOR), Spanish (SPA), Telugu(TEL), Turkish (TUR)

• L2: English• Real-world objects: For each L1, 1,000 texts in L2 from The ETS Corpus ofNon-Native Written English (former TOEFL11), i.e. Train ∪ DevTest

• Target class: L1

More detailed info is available at the course website.

NPFL054, 2021 Hladká & Holub Lab 4, page 6/10

Page 7: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

NLI

TopicMost advertisements make products seem much better than theyreally are

Sample textnow a days the publisity is the best way to promoved a produt and ifyou wanth to sale a product you should bring some information thatmakes , that the people who is seeing the advertisements make surethat the product very good and in the future this person could buyit .

L1 = Spanish

NPFL054, 2021 Hladká & Holub Lab 4, page 7/10

Page 8: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Linear regressionRandom error term

• numerical target attribute Y• y = XΘ> + ε

• random error term ε having mean zero, very often unobserved

NPFL054, 2021 Hladká & Holub Lab 4, page 8/10

Page 9: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Linear regressionRandom error term

• εi = yi −Θ>xi (true target value yi , expected value Θ>xi)• Assumption like: At each value of A1, the output value y is subject to

random error ε that is normally distributed N(0, σ2)

NPFL054, 2021 Hladká & Holub Lab 4, page 9/10

Page 10: Introduction to Machine Learning NPFL054 - ` `%%%`#`&12 ...hladka/2021/docs/ml-lab.2021...2021/03/26  · More detailed info is available at the course website. NPFL054,2021 Hladká&Holub

Linear regressionRandom error term

• εi = yi −Θ>xi (true target value yi , expected value Θ>xi)• residual ei = yi − Θ̂>xi is an estimate of εi

NPFL054, 2021 Hladká & Holub Lab 4, page 10/10