Introduction to Machine Learning
NPFL054
http://ufal.mff.cuni.cz/course/npfl054
Barbora Hladká, Martin Holub
{Hladka | Holub}@ufal.mff.cuni.cz
Charles University, Faculty of Mathematics and Physics,
Institute of Formal and Applied Linguistics
NPFL054, 2021 Hladká & Holub Lab 4, page 1/10
Programming questions
• (Hierarchical) clustering
  • Feature scaling
  • NLI data set (75 documents, 5 languages)
• Gradient descent algorithm
  • Find a minimum of a function using the gradient descent algorithm (simple illustration)
• Auto data set
  • Compute Pearson's correlation coefficients for mpg, displacement, weight, horsepower, and acceleration in the Auto data set
  • Draw boxplots to compare mpg by origin, mpg by model year, and weight by origin
• Linear regression
  • Auto data set, target attribute: mpg
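The "simple illustration" of gradient descent listed above could look like the following minimal Python sketch. The function f(x) = (x - 3)^2 + 1, the starting point, the learning rate, and all names are assumptions chosen for illustration; they are not taken from the lab materials.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=200, tol=1e-10):
    """Minimize a one-dimensional function given its derivative `grad`.

    Repeats x <- x - learning_rate * grad(x) until the update is
    smaller than `tol` or `n_steps` iterations have been done.
    """
    x = x0
    for _ in range(n_steps):
        step = learning_rate * grad(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical example function (an assumption, not from the slides).
def f(x):
    return (x - 3.0) ** 2 + 1.0


def f_prime(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(f_prime, x0=0.0)
print(x_min, f(x_min))
```

For this convex function the iterates approach x = 3 from any starting point as long as the learning rate is below 1; a larger rate makes the updates overshoot and diverge.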
Feature scaling
Different ranges and units of features
• Is the engine displacement more significant than mpg/cylinders/acceleration?
Feature scaling
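Scaling
• normalization: $z = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, so $z \in \langle 0, 1 \rangle$
• standardization: $z = \frac{x - \bar{x}}{\mathrm{sd}_x}$, so $\bar{z} = 0$ and $\mathrm{sd}_z = 1$

Useful especially for
• Gradient descent based algorithms
• Distance-based algorithms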
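The two scaling formulas above can be sketched in plain Python as follows. The feature values are made up for illustration (they are not from the Auto data set), and the function names are hypothetical. The sketch assumes that sd denotes the sample standard deviation, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev


def normalize(values):
    """Min-max normalization: maps the values onto the interval [0, 1].

    Assumes the values are not all equal (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: subtract the mean, divide by the sample sd."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up feature values, for illustration only.
feature = [100.0, 150.0, 200.0, 400.0]

z_norm = normalize(feature)
z_std = standardize(feature)

print(z_norm)                      # smallest value -> 0.0, largest -> 1.0
print(mean(z_std), stdev(z_std))   # approximately 0 and 1
```

After either transformation, features measured in different units become comparable in magnitude, which is why scaling matters for distance computations and for gradient descent step sizes.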
Native language identification task (NLI)
ARA CHI FRE GER HIN ITA JPN KOR SPA TEL TUR

Example excerpts from learner essays, each followed by its proficiency and prompt labels (learner spelling preserved):

to be specialized in one specific subject is also a good thing it will make you a proffional in a specific subject , but when you it comes to socity they will
Proficiency: high, Prompt: P2

Changing your habbits is not an easy thing , but one is urged to do it for a number of reasons. A successful teacher should renew his lectures subjects .
Proficiency: high, Prompt: P2

This indicates that he is up to date and also refreashes his mind constantly . Also , this gives him a good reputation among other colleagues .
Proficiency: high, Prompt: P1

university , I keep getting newer information from different books and resources and include it in my lectures .
Proficiency: low, Prompt: P8

It is troublesome , as it seems , but it keeps me freash in information and with a good status among my colleagues .
Proficiency: high, Prompt: P7

you are constantly trying to change your ways of making , or synthesizing , chemical compunds , you might come out with less expensive methods .
Proficiency: medium, Prompt: P2

The newer method is considered an invention in its own , and it also saves money .
Proficiency: high, Prompt: P2

to be specialized in one specific subject is also a good thing it will make you a proffional in a specific subject , but when you it comes to socity they will
Proficiency: low, Prompt: P2

It is troublesome , as it seems , but it keeps me freash in information and with a good status among my colleagues .
Proficiency: high, Prompt: P2

to be specialized in one specific subject is also a good thing it will make you a proffional in a specific subject , but when you it comes to socity they will
Proficiency: high, Prompt: P2

Also , you might think of changing your view , clothes or even your hair cut .
Proficiency: medium, Prompt: P2

When I change my glasses color , for example , this would be attractive to my students and colleagues and will make me feel better .
Proficiency: low, Prompt: P7

Alternating work and dormancy in your life pace with activities and exercise is of great benefit to the body and the mind .
Proficiency: medium, Prompt: P2

Lastly , change is a different decision in the human 's life , but it is important for many good reasons , and gets back on the human with good benefits .
Proficiency: high, Prompt: P2

In contrast , many people believe that changing is really important in people 's lives .
Proficiency: high, Prompt: P2

to be specialized in one specific subject is also a good thing it will make you a proffional in a specific subject , but when you it comes to socity they will
Proficiency: medium, Prompt: P8
NLI
Identifying the native language (L1) of a writer based on a sample of their writing in a second language (L2)

Our data
• L1s: Arabic (ARA), Chinese (ZHO), French (FRA), German (DEU), Hindi (HIN), Italian (ITA), Japanese (JPN), Korean (KOR), Spanish (SPA), Telugu (TEL), Turkish (TUR)
• L2: English
• Real-world objects: for each L1, 1,000 texts in L2 from the ETS Corpus of Non-Native Written English (formerly TOEFL11), i.e. Train ∪ DevTest
• Target class: L1
More detailed info is available at the course website.
NLI
Topic
Most advertisements make products seem much better than they really are

Sample text
now a days the publisity is the best way to promoved a produt and if you wanth to sale a product you should bring some information that makes , that the people who is seeing the advertisements make sure that the product very good and in the future this person could buy it .
L1 = Spanish
Linear regression
Random error term
• numerical target attribute $Y$
• $y = X\Theta^\top + \varepsilon$
• random error term $\varepsilon$ having mean zero, very often unobserved
Linear regression
Random error term
• $\varepsilon_i = y_i - \Theta^\top x_i$ (true target value $y_i$, expected value $\Theta^\top x_i$)
• A typical assumption: at each value of $A_1$, the output value $y$ is subject to a random error $\varepsilon$ that is normally distributed, $\varepsilon \sim N(0, \sigma^2)$
Linear regression
Random error term
• $\varepsilon_i = y_i - \Theta^\top x_i$ (true target value $y_i$, expected value $\Theta^\top x_i$)
• the residual $e_i = y_i - \hat{\Theta}^\top x_i$ is an estimate of $\varepsilon_i$
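The relation between the unobserved errors and the residuals can be illustrated with a small simulation. The following Python sketch assumes a single feature plus an intercept, hypothetical true parameters (1.0, 2.0), and normally distributed errors with sigma = 0.5; none of these values come from the lab. Because the data are simulated, the true errors are known here and can be compared with the residuals, which is not possible with real data.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical true parameters (theta_0, theta_1) and error sd.
theta_true = (1.0, 2.0)
sigma = 0.5
n = 200

# Simulate y_i = theta_0 + theta_1 * x_i + eps_i with eps_i ~ N(0, sigma^2).
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
eps = [random.gauss(0.0, sigma) for _ in range(n)]
ys = [theta_true[0] + theta_true[1] * x + e for x, e in zip(xs, eps)]

# Ordinary least squares estimates for one feature with an intercept.
x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

# Residuals e_i = y_i - (theta0_hat + theta1_hat * x_i) estimate eps_i.
residuals = [y - (theta0_hat + theta1_hat * x) for x, y in zip(xs, ys)]
max_gap = max(abs(e - r) for e, r in zip(eps, residuals))

print("estimated parameters:", theta0_hat, theta1_hat)
print("largest |eps_i - e_i|:", max_gap)
```

With an intercept in the model, the least squares residuals sum to zero up to rounding, whereas the true errors only have mean zero in expectation.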