Assignment 1 Machine Learning

Machine Learning Adnan Alam Khan [email protected] Page 1

Introduction to Handwriting Recognizer or OCR

Assignment 1, presented to Meritorious Professor Dr. Aqil Burni, Head of Actuarial Sciences, Institute of Business Management



Machine learning is about designing algorithms that allow a computer to learn. Learning does not necessarily involve consciousness; rather, it is a matter of finding statistical regularities or other patterns in the data. Thus, many machine learning algorithms barely resemble how a human might approach a learning task. However, learning algorithms can give insight into the relative difficulty of learning in different environments. The performance and computational analysis of machine learning algorithms is a branch of theoretical computer science known as computational learning theory. A computer system learns from data, which represent some “past experiences” of an application domain.

Types of machine learning are as follows:
• Supervised learning: the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at several input-output examples of the function.
• Unsupervised learning: the algorithm models a set of inputs; labeled examples are not available.
• Semi-supervised learning: the algorithm combines both labeled and unlabeled examples to generate an appropriate function or classifier.
• Reinforcement learning: the algorithm learns a policy of how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm.
• Transduction: similar to supervised learning, but it does not explicitly construct a function; instead, it tries to predict new outputs based on training inputs, training outputs, and new inputs.
• Learning to learn: the algorithm learns its own inductive bias based on previous experience.



Supervised learning deals largely with classification. The main algorithm types are:
• Linear Classifiers
  - Logistic Regression
  - Naïve Bayes Classifier
  - Perceptron
  - Support Vector Machine
• Quadratic Classifiers
• K-Means Clustering (strictly an unsupervised clustering method)
• Boosting
• Neural networks
• Bayesian Networks

$$y = f(\vec{w} \cdot \vec{x}) = f\Big(\sum_j w_j x_j\Big)$$
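The linear classifier formula above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_classify`, the step function used for f, the threshold of 0 and the example weights are all assumptions, not values taken from this assignment.

```python
def linear_classify(weights, features, threshold=0.0):
    """Return 1 if the weighted sum of the features exceeds the threshold, else 0."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    # Weighted sum: sum over j of w_j * x_j
    score = sum(w * x for w, x in zip(weights, features))
    # f is assumed here to be a simple step function
    return 1 if score > threshold else 0


# Hypothetical weights and feature vectors, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
print(linear_classify(weights, [1.0, 1.0, 1.0]))  # score is 0.5 - 1.0 + 2.0 = 1.5
print(linear_classify(weights, [1.0, 2.0, 0.0]))  # score is 0.5 - 2.0 + 0.0 = -1.5
```

A perceptron or logistic regression differs mainly in how the weights are learned and in the choice of f; the weighted sum itself is the same.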



Our focus: learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, high-risk or low-risk. A credit card company receives thousands of applications for new cards. Each application contains information about an applicant:

age, marital status, annual salary, outstanding debts, credit rating, etc.

Problem: to decide whether an application should be approved, or to classify applications into two categories, approved and not approved.

Learn a classification model from the data. Use the model to classify future loan applications into Yes (approved) and No (not approved).

What is the class for the following case/instance?

$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$$
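Accuracy, as defined above (correct classifications divided by total test cases), can be computed as in the following sketch. The function name `accuracy` and the five example labels are made up for illustration and are not the loan data of this assignment.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical test cases: 3 of the 5 predictions match the actual class.
actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
print(accuracy(predicted, actual))
```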



Given a data set D, a task T, and a performance measure M, a computer system is said to learn from D to perform the task T if, after learning, the system’s performance on T improves as measured by M. In other words, the learned model helps the system to perform T better as compared to no learning.

Data: Loan application data
Task: Predict whether a loan should be approved or not.
Performance measure: Accuracy.

No learning: classify all future applications (test data) to the majority class (i.e., Yes): Accuracy = 9/15 = 60%.

We can do better than 60% with learning.
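The "no learning" baseline can be reproduced with a short sketch. The class counts (9 Yes, 6 No) follow the 9/15 figure quoted above; the ordering of the labels and the variable names are assumptions made for illustration.

```python
from collections import Counter

# 15 loan applications: 9 approved (Yes) and 6 not approved (No).
labels = ["Yes"] * 9 + ["No"] * 6

# The baseline predicts the most frequent class for every application.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class)      # the class predicted for every case
print(baseline_accuracy)   # 9 / 15
```

Any learned model is only useful if its accuracy on test data exceeds this baseline.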

Decision Tree:





Handwriting Recognizer or OCR: In general, machine learning develops algorithms for making predictions (in the statistical sense) from data. It is often confused with (a) statistics and (b) data mining: (a) sets out to explain the data, while (b) starts from the task you have to solve. In other words, machine learning lies between (a) and (b).

Explanation of data in ML: Data consists of data instances, each represented as a feature vector. Technically, the features are chosen for a specific task, and ML is about generalization.

Machine learning consists of (a) classification, (b) clustering and (c) regression.
Classification: deciding which known group a data point belongs to; the training phase results in a classification model.
Clustering: finding which groups exist in the data, by grouping data points that are close to each other.
Regression: predicting a continuous value for each data point, for example a score by which data points can be ranked.

No ML method works with 100% precision (that is, with a guaranteed chance of success).





After binary (two-class) data comes ternary (three-class) data.



SVM: It does not require much training data, but its training is expensive, and about a million instances is a practical upper bound. Note that it requires parameter tuning.

Decision Tree:

Understanding Handwriting Recognition in 6 easy steps:

Step 1: Develop the features: center, right, left and up, shown in red.

Step 2: Overlap the features on the numbers, then remove the numbers.

After removing the numbers, the remaining features are as shown above.

Step 3: Arrange the features as follows.

Step 4: Now add fillers or blanks.

Step 5: Confine the features.



Step 6: Decision Tree.

One of the most important trends in databases is the increased use of parallel evaluation techniques.

Supervised learning is one of the main branches of machine learning. It takes a known set of input data and known responses to the data, and seeks to build a predictor model that generates reasonable predictions for the response to new data. For example, suppose you want to predict if someone will have a heart attack within a year. You have a set of data on previous people, including their ages, weight, height, blood pressure, etc. You know if the previous people had heart attacks within a year of their data measurements. So the problem is combining all the existing data into a model that can predict whether a new person will have a heart attack within a year. Supervised learning splits into two broad categories:

Classification for responses that can have just a few known values, such as 'true' or 'false'. Classification algorithms apply to nominal, not ordinal response values.

Regression for responses that are a real number, such as miles per gallon for a particular car. You can have trouble deciding whether you have a classification problem or a regression problem. In that case, create a regression model first—regression models are often more computationally efficient.

[Figure: Known Data and Known Responses are used to build a Model; the Model is then applied to new data to produce a Predicted Response.]



While there are many algorithms for supervised learning, most use the same basic workflow for obtaining a predictor model:

1. Prepare Data
2. Choose an Algorithm
3. Fit a Model
4. Choose a Validation Method
5. Examine Fit; Update Until Satisfied
6. Use Fitted Model for Predictions
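The six workflow steps above can be walked through on a toy problem. The sketch below is an illustration under stated assumptions: the data (salary in thousands, approved or not) is invented, the "algorithm" is a one-feature threshold rule, and validation is a simple hold-out of the last two observations. None of these choices come from the assignment itself.

```python
def fit_threshold(xs, ys):
    """Fit the rule 'predict 1 if x >= t' by choosing the t with the fewest training errors."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum(1 for x, y in zip(xs, ys) if (1 if x >= t else 0) != y)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def predict(t, xs):
    """Apply the fitted threshold rule to new inputs."""
    return [1 if x >= t else 0 for x in xs]


# Step 1: prepare data (hypothetical salary in thousands, 1 = approved).
data = [(20, 0), (25, 0), (30, 0), (35, 0), (45, 1), (50, 1), (60, 1), (70, 1),
        (28, 0), (55, 1)]

# Step 4: choose a validation method (hold out the last two observations,
# done before fitting so the model never sees them).
train, holdout = data[:8], data[8:]
train_x, train_y = zip(*train)
hold_x, hold_y = zip(*holdout)

# Steps 2 and 3: choose an algorithm (threshold rule) and fit it.
threshold = fit_threshold(train_x, train_y)

# Step 5: examine the fit on the held-out observations.
hold_pred = predict(threshold, hold_x)
holdout_accuracy = sum(p == y for p, y in zip(hold_pred, hold_y)) / len(hold_y)

# Step 6: use the fitted model for predictions on new inputs.
new_predictions = predict(threshold, [40, 48])
print(threshold, holdout_accuracy, new_predictions)
```

In practice step 5 loops back to steps 2 and 3 when the hold-out accuracy is unsatisfactory; here a single pass is shown.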

Prepare Data

All supervised learning methods start with an input data matrix, usually called X in this documentation. Each row of X represents one observation. Each column of X represents one variable, or predictor. Represent missing entries with NaN (not a number) values in X. Supervised learning algorithms can handle NaN values, either by ignoring them or by ignoring any row with a NaN value. You can use various data types for response data Y. Each element in Y represents the response to the corresponding row of X. Observations with missing Y data are ignored. For regression, Y must be a numeric vector with the same number of elements as the number of rows of X.
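One of the two NaN strategies described above, ignoring any row that contains a NaN, can be sketched as follows. The function name `drop_incomplete_rows`, the use of `None` to mark a missing response, and the example values are assumptions for illustration.

```python
import math


def drop_incomplete_rows(X, Y):
    """Keep only observations with no NaN predictor and a non-missing response."""
    kept_X, kept_Y = [], []
    for row, response in zip(X, Y):
        has_nan = any(isinstance(v, float) and math.isnan(v) for v in row)
        if has_nan or response is None:
            continue  # skip this observation entirely
        kept_X.append(row)
        kept_Y.append(response)
    return kept_X, kept_Y


# Hypothetical data: each row of X is one observation (age, annual salary).
X = [[25.0, 50000.0], [40.0, float("nan")], [31.0, 42000.0], [58.0, 61000.0]]
Y = ["Yes", "No", None, "Yes"]  # None marks a missing response (an assumption)

clean_X, clean_Y = drop_incomplete_rows(X, Y)
print(clean_X)
print(clean_Y)
```

Dropping rows is the simplest option but discards data; it is reasonable only when few observations are incomplete.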

For classification, Y can be any of these data types. The table also contains the method of including missing entries.

Choose an Algorithm

There are tradeoffs between several characteristics of algorithms, such as:
• Speed of training
• Memory utilization
• Predictive accuracy on new data
• Transparency or interpretability, meaning how easily you can understand the reasons an algorithm makes its predictions

Characteristics of Algorithms

* SVM prediction speed and memory usage are good if there are few support vectors, but can be poor if there are many support vectors. When you use a kernel function, it can be difficult to interpret how SVM classifies data, though the default linear scheme is easy to interpret.
** Naive Bayes speed and memory usage are good for simple distributions, but can be poor for kernel distributions and large data sets.
*** Nearest Neighbor usually has good predictions in low dimensions, but can have poor predictions in high dimensions. For linear search, Nearest Neighbor does not perform any fitting. For kd-trees, Nearest Neighbor does perform fitting. Nearest Neighbor can have either continuous or categorical predictors, but not both.



Pairwise Distance

Categorizing query points based on their distance to points in a training dataset can be a simple yet effective way of classifying new points. You can use various metrics to determine the distance, described next. Use pdist2 to find the distance between a set of data points and a set of query points.
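The idea of classifying query points by their distance to training points can be illustrated with a small Python sketch. It is not MATLAB's pdist2; it only mimics the same layout (entry [i][j] is the distance from data point i to query point j) using the Euclidean metric. The function names, points and labels are assumptions for illustration.

```python
import math


def pairwise_distances(data, queries):
    """Return a matrix whose entry [i][j] is the Euclidean distance from data[i] to queries[j]."""
    return [[math.dist(d, q) for q in queries] for d in data]


def nearest_neighbor_labels(data, labels, queries):
    """Give each query point the label of its closest training point."""
    distances = pairwise_distances(data, queries)
    predicted = []
    for j in range(len(queries)):
        nearest = min(range(len(data)), key=lambda i: distances[i][j])
        predicted.append(labels[nearest])
    return predicted


# Hypothetical training points with two classes, and two query points.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["A", "A", "B", "B"]
queries = [(1.0, 0.0), (5.0, 6.0)]

print(pairwise_distances(data, queries)[0])  # distances from the first data point
print(nearest_neighbor_labels(data, labels, queries))
```

This is the linear-search form of Nearest Neighbor mentioned earlier: there is no fitting step, and every query is compared against every training point.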





Handwriting Recognizer or OCR Matlab Code

% This code is developed by Adnan Alam Khan for the Machine Learning
% course, Ph.D Computer Science.
% It relies on two user-defined helper functions that are not listed here:
%   lines.m       - separates the first text line from the rest of the image
%                   (it shadows MATLAB's built-in 'lines' colormap function)
%   read_letter.m - matches one resized letter image against the templates
% and on a templates.mat file that holds the letter templates.

clc;            % Clear the command window.
close all;      % Close all open figures.
clear all;      % Erase all existing variables.
workspace;      % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;

cho = 0;
possibilityy = 3;   % Menu entry that ends the program (E X I T).

while cho ~= possibilityy
    cho = menu('HAND WRITING RECOGNIZOR', 'UPLOAD HAND WRITTEN IMAGE ', ...
               'CONVERSION', 'E X I T');

    % ---------------- 1: Selection of image ----------------
    if cho == 1
        clc;
        h = waitbar(0, 'P l e a s e w a i t . . . ');
        for i = 1:1000
            waitbar(i/1000)     % Progress display only; no computation here.
        end
        close(h);
        [namefileA, pathname] = uigetfile('*.*', 'Select Image ');
        if ~isequal(namefileA, 0)   % uigetfile returns 0 when cancelled.
            [imagen, mapL] = imread(fullfile(pathname, namefileA));
            imshow(imagen);
        else
            warndlg('Image must be selected .', ' Warning ')
        end
    end

    % ---------------- 2: Conversion of image to text ----------------
    if cho == 2
        if ~exist('imagen', 'var')  % Guard: no image has been uploaded yet.
            warndlg('Image must be selected .', ' Warning ')
            continue
        end
        imshow(imagen);
        title('INPUT IMAGE')
        % Convert to gray scale
        if size(imagen, 3) == 3     % RGB image
            imagen = rgb2gray(imagen);
        end
        % Convert to BW (letters become white on a black background)
        threshold = graythresh(imagen);
        imagen = ~im2bw(imagen, threshold);
        % Remove all objects containing fewer than 30 pixels
        imagen = bwareaopen(imagen, 30);
        word = [ ];     % Storage for the recognized letters of one line
        re = imagen;    % Part of the image that is still to be read
        % Open text.txt for writing
        fid = fopen('text.txt', 'wt');
        % Load templates (declared global first so read_letter can see them)
        global templates;
        load templates;
        % Compute the number of letters in the template file
        num_letras = size(templates, 2);
        while 1
            % Fcn 'lines' separates the first text line (fl) from the rest (re)
            [fl, re] = lines(re);
            imgn = fl;
            % Label and count connected components
            [L, Ne] = bwlabel(imgn);
            for n = 1:Ne
                [r, c] = find(L == n);
                % Extract letter
                n1 = imgn(min(r):max(r), min(c):max(c));
                % Resize letter (same size as the templates)
                img_r = imresize(n1, [42 24]);
                % Uncomment the line below to see letters one by one
                % imshow(img_r); pause(0.5)
                % Call fcn to convert image to text
                letter = read_letter(img_r, num_letras);
                % Letter concatenation
                word = [word letter];
            end
            fprintf(fid, '%s\n', word);   % Write 'word' to the text file (upper case)
            word = [ ];                   % Clear 'word' variable
            % When the sentences finish, break the loop
            if isempty(re)                % See variable 're' in Fcn 'lines'
                break
            end
        end
        fclose(fid);
        winopen('text.txt');              % Open 'text.txt' (Windows only)
        fprintf(['Computational Intelligence Project\nMade by:\n Adnan Alam ' ...
                 'Khan [email protected]\n Institute of Business Management ' ...
                 '2015\n']);
    end

    % ---------------- 3: Exit ----------------
    if cho == 3
        button = questdlg('Ready to quit?', 'Exit Dialog', 'Yes', 'No', 'No');
        switch button
            case 'Yes'
                disp(' Characters are: ');
                disp('Exiting MENU.................');
                disp('......................................');
                close all;
            case 'No'
                cho = 0;    % Stay in the menu loop instead of quitting.
        end
    end
end
clear all;










