Develop Your Machine Learning Model: Future Predictions

Samson Lee | Delon Yau

Microsoft Hong Kong Limited

TECH_Forum

1. Choose the Right Model for Machine Learning

2. The Data and Infrastructure Preparation

3. Developing Your Machine Learning Solutions

1. Choose the Right Machine Learning Model

SAMSON LEE

Technology Solutions Professional

Developer Experience Group

The Intelligent Solution Pipeline

How Does the Machine Learn?

Machine Learning

"Field of study that gives computers the ability to learn without being explicitly programmed" - Arthur Samuel, 1959

A set of algorithms that learn from data for the discovery of patterns that can be used to understand or solve the relevant problems

Requires huge computing power: cloud computing

Machine Learning Categories

Supervised

• Algorithms are trained with data comprising examples of the answers wanted (the expected results are known)

• Example: a model that identifies fraudulent credit card use would be trained on a data set of labeled data points of known fraudulent and valid charges

Unsupervised

• Algorithms try to autonomously identify patterns and rules in a given dataset

• Example: find groups of customer demographics with similar buying habits
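The supervised fraud example can be sketched in a few lines of Python. This is only an illustration: the two features (amount and hour of day), the six labelled charges and the `predict` helper are all hypothetical, and a k-nearest-neighbours vote stands in for whichever classifier a real system would use.

```python
from collections import Counter
from math import dist

# Hypothetical labelled charges: (amount, hour of day) -> known label.
# Real fraud features would be scaled and far richer than this.
LABELLED = [
    ((12.0, 13), "valid"),
    ((30.0, 18), "valid"),
    ((25.0, 12), "valid"),
    ((950.0, 3), "fraud"),
    ((1200.0, 2), "fraud"),
    ((870.0, 4), "fraud"),
]


def predict(point, examples=LABELLED, k=3):
    """Label a new charge by majority vote among its k nearest labelled examples."""
    nearest = sorted(examples, key=lambda ex: dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The labels are what make this supervised: the model never has to discover what "fraud" means, it only generalises from answers it was given.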

Machine Learning Algorithms

The 5 questions data science answers

• Is this A or B? Classification algorithms

• Is this weird? Anomaly detection algorithms

• How much? How many? Regression algorithms

• How is this organized? Clustering algorithms

• What should I do now? Reinforcement learning algorithms

Microsoft Azure Machine Learning Studio

The Microsoft Cognitive Toolkit

• Open source

• Customize models through Python, C++ or BrainScript

• Runs on both Windows and Linux (Docker enabled)

Team Data Science Process

• Business Understanding

• Data Acquisition & Understanding: Data Source, Pipeline, Analytics Environment, Data Wrangling

• Modeling: Feature Engineering, Model Fitting, Model Evaluation

• Deployment

2. The Data and Infrastructure Preparation

SAMSON LEE

Technology Solutions Professional

Developer Experience Group

AZURE at a Glance

[Diagram: Azure services arranged in layers: Platform Services, Security & Management, and Infrastructure Services, on top of the Datacenter Infrastructure (34 regions including Hong Kong).]

Services shown: Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs, Content Delivery Network (CDN), Media Services, HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, BizTalk Services, Hybrid Connections, Service Bus, Storage Queues, Store / Marketplace, Hybrid Operations, Backup, StorSimple, Site Recovery, Import/Export, SQL Database, DocumentDB, Redis Cache, Search, Tables, SQL Data Warehouse, Azure AD Connect Health, AD Privileged Identity Management, Operational Insights, Cloud Services, Batch, Remote App, Service Fabric, Visual Studio, Application Insights, Azure SDK, Team Project, VM Image Gallery & VM Depot

Modern Data Lifecycle

Ingestion: Event Hubs, IoT Hubs, Service Bus, Kafka

Processing: HDInsight, Azure Data Lake Analytics, Storm, Spark, Stream Analytics

Staging: Azure Data Lake Storage, Azure Storage, Azure SQL DB

Serving: Azure Data Lake Storage, Azure Data Warehouse, Azure SQL DB, HBase, Cassandra, Azure Storage, Power BI

Enrichment and Curation: Azure Data Factory, Azure Machine Learning

A Typical Setup for Business Scenario

Get Yourself Ready on the Journey to the Intelligent Cloud

azure.microsoft.com/

3. Developing Your Machine Learning Solutions

DELON YAU

Technical Evangelist

Developer Experience Group

Recommendation Engine?

Introduction

Microsoft’s Spot Market is specifically designed to help small businesses gain exposure in the online market, and for consumers to find personalised recommendations. Machine learning plays a vital role in the solution.

Product recommendation systems today often use techniques such as collaborative filtering or content-based filtering.

Why do we need a Recommendation Engine?

E-commerce has reshaped consumer-business interactions. Consumers are exposed to a wide range of choices, and a number of businesses have developed customised recommendation systems.

Spot Markets: High Street (SMHS) is a cloud-based platform designed to connect consumers with local retailers.

What is novel about it? How can we stand out?

Most recommendation systems today are built to find the most closely personalised recommendations for their consumers based on consumer feedback such as ratings and comments. Examples include the systems used by Netflix, Amazon, Google News, and YouTube.

However, the recommendation engine in Spot Market relies not only on various direct and indirect consumer feedback but also on building a consumer profile through retrieving and analysing third party application data. This consumer profile serves as one of the inputs for the recommendation engine which then generates a list of recommended items.

Goals

Interest decay: understand how customers' interests change and decay over time. In other words, products and services that a customer interacted with recently are weighted higher than those the customer interacted with a long time ago.

Social activity: notify customers about the most relevant products based on their activity on social networks such as Facebook, Twitter and Pinterest.

In-app activity: learn from what the user does within the app. For example, if they bookmark a certain product or retailer, they might want to see more related products in the future.

Location tracking: recommend relevant shops or restaurants based on the user's location.
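The interest-decay goal can be illustrated with a simple exponential weighting. The slides do not say which decay function Spot Market uses, so the half-life form below, the 30-day default and both function names are assumptions made for the sake of the sketch.

```python
def decayed_weight(age_days, half_life_days=30.0):
    """Weight of an interaction that happened age_days ago (1.0 for today).

    Assumption: the weight halves every half_life_days.
    """
    return 0.5 ** (age_days / half_life_days)


def interest_scores(interactions, half_life_days=30.0):
    """Sum decayed weights per product from (product, age_days) pairs."""
    scores = {}
    for product, age_days in interactions:
        weight = decayed_weight(age_days, half_life_days)
        scores[product] = scores.get(product, 0.0) + weight
    return scores
```

With these defaults, one interaction today outweighs two interactions from 90 days ago, which is the behaviour the goal describes. The half-life is the single knob that controls how quickly old interests fade.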

The three families

Three families of recommender systems are considered here:

Content-based.

Collaborative filtering.

Hybrid.

Our Machine Learning Architecture

Collaborative Filtering

Memory-Based Collaborative Filtering

Item-Item Collaborative Filtering: “Users who liked this item also liked ...”

User-Item Collaborative Filtering: “Users who are similar to you also liked ...”
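A minimal sketch of the item-item variant follows. It assumes cosine similarity over per-item rating vectors, which is a common choice but not one the slides confirm; the ratings table and all function names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}; missing items are unrated.
RATINGS = {
    "ann": {"shoes": 5, "socks": 4, "novel": 1},
    "bob": {"shoes": 4, "socks": 5},
    "cat": {"novel": 5, "lamp": 4},
    "dan": {"shoes": 5, "socks": 4, "lamp": 1},
}


def item_vector(item, ratings=RATINGS):
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_items(item, ratings=RATINGS):
    """Other items ranked by similarity to `item`, most similar first."""
    items = {i for r in ratings.values() for i in r}
    target = item_vector(item, ratings)
    scored = [(cosine(target, item_vector(i, ratings)), i) for i in items if i != item]
    return [i for _, i in sorted(scored, reverse=True)]
```

Calling `similar_items("shoes")` ranks "socks" first, because the same users rated both items highly: "users who liked this item also liked ...". The user-based variant swaps the roles, comparing per-user vectors instead of per-item vectors.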

Model-based Collaborative Filtering

Matrix Factorisation (MF), an unsupervised learning approach for latent variable decomposition and dimensionality reduction.

K-Means: K-means clustering is a popular unsupervised learning algorithm in data mining, used when the given dataset is not labelled, i.e. not categorised.
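To make the matrix factorisation idea concrete, here is a small pure-Python sketch that learns latent user and item factors by stochastic gradient descent on the known ratings. It illustrates the general technique rather than the project's implementation; the function names, hyperparameters and the (user, item, rating) triple format are all assumptions.

```python
import random


def factorise(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) approximates
    each known rating. `ratings` is a list of (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict_rating(P, Q, u, i):
    """Predicted rating for user u and item i, including unseen pairs."""
    return sum(p * q for p, q in zip(P[u], Q[i]))
```

Each user and each item is reduced to k latent numbers, which is the dimensionality reduction mentioned above. Once the factors are learned, `predict_rating` can score user-item pairs that were never rated.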

Benefits of K-means

K-means clustering has several advantages compared to its competitors:

Suitable for product recommendation systems: particularly useful if there is only a limited understanding of how the data is structured, for example if the given consumer-product matrix contains many different trends.

Easy operation: K-means clustering takes only the data and the value of K, i.e. the number of clusters to be created.

Fast operation
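The "easy operation" point is visible in a minimal implementation: the only inputs are the data and K. The sketch below is plain Lloyd's algorithm in pure Python, written for illustration; the function name, the fixed iteration count and the seeded random initialisation are assumptions, not details from the project.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm.

    Returns the final centroids and one cluster label per input point.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return centroids, labels
```

In a recommendation setting each point could be one consumer's row of the consumer-product matrix, so that consumers in the same cluster receive similar suggestions.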

What else?

Rating Deriver

Other Cool Features

Location Prediction: A typical use of the Location Analyser is a jogger who runs the same path every Sunday and has liked a specific brand of athletic shoes on their Facebook account. If a shop along the jogger's path puts out an offer on trainers, the Recommendation Engine will suggest that offer to the jogger before they go out for their weekly exercise.

Who the user is with: Another interesting suggestion from the client is that the future system should be able to recognise not only where the user is and which retailers are nearby, but also who the user is with. For example, if the user is with his wife, a nearby restaurant serving a cuisine that they both like would be recommended.

Calendar Events
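The jogger scenario boils down to finding shops that lie close to a known route. A hedged sketch, assuming the route is stored as a list of (latitude, longitude) points and using the haversine great-circle distance; the 200 m radius, the data layout and both function names are hypothetical and not taken from the Location Analyser itself.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def shops_near_route(route, shops, radius_km=0.2):
    """Names of shops within radius_km of any recorded point on the route.

    `shops` maps a shop name to its (lat, lon) position.
    """
    return [
        name
        for name, position in shops.items()
        if any(haversine_km(position, point) <= radius_km for point in route)
    ]
```

Offers from the returned shops could then be filtered against the user's liked brands before being surfaced as recommendations.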

Core Software and Tools

Azure Machine Learning Studio

Python

NumPy

The SciPy Library

Matplotlib

Pandas

SymPy

IPython - A kernel for Jupyter

Scikit-learn Library

One of the most important libraries in this project. It consists of open-source machine learning and data mining algorithms, including regression, classification, k-means and spectral clustering, and support vector machines. They are designed to be fully compatible with the aforementioned scientific libraries such as SciPy and NumPy.

A module used frequently throughout this project, sklearn.decomposition, includes popular matrix decomposition algorithms such as Sparse Principal Component Analysis (SPCA), Non-negative Matrix Factorisation (NMF) and Independent Component Analysis (ICA).

Show me your data!