How to
Standardize Your Data:
An ML Recipe
DAMIAN MINGLE
CHIEF DATA SCIENTIST, WPC Healthcare
@DamianMingle
What’s Standardization Anyway?
• A preprocessing step, part of the “functions and transformers that change raw feature vectors into a representation that is more suitable for the downstream estimator”
• Shifting and rescaling each attribute so that it has a mean of 0 and a standard deviation of 1.
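The definition above can be sketched in plain Python, assuming a single attribute held in a list. The helper name `standardize` and the sample values are hypothetical; the population standard deviation is used because that is what scikit-learn's scaler divides by, and mapping a constant attribute to zeros is a choice made here for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std (divides by n)
    if sigma == 0:
        # A constant attribute has no spread; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical sample: four measurements of one attribute.
sepal_lengths = [5.1, 4.9, 6.3, 5.8]
z_scores = standardize(sepal_lengths)

print(z_scores)
print(mean(z_scores), pstdev(z_scores))  # approximately 0 and 1
```

Each value is expressed as the number of standard deviations it sits above or below the attribute's mean.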
Why Standardization Matters
• It’s a common requirement of many machine learning estimators
• Models may behave badly if individual features do not look roughly like standard normally distributed data
• It’s useful for models that rely on the distribution of attributes, such as Gaussian processes.
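One way to see how a model can "behave badly" is to look at a distance calculation, which many estimators rely on. The sketch below uses made-up records (age, annual income) and a hypothetical helper `standardize_columns`; it is an illustration of the scale problem, not a claim about any particular model.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical records: (age in years, annual income in dollars).
people = [(25, 50_000), (60, 51_000), (27, 53_000), (40, 90_000)]

def standardize_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        tuple((v - mu) / sigma for v, (mu, sigma) in zip(row, stats))
        for row in rows
    ]

a, b, c, _ = people
# Raw distances: the income column dominates because its numbers are larger.
raw_ab = dist(a, b)  # 35-year age gap, $1,000 income gap
raw_ac = dist(a, c)  # 2-year age gap, $3,000 income gap

sa, sb, sc, _ = standardize_columns(people)
std_ab = dist(sa, sb)
std_ac = dist(sa, sc)

print(raw_ab < raw_ac)  # True: on raw data, a looks closer to b
print(std_ab > std_ac)  # True: after standardizing, a is closer to c
```

On the raw numbers the 35-year age gap barely registers; once both attributes share a scale, it does.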
Power in SciKit Learn
• Preprocessing
• Clustering
• Regression
• Classification
• Dimensionality Reduction
• Model Selection
Let’s Look at an ML Recipe:
Standardization
The Imports
from sklearn.datasets import load_iris
from sklearn import preprocessing
Separate Features from Target
iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target
Standardize the Features
standardized_X = preprocessing.scale(X)
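A quick sanity check on the scaled features is to confirm that every column ends up with a mean near 0 and a standard deviation near 1. This is a minimal sketch; the variable names `col_means` and `col_stds` are chosen here for illustration.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris

X = load_iris().data
standardized_X = preprocessing.scale(X)

# Per-column statistics after scaling.
col_means = standardized_X.mean(axis=0)
col_stds = standardized_X.std(axis=0)

print(np.round(col_means, 6))  # each entry should be close to 0
print(np.round(col_stds, 6))   # each entry should be close to 1
```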
Standardization Recipe
# Standardize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing
# load the iris dataset
iris = load_iris()
print(iris.data.shape)
# separate the data from the target attributes
X = iris.data
y = iris.target
# standardize the data attributes (zero mean, unit variance)
standardized_X = preprocessing.scale(X)
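The recipe scales the whole dataset in one call. When some rows are held out for testing (an assumption, since the slides do not split the data), one common option is `StandardScaler`, which learns the mean and standard deviation from the training rows and reuses them on the test rows. The split size and `random_state` below are arbitrary choices for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_std = scaler.transform(X_test)        # reuse the training statistics

print(scaler.mean_)   # per-feature means learned from the training rows
print(scaler.scale_)  # per-feature standard deviations learned from the training rows
```

Because the test rows are scaled with training statistics, their column means will generally be close to 0 but not exactly 0.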
Resources
• Society of Data Scientists
• SciKit Learn
• Also:
  • Scaling features to a range (MinMaxScaler or MaxAbsScaler)
  • Scaling sparse data (StandardScaler with with_mean=False, or MaxAbsScaler)
  • Scaling data with outliers (RobustScaler)
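The alternative scalers listed above follow the same fit/transform pattern. A minimal sketch on the Iris features, using default settings throughout (the variable names are illustrative only):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, RobustScaler

X = load_iris().data

X_minmax = MinMaxScaler().fit_transform(X)  # each column mapped to the range [0, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)  # each column divided by its max absolute value
X_robust = RobustScaler().fit_transform(X)  # centered on the median, scaled by the IQR

print(X_minmax.min(axis=0), X_minmax.max(axis=0))
print(np.abs(X_maxabs).max(axis=0))
print(np.median(X_robust, axis=0))
```

RobustScaler uses the median and interquartile range, so a few extreme values affect it less than they affect the mean and standard deviation.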