Automatic image moderation in classifieds
By Jaroslaw Szymczak
PyData Paris @ PyParis 2017, June 12, 2017
Agenda
● Image moderation problem
● Brief sketch of the approach
● Machine learning foundations of the solution:
○ image features
○ listing features (and the combination of both)
● Class imbalance problem:
○ proper training
○ proper testing
○ proper evaluation
● Going live with the product:
○ consistent development and production environments
○ batch model creation
○ live application
○ performance monitoring
Scale of business at OLX
4.4 APP RATING
#1 app in +22 countries (1)
1) Google Play Store; shopping/lifestyle categories. Note: excludes Letgo. Associates at proportionate share.
→ People spend more than twice as long in OLX apps versus competitors
became one of the top 3 classifieds app in US less than a year after its launch
130 countries
+60 million monthly listings
+18 million monthly sellers
+52 million cars are listed every year on our platforms; 77% of the total number of cars manufactured!
+160,000 properties are listed daily
Every second, the following are listed at OLX:
• 2 houses
• 2 cars
• 3 fashion items
• 2.5 mobile phones
✔ real photo of the phones
✔ selfie with a dress
✔ real shoes photo
✘ human on the picture (OLX India)
✘ stock photo (OLX Poland)
CALL 555-555-555
✘ contact details (all sites)
✘ NSFW (all sites)
Binary image classification
Image features:
● CNN fine-tuning
● transfer learning
● image represented as a 1D vector
Classic features:
● category of the listing
● is the listing from a business or a private person
● what is the price?
All features are fed to a gradient boosting classifier
Why not more, e.g. title, description, user history?
Pragmatism: we don't want to overcomplicate the model:
● CNNs are the state of the art for image recognition
● classic features help improve accuracy, but having too many of them would decrease the significance of the image features
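The combination step described above can be sketched as a plain concatenation of the CNN's 1D image vector with the few classic listing features; the function name and the 2048-dimensional vector size here are illustrative assumptions, not details from the talk:

```python
def combine_features(image_vector, category_id, is_business, price):
    """Concatenate the CNN's 1D image representation with the
    classic listing features into a single row for the classifier."""
    listing_features = [float(category_id), 1.0 if is_business else 0.0, float(price)]
    return list(image_vector) + listing_features

# A hypothetical 2048-dimensional bottleneck vector plus 3 listing features.
image_vector = [0.1] * 2048
row = combine_features(image_vector, category_id=7, is_business=False, price=99.0)
```

Keeping the listing features few, as the slide argues, prevents them from dominating the 2048 image dimensions in the boosted trees.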
Convolutional Neural Networks
Source: lecture notes to Stanford Course CS231n: http://cs231n.stanford.edu/slides/2017
Fine tuning and transfer learning
Source: lecture notes to Stanford Course CS231n: http://cs231n.stanford.edu/slides/2017
Inception network
Source: http://redcatlabs.com/2016-07-30_FifthElephant-DeepLearning-Workshop/
Inception 21k
Trained on 21,841 classes of the ImageNet set
Top-1 accuracy above 37%
Available for mxnet: https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-21k-inception.md
VGG16 network
Source: https://www.cs.toronto.edu/~frossard/post/vgg16/
● used the model from Keras
● easy to freeze arbitrary layers (layer.trainable = False)
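A minimal sketch of that freezing pattern, using a stand-in Layer class so it runs anywhere; in Keras the same loop would iterate over model.layers of the loaded VGG16:

```python
class Layer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_all_but_last(layers, n_trainable):
    """Fine-tuning pattern from the slide: freeze every layer except
    the last `n_trainable` ones via layer.trainable = False."""
    for layer in layers[:len(layers) - n_trainable]:
        layer.trainable = False

layers = [Layer("block%d" % i) for i in range(5)]
freeze_all_but_last(layers, n_trainable=2)
```

With the early convolutional blocks frozen, only the top layers are re-trained on the moderation data, which is the usual transfer-learning setup for small labelled sets.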
Gradient boosting?
● instead of updating weights, in each round you fit a weak learner to the pseudo-residuals (the negative gradients of the loss function)
● similarly to neural networks, a shrinkage parameter (learning rate) is used when adding each weak learner, scaling its contribution to avoid overfitting
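A toy illustration of that loop for squared loss, where the pseudo-residuals are plain residuals and each weak learner is a depth-1 stump on a single feature; this is a pure-Python sketch of the idea, not the talk's actual XGBoost setup:

```python
def fit_stump(x, residuals):
    """Weak learner: a depth-1 split fit to the current residuals."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=20, shrinkage=0.3):
    """Each round fits a weak learner to the residuals and adds it
    scaled by the shrinkage parameter (starting from a 0.0 baseline)."""
    prediction = [0.0] * len(y)
    learners = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        learners.append(stump)
        prediction = [pi + shrinkage * stump(xi) for pi, xi in zip(prediction, x)]
    return lambda xi: sum(shrinkage * s(xi) for s in learners)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 4.0, 4.1, 3.9]
model = gradient_boost(x, y)
```

Without shrinkage the first stump would fit the group means in one step and the ensemble would overfit quickly; the learning rate spreads the fit over many small corrections.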
eXtreme Gradient Boosting (XGBoost)
Source: https://www.slideshare.net/JaroslawSzymczak1/xgboost-the-algorithm-that-wins-every-competition
Class imbalance - proper training
● possibilities to deal with the problem:
○ undersampling the majority class
○ oversampling the minority class:
■ randomly
■ by creating artificial examples (SMOTE)
○ reweighting
● undersampling suits our needs the most:
○ the general population of good images is not hurt much by undersampling
○ given training data size limitations, we can train on more unique examples of bad images
○ we undersample in such a manner that the ratio changes from 99:1 to 9:1
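That undersampling step can be sketched as follows, assuming label 1 marks the rare bad images; the function and parameter names are illustrative:

```python
import random

def undersample(records, labels, target_ratio=9):
    """Keep all minority (bad-image) examples and randomly keep
    `target_ratio` majority examples per minority one, e.g. 99:1 -> 9:1."""
    minority = [r for r, l in zip(records, labels) if l == 1]
    majority = [r for r, l in zip(records, labels) if l == 0]
    keep = random.sample(majority, min(len(majority), target_ratio * len(minority)))
    sampled = [(r, 1) for r in minority] + [(r, 0) for r in keep]
    random.shuffle(sampled)
    return sampled

# 9900 good images and 100 bad ones: the 99:1 ratio from the slide.
records = list(range(10000))
labels = [1] * 100 + [0] * 9900
sampled = undersample(records, labels)
```

Note that a model trained on the 9:1 sample outputs scores calibrated to that ratio, which is one reason the next slides insist on evaluating against the true class balance.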
Class imbalance - proper evaluation
● accuracy is a useless measure in such a case
● sensible measures are:
○ ROC AUC
○ PR AUC
○ Precision @ fixed Recall
○ Recall @ fixed Precision
● ROC AUC:
○ can be interpreted as a concordance probability (a randomly chosen positive example gets a higher score than a randomly chosen negative one with probability equal to the AUC)
○ it is, though, too abstract to use as a standalone quality metric
○ does not depend on the class ratio
● PR AUC:
○ depends on the data balance
○ is not intuitively interpretable
● Precision @ fixed Recall, Recall @ fixed Precision:
○ they heavily depend on the data balance
○ they are the best at reflecting the business requirements
○ and at taking processing capabilities into account (then Precision @ k is actually more accurate)
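Both the concordance interpretation of ROC AUC and Precision @ fixed Recall can be computed directly from scores; a small sketch with illustrative names (not the talk's code), assuming distinct scores:

```python
from itertools import product

def roc_auc(scores, labels):
    """AUC as concordance probability: the chance that a random
    positive example is scored higher than a random negative one
    (ties counted as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def precision_at_recall(scores, labels, min_recall):
    """Best precision achievable by any score threshold whose recall
    is at least `min_recall`."""
    total_pos = sum(labels)
    order = sorted(zip(scores, labels), reverse=True)
    tp, best = 0, 0.0
    for k, (_, label) in enumerate(order, start=1):
        tp += label
        if tp / total_pos >= min_recall:
            best = max(best, tp / k)
    return best

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1,   0,   1,   1,   0,   0]
```

Note how precision_at_recall sweeps thresholds from the top of the ranking, which is also why Precision @ k maps naturally to a fixed moderation capacity.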
Consistent development and production environments
● ensure you have the drivers installed (check with nvidia-smi)
● create docker image
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
...
ENV BUILD_OPTS "USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1"
RUN cd /home && git clone https://github.com/dmlc/mxnet.git mxnet --recursive --branch v0.10.0 --depth 1 \
    && cd mxnet && make -j$(nproc) $BUILD_OPTS
...
RUN pip3 install tensorflow==1.1.0
RUN pip3 install tensorflow-gpu==1.1.0
RUN pip3 install keras==2.0
● use nvidia-docker-compose wrapper
Batch process, using the Luigi framework
● re-usability of processing
● fully automated pipeline
● containerized with Docker
Luigi tips
● create your output at the very end of the task
● you can dynamically create dependencies by yielding the task
● adding the workers parameter to your command parallelizes tasks that are ready to run (e.g. python run.py Task … --workers 15)
● for straightforward workflows, inheritance comes in handy:
class SimpleDependencyTask(luigi.Task):

    def create_simple_dependency(self, predecessor_task_class, additional_parameters_dict=None):
        if additional_parameters_dict is None:
            additional_parameters_dict = {}
        result_dict = {k: v for k, v in self.__dict__.items()
                       if k in predecessor_task_class.get_param_names()}
        result_dict.update(additional_parameters_dict)
        return predecessor_task_class(**result_dict)
ads_from_one_day = yield DownloadAdsFromOneDay(self.site_code, effective_current_date)