Developing and validating a document classifier: a real-life story
Marko Smiljanić, CEO, NIRI Intelligent Computing Ltd
www.niri-ic.com
About us
NIRI: 10 years in Intelligent Computing
- Text Mining
- Knowledge Discovery and Management
- All about Data Science
Based in Niš
About me
My role in the company
The flow
- Business Context
- The Challenge
- The Solution
- Effectiveness: Laboratory measurements, Impact estimation, Reality
- Wrap up
Business context
Largest clients include:
- Public Employment Services in the EU, USA, and Asia
- Staffing companies in the EU and USA
The ELISE Platform connects vacancies and job seekers through a Job Taxonomy and a Skill Taxonomy.
Document Classification: assigning vacancies to codes from the Job Taxonomy.
Occupation Taxonomies
- ISCO (International Standard Classification of Occupations)
- ESCO
- O*NET
- and many more
ISCO level 1: 10 classes; level 2: 42; level 3: 124; level 4: 400; ESCO level 5: ~5,000.
Example class: "Delivery service worker"
Challenges (for humans):
- Knowing the taxonomy
- Ambiguous taxonomy
- Hybrid positions
- Vague vacancies
Client's situation in 2014
Vacancy → Aggregator and Classifier → Correct code? If yes, publish; if no, repair the code.
Volume: 2,000-4,000 vacancies per day, into a taxonomy of more than 2,000 classes.
Outcomes:
- Code correct (OK): 65%
- Code repaired: 23% (classifier suggestions helped in 9%, gave no help in 14%)
- No code assigned: 12%
The Solution: NIRI will build you a better classifier
Vacancy → Aggregator and Classifier → NIRI Classifier → Publish, at 2,000-4,000 vacancies per day.
Really? How accurate will it be? How will it fit our process?
Really. We will (try to):
- Reduce manual effort
- Increase volume
- Improve final accuracy
But you need to give us training data: more than 1M vacancies.
Training data composition: Verified 74%, Not verified 14%, No class 12%. And a long-tail effect: many taxonomy classes have very few examples.
Architecture of our solution
Vacancy → Feature Extractor → Classifier 1, Classifier 2, …, Classifier N → Negotiator → [Class, Confidence]
The Vacancy Classifier also draws on External Services.
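The pipeline above can be sketched as follows. The feature extractor, the two toy classifiers, and the confidence-summing negotiator are all hypothetical stand-ins for NIRI's proprietary components, and the taxonomy codes are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str         # taxonomy code, e.g. an ISCO-style unit-group code
    confidence: float  # in [0, 1]

def extract_features(vacancy_text):
    # Stand-in feature extractor: bag of lowercase tokens.
    return set(vacancy_text.lower().split())

def keyword_classifier(features):
    # Hypothetical classifier 1: keyword lookup into taxonomy codes.
    if "driver" in features or "delivery" in features:
        return Prediction("9621", 0.9)   # delivery-worker-like code (illustrative)
    return Prediction("0000", 0.1)       # fallback: no confident code

def length_classifier(features):
    # Hypothetical classifier 2 with its own (weak) opinion.
    return Prediction("9621", 0.4) if len(features) > 3 else Prediction("0000", 0.2)

def negotiate(predictions):
    # Negotiator: sum confidence per label, pick the winner,
    # and report its share of the total as the combined confidence.
    scores = {}
    for p in predictions:
        scores[p.label] = scores.get(p.label, 0.0) + p.confidence
    label = max(scores, key=scores.get)
    return Prediction(label, scores[label] / sum(scores.values()))

def classify(vacancy_text):
    feats = extract_features(vacancy_text)
    preds = [keyword_classifier(feats), length_classifier(feats)]
    return negotiate(preds)

result = classify("Delivery driver wanted for parcel service")
```

The negotiator here is a simple weighted vote; the real component could apply any combination strategy over the N classifier outputs.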
What to do with confidence?
Batch processing produces a stream of (Vacancy, Code, Confidence) triples. Sorted by confidence, the high-confidence (high-accuracy) portion is bulk-accepted, while the low-confidence (low-accuracy) portion goes to manual checking.
Using confidence
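A minimal sketch of this confidence-based routing; the 0.85 threshold and the batch contents are illustrative, not production values:

```python
# Route each classified vacancy by confidence: high-confidence results are
# bulk-accepted, the rest go to a manual check queue.
BULK_ACCEPT_THRESHOLD = 0.85  # illustrative cut-off, not the production value

def route(batch):
    bulk_accept, to_check = [], []
    for vacancy, code, confidence in batch:
        if confidence >= BULK_ACCEPT_THRESHOLD:
            bulk_accept.append((vacancy, code))
        else:
            to_check.append((vacancy, code, confidence))
    # Sort the manual queue so reviewers see the most doubtful cases first.
    to_check.sort(key=lambda item: item[2])
    return bulk_accept, to_check

batch = [("v1", "5131", 0.97), ("v2", "9621", 0.40),
         ("v3", "2512", 0.88), ("v4", "3322", 0.60)]
accepted, queued = route(batch)
```

Choosing the threshold trades bulk-accept volume against bulk-accept accuracy, which is exactly the trade-off the impact estimation below has to quantify.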
Measuring accuracy in the laboratory
Training data: Verified 74%, Not verified 14%, No class 12%. Each Vacancy Classifier prediction counts as Correct, Incorrect, or No class.
Protocol: split into Train 80% / Test 20%, repeated 5 times.
One of many laboratory measurements:

             Corpus   Classifier   Classifier 100   Classifier 1000
Correct      74%      78%          80%              85%
Incorrect    14%      13%          12%              10%
No class     12%      9%           8%               5%
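The 80/20 split repeated five times is essentially 5-fold cross-validation. A plain-Python sketch of the splitting (the per-fold scoring into Correct / Incorrect / No class is omitted):

```python
import random

def five_fold_splits(data, seed=0):
    # Shuffle once, cut into five folds, and use each fold in turn as the
    # held-out 20% test set; the remaining 80% is the training set.
    items = list(data)
    random.Random(seed).shuffle(items)
    fold_size = len(items) // 5
    for k in range(5):
        test = items[k * fold_size:(k + 1) * fold_size]
        train = items[:k * fold_size] + items[(k + 1) * fold_size:]
        yield train, test

data = list(range(100))  # stand-in for 100 labelled vacancies
sizes = [(len(tr), len(te)) for tr, te in five_fold_splits(data)]
```

Each item appears in exactly one test fold, so the five test sets together cover the whole corpus.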
Does this make any sense? Yes, but…
Original classifier: Verified 74%, Not verified 14%, No class 12%. Vacancy Classifier: Correct 78%, Incorrect 13%, No class 9%.
This is not reality:
- Biased train/test set
- Accuracy of the test set is unknown
- Inability to test against the remaining 26%
Remember the process?
Vacancy → Aggregator and Classifier → Correct code? If yes, publish; if no, repair the code (OK 65%, repaired 23% with classifier help in 9% and no help in 14%, no code 12%).
This is what it actually looks like: Check → Repair.
We will reduce manual effort, increase volume, and improve final accuracy.
And we proposed this process: Bulk Accept → Check → Repair.
Best/worst case analysis, some manual validation, and careful assumptions for the Bulk Accept → Check → Repair process.
Impact estimation showed that:
- Step 1 (Check) effort reduction: 60% (due to bulk acceptance)
- Step 2 (Repair) effort reduction: 11% (due to bulk acceptance and top-5 code offers)
- Significant increase in published volume (almost to 100%)
- Slightly higher accuracy (+1%, to around 92%)
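One way such an overall accuracy estimate can be assembled: blend the accuracy of the bulk-accepted share with the accuracy of the manually checked share. All input numbers below are assumptions made up for illustration, not the client's actual figures:

```python
# Back-of-the-envelope blended-accuracy estimate (all inputs assumed):
# bulk-accepted codes keep the classifier's high-confidence accuracy,
# manually checked ones reach human reviewer accuracy.
def blended_accuracy(bulk_share, bulk_acc, manual_acc):
    return bulk_share * bulk_acc + (1 - bulk_share) * manual_acc

acc = blended_accuracy(bulk_share=0.6,   # share bulk-accepted (assumed)
                       bulk_acc=0.95,    # accuracy of bulk-accepted codes (assumed)
                       manual_acc=0.88)  # accuracy after manual check (assumed)
```

With these assumed inputs the blend lands near 92%, which is the order of magnitude the estimation arrived at; a best/worst-case analysis would vary each input over a plausible range.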
Does this make any sense?
Yes, but…
Production data: Verified 74%, Not verified 14%, No class 12%. But what is the real accuracy?
How can we measure production accuracy?
We cannot, unless…
Golden Test Set
How was it built? Check & Repair under the four-eye principle: for each published vacancy, reviewers saw the original code and the top 5 Vacancy Classifier codes.
Every single classification was marked as either Correct, Acceptable, or Wrong.
Results
Golden Test Set Results ("HQ source" = the highest-quality source, also used for training):

                       Current   NIRI VC   Current (HQ source)   NIRI VC (HQ source)
Correct                63.05%    73.91%    72.06%                74.38%
Correct or Acceptable  65.98%    77.56%    76.25%                78.69%
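Scoring against such a golden set reduces to counting the three-way judgments, once under a strict reading (Correct only) and once under a lenient one (Correct or Acceptable). The example judgments below are invented:

```python
# Score a list of golden-test-set judgments, each one of
# "Correct", "Acceptable", or "Wrong".
def score(judgments):
    n = len(judgments)
    strict = sum(j == "Correct" for j in judgments) / n
    lenient = sum(j in ("Correct", "Acceptable") for j in judgments) / n
    return strict, lenient

# Invented example: 74 correct, 4 acceptable, 22 wrong out of 100.
judgments = ["Correct"] * 74 + ["Acceptable"] * 4 + ["Wrong"] * 22
strict, lenient = score(judgments)
```

The gap between the two numbers is the share of merely-acceptable codes, which is why the results table reports both rows.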
Wrap up
- Clean semantic data, in real life, can only be a myth. We are looking into data cleansing approaches.
- Measuring usefulness can be hard and expensive, but it can and must be monitored after the system is deployed. It changes over time.
- Continuous learning, where possible, is a great thing.
- Implementing a state-of-the-art machine learning algorithm is one thing. Making it useful is another. Explaining that to the end user is a third.
NIRI is a very cool company to work with!
I hope you liked the story, and I thank you for your attention.