Crawling, Parsing and Semantic Matching of Vacancies and CV’s Semantic Recruitment Technology Jakub Zavrel, Textkernel InGRID Workshop 11-2-2014

Page 1:

Crawling, Parsing and Semantic Matching of Vacancies and CV’s

Semantic Recruitment Technology

Jakub Zavrel, Textkernel

InGRID Workshop 11-2-2014

Page 2:

Textkernel:

• Spinoff from R&D in machine learning and language technology

• Founded 2001, offices in Amsterdam (HQ), Frankfurt, Paris, 45 employees; strong R&D focus

• Deloitte Fast 50 2007, 2010, 30% YoY growth

• Core technology: Understanding unstructured text data. Multi-lingual

Market:

• Job boards, Recruitment Software, Staffing and recruitment, Mobility, Large Employers

• Products:

• Multi-lingual tools (15 languages) to extract CVs and jobs

• Jobfeed: largest real time DB for job market analysis

• Search! & Match! to connect people and jobs

• Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone, XING, SAP, Unisys, Bosch, Axa, Philips, etc. (350 direct, 2000+ indirect)

• Large partner network (HR & recruitment software)

Page 3:

I like programming, but I’m interested in taking on more project management responsibility.

Is there a job in our organisation that better fits my degree?

I’d like to work on our mobile strategy. I’ve helped a friend develop a mobile app.

I’d like to do more with my organisational talent.

We are looking to hire: An experienced tech team lead

Language gap

The ideal candidate has:

- min. 5 yr of experience

- Certified Scrum Master

- Exp. w/ iOS, Android

Completed academic studies Computer Science or related

30% travel for customer presentations

Page 4:

The job ad searches directly in a database and identifies relevant candidates (or vice versa) …

Page 5:

Page 6:

Automatically convert each document into a complete record

Extract! CV/Job Parsing

Page 7:

Extract!

Page 8:

Extract!

Page 9:

Extract!

Page 10:

Extract!

Page 11:

Extract! – Zero data entry job application

Page 12:

Extract!

Page 13:

• Time savings coding CVs and Jobs

• If you accept noise, 100% time savings

• Structured data allows better search: Semantic Searching and Matching

• Coding enables reporting and statistics

Extract!

Page 14:

• Coding follows Extraction

• Customer specific or standard taxonomies

• String similarity based normalization

• Lots of synonyms per language

• Distance = confidence

• Problem cases: ambiguity, context, long tail

• More complex models can help (classifiers, multi-variate models)

• Semantic matching works better (occupation coding errors are counterbalanced by other variables)

Occupation coding!
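
A minimal sketch of string similarity based normalization, where the similarity of the best-matching synonym doubles as a confidence value. This is not Textkernel's implementation: the tiny `TAXONOMY`, the code names, the 0.6 threshold and the use of Python's `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical mini-taxonomy: occupation code -> known synonyms (illustrative only).
TAXONOMY = {
    "ADMIN_ASSISTANT": ["administrative assistant", "administrative employee", "office support"],
    "JAVA_DEVELOPER": ["java developer", "java programmer", "j2ee developer"],
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def code_occupation(title: str, min_confidence: float = 0.6):
    """Return (code, confidence) for the closest synonym, or (None, confidence) if too distant."""
    best_code, best_score = None, 0.0
    for code, synonyms in TAXONOMY.items():
        for synonym in synonyms:
            score = similarity(title, synonym)
            if score > best_score:
                best_code, best_score = code, score
    if best_score < min_confidence:
        return None, best_score  # long-tail or ambiguous title: leave uncoded
    return best_code, best_score


print(code_occupation("Senior Java Developper"))
```

A pure string distance like this cannot resolve ambiguous or context-dependent titles, which is where the classifiers and multi-variate models mentioned above would come in.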

Page 15:

• Semantic search:

“Lets you find what you mean, not what you type”

Impression...

Search!

Page 16:

Match!

Page 17:

Semantic Matching Technology:

• Natural Language Processing

• Machine Learning

• Semantic Analysis

• Probabilistic Language Model

• Search Engine

• Multi-lingual taxonomies

• Recruitment knowledge-bases

Page 18:

Demo

Page 19:

Search and analyse real-time online job ads as well as historical data

Jobfeed

Page 20:

Jobfeed

Page 21:

Jobfeed!

Knowledge of all demand for labour in European job market

– Sales leads for recruitment and staffing companies

– Real time labour market analytics tools

– Largest database of jobs for matching unemployed

– Perfect data source for text mining

Page 22:

Jobfeed!

• Real time collection of online job ads from any (unstructured) source

• Available in NL, DE, FR, IT

• Gradually rolling out in rest of Europe

• Richly semantically structured data

Page 23:

Jobfeed!

Page 24:

Jobfeed: Multilingual Occupation Taxonomy

Occupations:

- >4000 codes

- 4 languages

- 3 layer hierarchy

- >50K synonyms

Link to other concepts:

- Skills

- Education level

- Sector

- O*NET

- UWV (Dutch Employment Agency)

- ROME

Based on millions of jobs, years of customer feedback and experience!

Example: NL: administratief medewerker, EN: administrative assistant, FR: employé administratif, DE: Verwaltungsassistent (m/w).

Group: administrative personnel

Class: Administration and Customer Service

Synonyms: administrative employee, assistant clerk, office support

Skills: ms office, excel, english language, etc.

O*NET: 43-9199.00: Office and Administrative Support Workers, All Other

UWV: 1000402563: Administratief medewerker secretariaat
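
One way to picture such a taxonomy entry is as a small record that holds the per-language labels, the hierarchy levels, the synonyms and the links to external code systems. The sketch below is an assumption for illustration only: the class name `OccupationEntry`, its field names and the helper `label_for` are hypothetical and do not describe Jobfeed's actual data model. The values are those of the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OccupationEntry:
    """Hypothetical record for one occupation in a multilingual taxonomy."""
    labels: Dict[str, str]                # language code -> preferred label
    group: str                            # middle layer of the hierarchy
    occupation_class: str                 # top layer of the hierarchy
    synonyms: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    external_codes: Dict[str, str] = field(default_factory=dict)  # e.g. O*NET, UWV


entry = OccupationEntry(
    labels={
        "NL": "administratief medewerker",
        "EN": "administrative assistant",
        "FR": "employé administratif",
        "DE": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["ms office", "excel", "english language"],
    external_codes={"O*NET": "43-9199.00", "UWV": "1000402563"},
)


def label_for(occupation: OccupationEntry, lang: str) -> Optional[str]:
    """Look up the preferred label for a language code, or None if not available."""
    return occupation.labels.get(lang.upper())


print(label_for(entry, "de"))
```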

Page 25:

Demo

Page 26:

Jobfeed as material for Research

Page 27:

Page 28:

Page 29:

Page 30:

Page 31:

Frequent words for "Java developer"

en, van, de, een, je, met, in, het, Java, of, Je, op, is, voor, te, ervaring, aan, als, and, software, om, team, zijn, kennis, bij, Ervaring, die, the, naar, a, jaar, jij, bent, Developer, HBO, hebt, to, werken, werk

Page 32:

Frequent words for all professions

en, van, de, een, in, het, je, met, op, Je, voor, te, is, of, zijn, aan, bent, naar, bij, om, als, ervaring, die, Het, hebt, deze, werken, zoek, De, wij, functie, onze, ben, tot, over, werk, opleiding, uit, and, werkzaamheden, dat, binnen, u, Als, Voor, zelfstandig, kennis, ook, s, verantwoordelijk

Page 33:

Solution: contrast frequencies

• Observed frequency of w: O(w) = A

• Expected frequency of w: E(w) = C * B / D

• Pick words with highest score: score(w) = (O - E)^2 / E

                          Java developer jobs    All jobs
# jobs where w occurs     A                      B
Total # jobs              C                      D
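
The contrast score can be sketched in a few lines. The formula follows the slide; everything else is an assumption for illustration: the function names `contrast_score` and `top_words`, the toy corpus, representing each job as a set of words, and the choice to score only words that occur in at least one target job.

```python
from collections import Counter


def contrast_score(a: int, b: int, c: int, d: int) -> float:
    """score(w) = (O - E)^2 / E with O = A and E = C * B / D."""
    expected = c * b / d
    if expected == 0:
        return 0.0  # word never seen in the reference set; nothing to contrast against
    return (a - expected) ** 2 / expected


def top_words(target_jobs, all_jobs, k=10):
    """Rank words occurring in target_jobs by their contrast score against all_jobs."""
    c, d = len(target_jobs), len(all_jobs)
    # Document frequencies: in how many jobs does each word occur?
    df_target = Counter(w for job in target_jobs for w in set(job))
    df_all = Counter(w for job in all_jobs for w in set(job))
    scores = {w: contrast_score(a, df_all[w], c, d) for w, a in df_target.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy corpus (made up): each job ad is reduced to its set of words.
java_jobs = [{"java", "en", "spring"}, {"java", "en", "de"}]
other_jobs = [{"en", "de", "verkoop"}, {"en", "de"}, {"en", "zorg"}, {"en", "de", "zorg"}]
all_jobs = java_jobs + other_jobs

print(top_words(java_jobs, all_jobs, k=2))
```

Because the difference is squared, a word that is strongly under-represented in the target jobs also gets a high score. Restricting candidates to words seen in the target jobs, as done here, is one simple way to limit that effect.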

Page 34:

Top words for "Java developer"

java, developer, software, spring, scrum, agile, hibernate, ontwikkelaar, u, j2ee, development, maven, applicaties, ervaring, web, de, frameworks, jboss, mbo, senior, wij, xml, jee, o, javascript, you, kennis, ontwikkelen, oracle, ontwikkeling, architectuur, webservices, informatica, werkzaamheden, technologie, developers, eclipse, bezit, het, team, wo, rijbewijs, technieken, tomcat, the, vca, zelfstandig, architect, werklocatie, html

Building rich skills profiles for thousands of occupations from millions of real time jobs…

… new trends and occupations…

Page 35:

Supply & Demand

• Have: lots of data, technology, ideas

• Want: labor market expertise, students, research

Page 36:

Semantic Recruitment Technology

Thanks!