Inventor Mobility Index
Thorsten Doherr
Zentrum für Europäische Wirtschaftsforschung
Center of Economic Research, Mannheim
Germany
Mission
Problem:
Two inventors with the same name are not necessarily the same person.
Defining an inventor only by name results in too much false mobility, especially for inventors with common names.
Restricting the definition too much (e.g. name and home address) will cancel any mobility.
Mission:
You have to decide whether two patents from inventors with the same name are actually from the same person or from different persons who share the same name.
Tools:
The complete patent data
Plausibility Rules
Inventor: a single inventor entry in a patent document.
Person: all inventors with a specific name that are linked by at least one plausibility rule.
Two inventors with the same name are the same person…
if they are inventing for the same applicant
if they have the same home address
if they are working with the same co-inventors
if one is citing the other
if they have patents in the same area of technology (ipc)
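The five plausibility rules can be read as a pairwise test on inventor entries. The following is a minimal sketch under stated assumptions, not the original implementation: the `Inventor` record, all of its field names, and the choice to compare IPC codes by simple set overlap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventor:
    """One inventor entry in a patent document (field names are hypothetical)."""
    patent: str
    name: str
    applicant: str
    address: str
    co_inventors: frozenset = frozenset()
    cites: frozenset = frozenset()  # patents cited by this patent
    ipc: frozenset = frozenset()    # technology classes of this patent


def matching_rules(a: Inventor, b: Inventor) -> list:
    """Return the names of the plausibility rules that link entries a and b."""
    if a.name != b.name:
        return []  # the rules only apply to entries sharing a (harmonized) name
    rules = []
    if a.applicant == b.applicant:
        rules.append("applicant")
    if a.address == b.address:
        rules.append("home address")
    if a.co_inventors & b.co_inventors:
        rules.append("co-inventor")
    if b.patent in a.cites or a.patent in b.cites:
        rules.append("citation")
    if a.ipc & b.ipc:  # assumption: any shared IPC code counts as "same area"
        rules.append("ipc")
    return rules
```

Two entries belong to the same person as soon as `matching_rules` returns a non-empty list; an empty list leaves them unlinked.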
Harmonization of Applicants
SearchEngine
The SearchEngine is an in-house developed software package specialized in company address matching. It implements the following steps:
Normalizing the search fields (company name, address fields) by transforming them to uppercase, replacing special letters with their common (phonetic) representation (e.g. Ü → UE, ß → SS), compressing abbreviations (e.g. S.P.A. → SPA) and replacing special characters with blanks.
Creating a dictionary containing all the words of the search fields along with their occurrence counts. To preserve the context, every search field has its own chapter. The occurrence is the basis for the heuristic search algorithm. Supporting tables link the dictionary entries back to the company table.
Searching: the search algorithm separates a search term into words. Each word is associated with the occurrence counter of the appropriate dictionary entry. The occurrence reflects the identification potential of the word: a low occurrence means a high identity, because the resulting list of potential hits is small.
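The normalization and dictionary steps described above can be sketched as follows. This is only an illustration under assumptions: the function names, the record layout (a dict per company), the extra umlaut mappings beyond Ü and ß, and the decision to count each word once per record are all hypothetical and not taken from the SearchEngine itself.

```python
import re
from collections import Counter

# Only Ü -> UE and ß -> SS appear on the slide; the other entries are assumptions.
SPECIAL_LETTERS = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "SS"}


def normalize(text):
    """Uppercase, transliterate special letters, compress abbreviations, blank out the rest."""
    text = text.upper()
    for letter, replacement in SPECIAL_LETTERS.items():
        text = text.replace(letter, replacement)
    # Compress dotted abbreviations such as S.P.A. into SPA.
    text = re.sub(r"\b(?:[A-Z]\.){2,}", lambda m: m.group(0).replace(".", ""), text)
    # Replace every remaining special character with a blank.
    text = re.sub(r"[^A-Z0-9]+", " ", text)
    return text.strip()


def build_dictionary(records, fields):
    """Build one 'chapter' (word -> occurrence count) per search field."""
    chapters = {name: Counter() for name in fields}
    for record in records:
        for name in fields:
            words = set(normalize(record.get(name, "")).split())
            chapters[name].update(words)  # assumption: count each word once per record
    return chapters
```

For example, `normalize("Lear Corporation ITALIA S.p.A.")` yields `LEAR CORPORATION ITALIA SPA`, the normalized form used in the SearchEngine example.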
Example of the SearchEngine Algorithm

DICTIONARY - Chapter: APPLICANT_NAME
ENTRY          OCCURS   IDENTITY
…              …        …
CORPORATION    16       1/16 = 0.062500
…              …        …
ITALIA         491      1/491 = 0.002037
…              …        …
LEAR           4        1/4 = 0.250000
…              …        …
SPA            6119     1/6119 = 0.000163

Searching for…
Lear Corporation ITALIA S.p.A.

LEAR        CORPORATION   ITALIA      SPA         SUM
0.250000    0.062500      0.002037    0.000163    0.314700
79.441%     19.860%       0.647%      0.052%      100%

Result
NAME                              IDENTITY
LEAR CORPORATION ITALIA S.p.A.    100.000%
Lear Corporation Italia S.r.l.    99.947%
LEAR ITALIA SEATING S.p.A.        80.139%
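The arithmetic of the example can be retraced with a few lines of code. This is a sketch of the weighting as it appears on the slide, not the SearchEngine code: each search word gets the weight 1/occurrence, and a candidate's identity is the share of the total weight covered by the words it contains. The function name and the use of already normalized word sets are assumptions.

```python
# Occurrence counts from the dictionary chapter APPLICANT_NAME shown on the slide.
OCCURS = {"CORPORATION": 16, "ITALIA": 491, "LEAR": 4, "SPA": 6119}


def identity(search_words, candidate_words, occurs):
    """Share of the search term's total identity that the candidate covers."""
    weights = {word: 1.0 / occurs[word] for word in search_words}
    total = sum(weights.values())
    covered = sum(w for word, w in weights.items() if word in candidate_words)
    return covered / total


search = ["LEAR", "CORPORATION", "ITALIA", "SPA"]
candidates = {
    "LEAR CORPORATION ITALIA S.p.A.": {"LEAR", "CORPORATION", "ITALIA", "SPA"},
    "Lear Corporation Italia S.r.l.": {"LEAR", "CORPORATION", "ITALIA", "SRL"},
    "LEAR ITALIA SEATING S.p.A.": {"LEAR", "ITALIA", "SEATING", "SPA"},
}
for name, words in candidates.items():
    print(f"{name:32s} {identity(search, words, OCCURS):.3%}")
```

The rare word LEAR dominates the score, while the very common SPA contributes almost nothing, so swapping S.p.A. for S.r.l. barely lowers the identity. The last digit may differ slightly from the slide depending on rounding.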
Harmonization of Applicants
Finalization: a cutoff limit for the identity (e.g. 90%) is applied to filter all results.
The resulting list of matching pairs is not symmetric: A can be linked to B without B being linked to A. The linked pairs create a network.
Network analysis: if A is linked to B and B is linked to C, the analysis identifies the group A, B, C.
Re-iteration of the network analysis for groups that are too large, with an increased cutoff limit for their members.
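The cutoff, grouping and re-iteration steps can be sketched as follows. The union-find grouping is a stand-in technique chosen for brevity, and `max_size`, `step` and the function names are hypothetical parameters; the slides do not say how large a group may become or by how much the cutoff is raised.

```python
def groups(pairs, cutoff):
    """Group nodes linked by pairs (a, b, identity) whose identity reaches the cutoff.

    Links are treated as undirected, since A -> B is enough to connect A and B.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, ident in pairs:
        root_a, root_b = find(a), find(b)  # registers both nodes
        if ident >= cutoff:
            parent[root_a] = root_b
    result = {}
    for node in parent:
        result.setdefault(find(node), set()).add(node)
    return list(result.values())


def groups_with_reiteration(pairs, cutoff=0.90, max_size=3, step=0.05):
    """Re-run the analysis with a stricter cutoff on groups that grew too large."""
    pairs = list(pairs)
    final = []
    for group in groups(pairs, cutoff):
        if len(group) > max_size and cutoff + step <= 1.0:
            inner = [p for p in pairs if p[0] in group and p[1] in group]
            final.extend(groups_with_reiteration(inner, cutoff + step, max_size, step))
        else:
            final.append(group)
    return final
```

With a chain of links A-B (99%), B-C (92%), C-D (99%), D-E (91%), a 90% cutoff yields one group of five; re-running that group at 95% splits it into A,B and C,D and leaves E on its own.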
Harmonization of Inventor Names
The SearchEngine is of limited use because…
it is most efficient with search terms consisting of multiple words
the main problems are typing errors and misspellings
Creating phonetic representations of the names using the Metaphone algorithm by Lawrence Philips, 1990.
Phonetic algorithms create a common representation for similar-sounding words (names), which can be indexed for direct database access.
Originally, the results they delivered were validated manually because of their strong tendency toward false positives. Automated matching requires an automated validation process.

Harmonization of Inventor Names
Automated comparison of the retrieved names with the searched name.
The function is based on the least relative character position deltas and requires two words as parameters, so it cannot be used for index-based direct access.
It needs phonetic indexing to quickly generate a list of potential candidates.
The tolerance for typing errors increases with the length of the words, because longer words are more prone to typing errors.
Example for the Metaphone Search
Phonetic key: MR BRTN
FIRST NAME   LAST NAME
MAURO        BARATONI
MARIO        BERRETTONI
MARIO        BERTINI
MARIO        BERTON
MAURO        BERTONI
MAURO        BORDIN
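The phonetic lookup in the example can be illustrated with a toy key function. Note that this is deliberately not Metaphone: it is a much simplified stand-in (collapse doubled letters, drop non-initial vowels, map D to T) that happens to reproduce the key MR BRTN for the names on the slide. The function names and the index layout are hypothetical.

```python
from collections import defaultdict


def phonetic_key(word):
    """Simplified phonetic key (NOT the real Metaphone algorithm)."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    collapsed = []
    for ch in letters:
        if not collapsed or collapsed[-1] != ch:  # collapse doubled letters
            collapsed.append(ch)
    key = []
    for i, ch in enumerate(collapsed):
        if i > 0 and ch in "AEIOU":  # keep a vowel only in first position
            continue
        key.append("T" if ch == "D" else ch)  # D and T sound alike
    return "".join(key)


def build_index(names):
    """Map (first-name key, last-name key) to all names sharing that key."""
    index = defaultdict(list)
    for first, last in names:
        index[(phonetic_key(first), phonetic_key(last))].append((first, last))
    return index


NAMES = [
    ("MAURO", "BARATONI"), ("MARIO", "BERRETTONI"), ("MARIO", "BERTINI"),
    ("MARIO", "BERTON"), ("MAURO", "BERTONI"), ("MAURO", "BORDIN"),
]
index = build_index(NAMES)
print(index[("MR", "BRTN")])
```

All six names land under the same key, which shows both the strength of the approach (one indexed lookup retrieves every candidate) and its tendency toward false positives (MAURO BORDIN and MARIO BERTINI are unlikely to be the same name).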
Example for the Least Relative Character Position Deltas

Letter     C   Z      A      R       N      I       T     Z      K       I
Position   0   0.1    0.2    0.3     0.4    0.5     0.6   0.7    0.8     1.0
∆          0   0.63   0.02   0.041   0.05   0.069   1.0   0.02   0.013   0

Letter     C   H       A      R       N     I       Z      K       I
Position   0   0.125   0.25   0.375   0.5   0.625   0.75   0.875   1.0

Σ∆ = 0 + 0.63 + 0.02 + 0.041 + 0.05 + 0.069 + 1.0 + 0.02 + 0.013 + 0 = 1.875
… = 1 - …
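The deltas in the example can be reproduced with the following sketch. It is inferred from the numbers on the slide rather than taken from the original function: each letter of the first word is paired with the closest occurrence (by relative position) of the same letter in the second word, and a letter with no counterpart costs 1.0. The function names are hypothetical, and the final conversion of the sum into a similarity is left out because that part of the formula is cut off in the slide.

```python
def relative_positions(word):
    """Spread the letters of a word evenly over the interval 0..1."""
    n = len(word)
    return [i / (n - 1) if n > 1 else 0.0 for i in range(n)]


def position_deltas(word, other):
    """For each letter of `word`, the least position delta to the same letter in `other`."""
    other_positions = relative_positions(other)
    deltas = []
    for letter, pos in zip(word, relative_positions(word)):
        candidates = [abs(pos - q) for c, q in zip(other, other_positions) if c == letter]
        # A letter that does not occur in the other word gets the maximum delta.
        deltas.append(min(candidates) if candidates else 1.0)
    return deltas


deltas = position_deltas("CZARNITZKI", "CHARNIZKI")
print([round(d, 3) for d in deltas])
print(round(sum(deltas), 3))
```

Because the comparison needs both words as input, it cannot serve as an index key; it is only applied to the candidates that the phonetic index has already retrieved.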
[Figure sequence: the 22 patents (nodes 1 to 22) of one inventor name, with links between the nodes added rule by rule. Panel titles:]
All Patents of an Inventor Name
The Same Applicant Rule
The Same Home Address Rule
The Co-Inventor Rule
The Citation Rule
The IPC Rule
Italian Inventor Mobility Index
Main Database: Espace Bulletin (March 2010), EPO
Citations: Patstat (September 2010), OECD
Development: Microsoft Visual FoxPro 9.0

patents from Italian applicants and inventors              123356
different harmonized inventor names                        49101
nodes after applying the same applicant rule               60268
nodes after applying the same home address rule            53316
nodes after applying the co-inventor rule, citation rule   53572, 52504 (assignment not recoverable from the slide)
nodes after applying the ipc rule                          50276
Traversal of a Network Table

FROM   TO
…      …
1      2
1      5
2      1
2      5
2      7
5      1
5      2
6      7
7      2
7      6
…      …

[Figure: network of nodes 1 to 8]

GROUP   MEMBER
1       1
1       2
1       5
1       7
1       6
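The traversal can be sketched as a breadth-first search over the FROM/TO table. This is an illustrative stand-in, not the Visual FoxPro implementation: the function name is hypothetical, links are treated as undirected, and only the rows visible on the slide are used (the elided rows marked … are not filled in).

```python
from collections import defaultdict, deque

# The visible rows of the FROM/TO table on the slide.
NETWORK = [(1, 2), (1, 5), (2, 1), (2, 5), (2, 7),
           (5, 1), (5, 2), (6, 7), (7, 2), (7, 6)]


def traverse(network):
    """Return (group, member) rows, one group per connected set of nodes."""
    neighbours = defaultdict(list)
    for src, dst in network:
        neighbours[src].append(dst)
        neighbours[dst].append(src)  # a one-way link still connects both nodes
    group_of = {}
    rows = []
    group = 0
    for start in sorted(neighbours):
        if start in group_of:
            continue  # already reached from an earlier start node
        group += 1
        group_of[start] = group
        queue = deque([start])
        while queue:
            node = queue.popleft()
            rows.append((group, node))
            for nxt in neighbours[node]:
                if nxt not in group_of:
                    group_of[nxt] = group
                    queue.append(nxt)
    return rows


for group, member in traverse(NETWORK):
    print(group, member)
```

Starting at node 1, the search reaches 2 and 5 directly, then 7 through 2, and finally 6 through 7, which gives the member order 1, 2, 5, 7, 6 shown in the GROUP/MEMBER table.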