1
Reuse of a repository of conceptual schemas
in a large scale project
Carlo Batini, University of Milano Bicocca, Italy
batini@disco.unimib.it
2
Goal and contents
• Use of an existing repository of schemas, representing relevant information managed in the Central Public Administration
– 500 main databases
– 500 conceptual schemas organized in a Repository
– 5,000 entities and 10,000 attributes
• Produce the corresponding repository for a group of regional local administrations in Piedmont
– 450 relational schemas available
– About 15,000 relational tables
• Human resources available
– 2 person-years
• Heuristics and related methodology
• Experiments
• Recent developments
3
Organization of the Central and Local Public Administration in
Italy
4
Organization of Central and Local Public Administration in Italy
• Central PA
– 50 Ministries and other Agencies
• Local PA
– 21 Regions
– More than 100 Provinces
– More than 8,000 municipalities
5
Organization of the central PA Repository
6
An example of a repository in the small
[Figure: ER diagrams of the schemas Company, Sales, Production and Department structure, with entities such as FLOOR, DEP, EMP, CITY, ITEM, ORD, SELLER, PUR, WARE, CLERK, ENGIN and WARR]
7
An example of a repository in the small
[Figure: the schemas Company, Sales, Production and Department structure, plus a schema obtained from them by integration, with entities ITEM, ORDER, FLOOR, DEPARTM, EMPLOYEE, CITY, SELLER, CLERK, ENGINEER, PURCH, WARE and WARR]
8
An example of a repository in the small
[Figure: the schemas Company, Sales, Production and Department structure, the schema obtained from them by integration, and more compact schemas obtained by abstraction, with entities such as ITEM, ORDER, DEPART, EMPLOYEE, CITY, SELLER, PURCH, DEP, D-E, EMP-DATA and ORD-DATA]
9
An example of a repository in the small
[Figure: the same repository with all three kinds of links among schemas shown: integration, abstraction and view. The view schemas contain entities such as DEP, EMP, CITY, EMP-DATA, D-E, ITEM, ORD, EMPL, SELLER, PUR, ORD-DATA, DEPART and EMPLOYEE]
10
Views not represented
[Figure: the repository of the previous slides with only integration and abstraction links among the schemas Company, Sales, Production and Department structure; view links are omitted]
11
Only some abstractions represented
[Figure: the repository with the integration link and only part of the abstraction links among the schemas Company, Sales, Production and Department structure]
12
Sparse approach
SI12345678
SI123 SI456 SI78
S1 S2 S3 S4 S5 S6 S7 S8
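One way to read the tree on this slide: the basic schemas S1 to S8 are integrated in groups into SI123, SI456 and SI78, which are in turn integrated into SI12345678. The following is a minimal Python sketch of that structure. The grouping is inferred from the indices in the schema names, and the dictionary layout and the function name `basic_schemas_under` are illustrative assumptions, not part of the original slides.

```python
# Integration tree read off the slide; the grouping is inferred from the
# schema names (SI123 integrates S1, S2 and S3, and so on).
INTEGRATION_TREE = {
    "SI12345678": ["SI123", "SI456", "SI78"],
    "SI123": ["S1", "S2", "S3"],
    "SI456": ["S4", "S5", "S6"],
    "SI78": ["S7", "S8"],
}


def basic_schemas_under(schema: str) -> list[str]:
    """Return the basic (leaf) schemas that a given schema integrates."""
    children = INTEGRATION_TREE.get(schema)
    if children is None:  # not in the tree as a parent: a basic schema
        return [schema]
    leaves: list[str] = []
    for child in children:
        leaves.extend(basic_schemas_under(child))
    return leaves
```

Only the schemas listed in the tree need to be stored; any intermediate integration that is not listed is simply not represented.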
13
Structure of the Central PA Repository
Social security Justice Environment
Health
14
Structure of the Central PA Repository
Social security
Justice Environment
Health
Abstract schemas: 50
Basic schemas: 500
15
[Figure: integrated diagrams of the 1st, 2nd and 3rd level PA databases. Labels recovered from the figure:
– Groupings: Services (General services, Direct services, Social and economic services, Social services, Economic services), Statistics, Support, Resources (Financial resources, Instrumental and real estate resources, Human resources)
– Areas: Communication and transports, Production, Labour, Education, Habitat, Building, Culture, Social, Health, Security, Justice, Defence, Foreign affairs, Social insurance, Certification
– Schemas: Land registry, Social security, Foreign relations in Italy, Italian relations abroad, Legal activities, Urban criminality, Internal security, Assistance, Health service, Culture, Habitat, Cultural heritage, Labour market, Farm companies, Industrial companies, Transports, Fund transfer to local bodies for public activities, Expenses chapter, Protocol, Collective body, Tax office, Customs house, Instruments, Motor vehicles, Real estate, Employees, Training, Delegations]
The whole repository of schemas
16
Individual
Document
Legal person
Subject
Property
Place
The top level schema of the repository
17
Input knowledge for the production of the repository
of local conceptual schemas
Logical schemas
Conceptual schemas
Local Public Administration
Central Public Administration
Abstract schemas
Basic schemas
Repository of local Conceptual schemas
18
Conjecture (1) and strategy (2)
• 1. Knowledge appearing in the abstract schemas of the Central PA Repository should also appear, unchanged, in the Local PA Repository
• 2. Knowledge appearing in the basic schemas of the Central PA Repository should be changed/updated according to the knowledge appearing in the local logical schemas
19
Using a more compact representation
Abstract schemas
Basic schemas
Generalization hierarchies of
– Individual
– Legal person
– Document
– Place
– Property
20
A fragment of the generalization hierarchy for Individual
Individual
Employment: Unemployed, Employed, Dependant, Autonomous, In search of employment, Retired
Retired: State pension retired, Private pension retired, Early retired, Disability retired
Education: ……..…
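A generalization hierarchy like this fragment can be held as a nested mapping from each concept to its specializations. The sketch below is illustrative only. It encodes just a part of the fragment (the Employment classification level is omitted for brevity), and the names `INDIVIDUAL` and `concept_names` are assumptions, not taken from the slides.

```python
# Each concept maps to a dictionary of its specializations.
# Only part of the fragment is encoded here.
INDIVIDUAL = {
    "Individual": {
        "Unemployed": {},
        "Employed": {},
        "Retired": {
            "State pension retired": {},
            "Private pension retired": {},
            "Early retired": {},
            "Disability retired": {},
        },
    }
}


def concept_names(tree: dict) -> list[str]:
    """List every concept in the hierarchy, each parent before its children."""
    names: list[str] = []
    for concept, children in tree.items():
        names.append(concept)
        names.extend(concept_names(children))
    return names
```

The flat list returned by `concept_names` is the kind of vocabulary against which table and attribute names could later be compared.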
21
Input knowledge for the production of the Repository
of local conceptual schemas
Central Public Administration
Local Public Administration
Conceptual schemas
Logical schemas
Abstract schemas
Basic schemas
Generalization hierarchies of
– Individual
– Legal person
– Document
– Place
– Property
Repository of local Conceptual schemas
22
The two phases of the methodology
Automatic local schema construction
Draft schema
Final schema
Manual step
Domain expert
23
The methodology at a glance
• Phase 1
– 1. Extract entities
– 2. Add generalizations
– 3. Extract relationships
– 4. Add relationships related to integrity constraints
• Phase 2: Domain expert step
24
Step 1: Extract entities
• Inputs
– Generalization hierarchies of Individual, Legal person, Document, Place, Property
– Relational local PA schemas
• Output
– Draft schema
25
Step 1: Extract entities
[Figure, built up over slides 25-29: tables and attributes on one side, generalization hierarchies on the other; the entities E1, E2 and E3 are extracted one at a time]
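The slides show the inputs and output of Step 1 but not the matching rule itself. As a rough illustration only, the sketch below assumes the simplest possible heuristic: a case-insensitive name match between table or attribute names and the concept names of the hierarchies. The function `extract_entities`, the sample tables and the matching rule are all hypothetical.

```python
def extract_entities(tables: dict[str, list[str]], concepts: set[str]) -> set[str]:
    """Return the hierarchy concepts named by a table or one of its attributes.

    `tables` maps a table name to its attribute names. Matching is a plain
    case-insensitive comparison, which is an assumption of this sketch.
    """
    by_lower = {c.lower(): c for c in concepts}
    found: set[str] = set()
    for table, attributes in tables.items():
        for name in [table, *attributes]:
            concept = by_lower.get(name.lower())
            if concept is not None:
                found.add(concept)
    return found


# Hypothetical input, not taken from the Piedmont schemas.
tables = {"EMPLOYED": ["id", "name"], "T_042": ["code", "document"]}
concepts = {"Employed", "Retired", "Document"}
draft_entities = extract_entities(tables, concepts)
```

A real implementation would need something more tolerant than exact name equality, since relational table names are often abbreviated or coded.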
30
Step 2: Add generalizations
• Inputs
– Generalization hierarchies of Individual, Legal person, Document, Place, Property
– Draft schema (entities E1, E2, E3)
• Output
– New draft schema
31
Add generalizations
[Figure: tables and attributes alongside the draft entities E1, E2 and E3, with generalizations added among the entities]
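Step 2 takes the entities of the draft schema and the generalization hierarchies as input. The following is a minimal sketch of one plausible reading, in which every extracted entity brings along the generalization links to its ancestors in the hierarchy. The function `add_generalizations` and the child-to-parent encoding are assumptions of this example.

```python
def add_generalizations(entities: set[str], parent: dict[str, str]) -> set[tuple[str, str]]:
    """Return the (child, parent) generalization links reachable from the entities."""
    links: set[tuple[str, str]] = set()
    for entity in entities:
        child = entity
        while child in parent:  # climb until the root of the hierarchy
            links.add((child, parent[child]))
            child = parent[child]
    return links


# Hypothetical fragment of the Individual hierarchy, child -> parent.
parent = {"Early retired": "Retired", "Retired": "Individual"}
links = add_generalizations({"Early retired"}, parent)
```

Entities that do not occur in the hierarchy contribute no links and are left unchanged in the draft schema.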
32
Step 3: Extract relationships
• Inputs
– Draft schema (entities E1, E2, E3)
– Basic schemas of the central PA repository (Social security, Justice, Environment, Health)
• Output
– New draft schema
33
Extract relationships
[Figure, built up over slides 33-37: the draft entities E1, E2 and E3 and their occurrences in the basic schemas, e.g. E2 with E1 and E1 with E3]
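Step 3 takes the draft entities and the basic schemas of the central PA repository as input. The sketch below assumes one simple heuristic, which the slides do not confirm: a relationship of a central basic schema is copied into the draft when both of the entities it connects are already in the draft. All names and sample data are hypothetical.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    name: str
    left: str
    right: str


def extract_relationships(draft_entities: set[str],
                          central_relationships: list[Relationship]) -> list[Relationship]:
    """Keep the central relationships whose two endpoints are both draft entities."""
    return [r for r in central_relationships
            if r.left in draft_entities and r.right in draft_entities]


# Hypothetical relationships found in central basic schemas.
central = [
    Relationship("R12", "E1", "E2"),
    Relationship("R13", "E1", "E3"),
    Relationship("R14", "E1", "E4"),  # E4 is not in the draft schema
]
kept = extract_relationships({"E1", "E2", "E3"}, central)
```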
38
Step 4: Add relationships related to integrity constraints
• Inputs
– Draft schema (entities E1, E2, E3)
– Referential integrity constraints (K2, K3)
• Output
– Final draft schema
39
Add relationships related to integrity constraints
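[Figure: tables and attributes with the constraints K2 and K3, and the draft entities E1, E2 and E3 with the relationships derived from those constraints]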
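A referential integrity constraint between two tables suggests a relationship between the entities extracted from those tables. The following is a minimal sketch under that assumption. The `ForeignKey` record, the table-to-entity map and the sample table names are illustrative and not taken from the original schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    from_table: str  # table holding the referencing column
    to_table: str    # table being referenced


def relationships_from_constraints(foreign_keys, entity_of):
    """Turn each foreign key between two mapped tables into an entity pair."""
    pairs = []
    for fk in foreign_keys:
        source = entity_of.get(fk.from_table)
        target = entity_of.get(fk.to_table)
        # Skip constraints touching tables with no entity, and duplicates.
        if source and target and (source, target) not in pairs:
            pairs.append((source, target))
    return pairs


# Hypothetical mapping from tables to draft entities.
entity_of = {"T_EMP": "E2", "T_DEP": "E3"}
fks = [ForeignKey("T_EMP", "T_DEP"), ForeignKey("T_EMP", "T_LOG")]
pairs = relationships_from_constraints(fks, entity_of)
```

When the constraints are missing from the documentation, this step has nothing to work on, so no relationships are added.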
40
Experiments
41
Experiments on 9 databases in 3 areas
Domain / Type of administration: Region, Province, Municipality
Territory: x x x
Business: x
Health: x
42
Relevant qualities of the process: correctness
• Correctness of the conceptual schema with respect to the “true” one, i.e. the schema that could be obtained directly by the domain expert through a traditional analysis or a reverse engineering activity.
• Correctness is measured with an approximate, indirect metric: the percentage of new/deleted concepts in the schema produced by the expert at the end of step 5, compared with the concepts produced in the semi-automatic steps 1-4.
43
Relevant qualities of the process: completeness
• Completeness of the conceptual schema with respect to the corresponding reengineered logical schema. Completeness is measured by the percentage of tables that are captured in steps 1-5, out of the total number of tables, after excluding tables that carry no relevant information, such as redundant tables, tables of codes, etc.
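The two quality measures can be written down as short functions. This is a sketch under stated assumptions: reading correctness as the complement of the share of concepts the expert added or deleted is an interpretation, and the function names and parameters are hypothetical.

```python
def completeness(caught_tables: int, total_tables: int, excluded_tables: int) -> float:
    """Percentage of relevant tables represented in the conceptual schema.

    Relevant tables are all tables minus those excluded as carrying no
    relevant information (redundant tables, tables of codes, etc.).
    """
    relevant = total_tables - excluded_tables
    if relevant <= 0:
        raise ValueError("no relevant tables")
    return 100.0 * caught_tables / relevant


def correctness(automatic: set[str], expert: set[str]) -> float:
    """100 minus the percentage of concepts the expert added or deleted.

    Taking the complement of the change rate is an assumption of this sketch.
    """
    if not automatic:
        raise ValueError("empty automatic schema")
    changed = len(expert - automatic) + len(automatic - expert)
    return 100.0 * (1 - changed / len(automatic))
```

For example, with made-up figures, `completeness(50, 120, 20)` compares 50 captured tables against 100 relevant ones.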
44
Results
• Correctness: more than 80%
• Completeness: only 50% of tables are captured
• Completeness decreases significantly when the referential integrity constraints are not documented or only partially documented.
• Another cause of reduced completeness is the static nature of the generalization hierarchies used in step 1, and the unequal semantic richness in representing related top level concepts.
• For instance, in the initial Subject hierarchy, 20 concepts represent individuals, while only 3 represent legal persons.
• An improvement we are applying is to update the hierarchies incrementally with abstract concepts generated by the domain expert during the process.
45
Resources
• For a basic/abstract schema of the central PA repository: ½ person month
• For a basic schema of the local PA repository: 1 person day
46
Present developments
47
Heuristics for abstract schemas
Level 1
Level 2
Level 3
Level 4
Initial schema
Enriched schema
48
Heuristics for abstract schemas - 1
Level 1
Level 2
Level 3
Level 4
Enriched schema
49
Heuristics for abstract schemas - 2
Level 1
Level 2
Level 3
Level 4
Enriched schema
50
Heuristics for abstract schemas - 3
Level 1
Level 2
Level 3
Level 4
Enriched schema
51
Heuristics for abstract schemas - 4
Level 1
Level 2
Level 3
Level 4
Enriched schema
52
Heuristics for abstract schemas - 5
Level 1
Level 2
Level 3
Level 4
Enriched schema
53
Heuristics for abstract schemas - 6
Level 1
Level 2
Level 3
Level 4
Enriched schema
54
Heuristics for abstract schemas - 7
Level 1
Level 2
Level 3
Level 4
Enriched schema
55
[Figure: three schemas at decreasing levels of detail]
Schema 1: Individual, Italian citizen, Document, Business, Registry act, Legal person, Grant, Concession rule, Project budget, Procedure, Source, Canceled grant, Paid off grant, Awarded grant, Subject
Schema 2: Individual, Italian citizen, Document, Business, Registry act, Legal person, Rule, Subject
Schema 3: Individual, Italian citizen, Document, Business, Legal person, Rule, Subject
Abstract schemas obtained from the basic schema
56
Strategies for building abstract local schemas
Strategy 1: Abstraction step followed by an integration step
Strategy 2: Abstraction/integration performed together
Actual LPA repository Step 1 Step 2
Actual LPA repository
57
Leftover
58
The structure of the cooperative architecture
[Figure: layers labelled Basic services and Transport services, and several administrations, each shown with Processes, Exported data, Exported services, Internal applications and Internal DBs]
59
Experiments results
Step                  # of tables extracted   % of tables extracted
Create entities       172                     30
Add constraints       219                     41
Domain expert check   275                     51