1

Reuse of a repository of conceptual schemas in a large scale project

Carlo Batini, University of Milano Bicocca, Italy

batini@disco.unimib.it

2

Goal and contents

• Use of an existing repository of schemas, representing relevant information managed in the Central Public Administration
– 500 main databases
– 500 conceptual schemas organized in a Repository
– 5,000 entities and 10,000 attributes
• Produce the corresponding repository for a group of regional local administrations in Piedmont
– 450 relational schemas available
– About 15,000 relational tables
• Human resources available
– 2 person-years
• Heuristics and related methodology
• Experiments
• Recent developments

3

Organization of the Central and Local Public Administration in Italy

4

Organization of the Central and Local Public Administration in Italy

• Central PA
– 50 Ministries and other agencies
• Local PA
– 21 Regions
– More than 100 Provinces
– More than 8,000 municipalities

5

Organization of the central PA Repository

6

An example of a repository in the small

[Figure: schemas labelled Company, Sales, Production and Department structure. Entity labels: FLOOR, DEP, EMP, CITY, ITEM, ORD, SELLER, PUR, WARE, CLERK, ENGIN, WARR. Relationship labels: Man, In, Head, Born, Loc, Of, Acq, Prod.]

7

An example of a repository in the small

[Figure: the schemas labelled Company, Sales, Production and Department structure, plus an integrated schema reached through an arrow labelled integration. Entity labels of the integrated schema: ITEM, ORDER, FLOOR, DEPARTM, EMPLOYEE, CITY, SELLER, CLERK, ENGINEER, PURCH, WARE, WARR. Relationship labels: In, Loc, Man, Of, Head, Born, Type, GestLav.]

8

An example of a repository in the small

[Figure: the schemas Company, Sales, Production and Department structure and their integrated schema, plus more abstract schemas reached through arrows labelled abstraction. Labels of the abstract schemas: ITEM, ORDER, DEPART, EMPLOYEE, CITY, SELLER, PURCH and, at a higher level, ITEM, DEP, D-E, EMP-DATA, ORD-DATA. Arrows are labelled integration and abstraction.]

9

An example of a repository in the small

[Figure: the complete repository in the small. The schemas Company, Sales, Production and Department structure, their integrated schema and the abstract schemas are shown together with view schemas (labels include DEP, EMP, CITY, EMP-DATA, D-E, ITEM, ORD, ORD-DATA, EMPL, SELLER, PUR, DEPART, EMPLOYEE, Product). Arrows are labelled integration, view and abstraction.]

10

Views not represented

[Figure: the repository in the small without the view schemas. Only the arrows labelled integration and abstraction remain, connecting the schemas Company, Sales, Production and Department structure, the integrated schema and the abstract schemas.]

11

Only some abstractions represented

[Figure: the repository in the small with the views omitted and only some of the abstract schemas kept. Arrows are labelled integration and abstraction.]

12

Sparse approach

SI12345678

SI123 SI456 SI78

S1 S2 S3 S4 S5 S6 S7 S8

13

Structure of the Central PA Repository

Social security Justice Environment

Health

14

Structure of the Central PA Repository

Social security

Justice Environment

Health

Abstract schemas: 50

Basic schemas: 500

15

[Figure: the integrated diagrams of the 1st, 2nd and 3rd level PA database, drawn as a hierarchy over the basic schemas. Labels recoverable from the figure:
SERVICES: GENERAL SERVICES, DIRECT SERVICES, SOCIAL AND ECONOMIC SERVICES, SOCIAL SERVICES, ECONOMIC SERVICES
RESOURCES: FINANCIAL RESOURCES, INSTRUMENTAL AND REAL ESTATE RESOURCES, HUMAN RESOURCES
Other upper-level labels: STATISTICS, SUPPORT, COMMUNICATION AND TRANSPORTS, PRODUCTION, LABOUR, EDUCATION, HABITAT, BUILDING, CULTURE, SOCIAL HEALTH, SECURITY, JUSTICE, DEFENCE, FOREIGN AFFAIRS, SOCIAL INSURANCE, CERTIFICATION
Lower-level labels: LAND REGISTRY, SOCIAL SECURITY, FOREIGN RELATIONS IN ITALY, ITALIAN RELATIONS ABROAD, LEGAL ACTIVITIES, URBAN CRIMINALITY, INTERNAL SECURITY, ASSISTANCE, HEALTH SERVICE, CULTURE, HABITAT, CULTURAL HERITAGE, LABOUR MARKET, FARM COMPANIES, INDUSTRIAL COMPANIES, TRANSPORTS, FUND TRANSFER TO LOCAL BODIES FOR PUBLIC ACTIVITIES, EXPENSES CHAPTER, PROTOCOL, COLLECTIVE BODY, TAX OFFICE, CUSTOMS HOUSE, INSTRUMENTS, MOTOR VEHICLES, REAL ESTATE, EMPLOYEES, TRAINING, DELEGATIONS
Each lower-level label carries a pair of counts (such as 2/93, 2/12, 8/29); most pairs are only partly legible.]

The whole repository of schemas

16

Individual

Document

Legal person

Subject

Property

Place

The top level schema of the repository

17

Input knowledge for the production of the repository of local conceptual schemas

• Central Public Administration: conceptual schemas (abstract schemas and basic schemas)
• Local Public Administration: logical schemas
• Result: repository of local conceptual schemas

18

Conjecture (1) and strategy (2)

• 1. Knowledge appearing in the abstract schemas of the Central PA Repository should appear unchanged also in the Local PA Repository

• 2. Knowledge appearing in the basic schemas of the Central PA Repository should be changed/updated according to the knowledge appearing in the local logical schemas

19

Using a more compact representation

• Abstract schemas
• Basic schemas
• Generalization hierarchies of Individual, Legal person, Document, Place, Property

20

A fragment of the generalization hierarchy for Individual

Individual
– Employment: Unemployed, Employed, Dependant, Autonomous, In search of employment, Retired
– Retired: State pension retired, Private pension retired, Early retired, Disability retired
– Education: …

21

Input knowledge for the production of the Repository of local conceptual schemas

• Central Public Administration: conceptual schemas (abstract schemas, basic schemas, and the generalization hierarchies of Individual, Legal person, Document, Place, Property)
• Local Public Administration: logical schemas
• Result: repository of local conceptual schemas

22

The two phases of the methodology

Automatic local schema construction → Draft schema → Manual step (domain expert) → Final schema

23

The methodology at a glance

• Phase 1
– 1. Extract entities
– 2. Add generalizations
– 3. Extract relationships
– 4. Add relationships related to integrity constraints
• Phase 2: Domain expert step

24

Step 1: Extract entities

• Inputs
– Generalization hierarchies of Individual, Legal person, Document, Place, Property
– Relational local PA schemas
• Output
– Draft schema

25

Step 1: Extract entities

[Figure, built up over slides 25 to 29: the tables and attributes of the local relational schemas are set against the generalization hierarchies, and the entities E1, E2 and E3 are identified one at a time.]
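
The slides do not spell out the matching rule used in Step 1. The sketch below assumes the simplest possible one: a table yields a candidate entity when its name, or one of its attribute names, contains the name of a concept from the generalization hierarchies. The concept list, the table names and the attribute names are all hypothetical, chosen only for illustration.

```python
# Hypothetical concepts taken from generalization hierarchies (illustrative only).
HIERARCHY_CONCEPTS = ["individual", "employed", "retired", "legal person", "document", "place"]

# Hypothetical relational tables: table name -> attribute names.
TABLES = {
    "EMPLOYED_REGISTRY": ["id", "name", "birth_place"],
    "DOC_ARCHIVE": ["doc_id", "document_type"],
    "CODES": ["code", "label"],
}


def normalize(name: str) -> str:
    """Lower-case a name and turn underscores into spaces."""
    return name.lower().replace("_", " ")


def extract_entities(tables, concepts):
    """Map each matching concept to the sorted list of tables that mention it."""
    found = {}
    for table, attributes in tables.items():
        names = [normalize(table)] + [normalize(a) for a in attributes]
        for concept in concepts:
            # Assumed rule: plain substring match on table and attribute names.
            if any(concept in n for n in names):
                found.setdefault(concept, []).append(table)
    return {c: sorted(t) for c, t in found.items()}


draft_entities = extract_entities(TABLES, HIERARCHY_CONCEPTS)
```

With this rule a table of codes such as `CODES` matches no concept and is left out of the draft schema; a real matcher would likely need synonyms and abbreviations, which this sketch ignores.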

30

Step 2: Add generalizations

• Inputs
– Generalization hierarchies of Individual, Legal person, Document, Place, Property
– Draft schema (E1, E2, E3)
• Output
– New draft schema

31

Add generalizations

[Figure: the entities E1, E2 and E3 extracted from the tables and attributes are connected by generalizations in the draft schema.]
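
One way to read Step 2 is as a lookup in the generalization hierarchies: each entity of the draft schema is linked to its nearest ancestor that was also extracted. The sketch below works under that assumption. The child-to-parent map is a hypothetical fragment loosely modelled on the Individual hierarchy, not the actual hierarchy of the repository.

```python
# Hypothetical fragment of a generalization hierarchy: child -> parent.
PARENT = {
    "Employed": "Individual",
    "Retired": "Individual",
    "Early retired": "Retired",
}


def add_generalizations(entities, parent):
    """Link each entity to its nearest ancestor that is also in the draft schema."""
    links = []
    for entity in entities:
        ancestor = parent.get(entity)
        while ancestor is not None and ancestor not in entities:
            ancestor = parent.get(ancestor)  # skip ancestors that were not extracted
        if ancestor is not None:
            links.append((entity, ancestor))
    return links


# Entities assumed to come out of Step 1 (illustrative).
draft = ["Individual", "Employed", "Early retired"]
is_a_links = add_generalizations(draft, PARENT)
```

Here `Early retired` is attached directly to `Individual` because the intermediate concept `Retired` was not extracted; whether the methodology instead adds the missing intermediate concept is not stated in the slides.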

32

Step 3: Extract relationships

• Inputs
– Draft schema (E1, E2, E3)
– Basic schemas of the central PA repository (Social security, Justice, Environment, Health)
• Output
– New draft schema

33

Extract relationships

[Figure, built up over slides 33 to 37: the basic schemas are searched for relationships among E1, E2 and E3; the relationships found (the figure shows E2 with E1, and E1 with E3) are added to the draft schema.]
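
A minimal sketch of Step 3, assuming that a relationship is reused from the central PA repository whenever both of its participating entities are present in the draft schema. The relationship names and entity names below are hypothetical; the slides only state that the basic schemas of the central repository are the input.

```python
# Hypothetical relationships found in basic schemas of the central PA repository:
# (relationship name, first entity, second entity).
CENTRAL_RELATIONSHIPS = [
    ("Owns", "Individual", "Property"),
    ("Located in", "Property", "Place"),
    ("Issues", "Legal person", "Document"),
]


def extract_relationships(draft_entities, central_relationships):
    """Keep the central relationships whose two participants are both in the draft."""
    present = set(draft_entities)
    return [
        (name, a, b)
        for name, a, b in central_relationships
        if a in present and b in present
    ]


# Entities assumed to be in the draft schema after Steps 1 and 2 (illustrative).
draft = ["Individual", "Property", "Place"]
draft_relationships = extract_relationships(draft, CENTRAL_RELATIONSHIPS)
```

The relationship between `Legal person` and `Document` is dropped because neither entity appears in this draft schema.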

38

Step 4: Add relationships related to integrity constraints

• Inputs
– Draft schema (E1, E2, E3)
– Referential integrity constraints (K2, K3)
• Output
– Final draft schema

39

Add relationships related to integrity constraints

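[Figure: the referential integrity constraints K2 and K3 defined on the tables and attributes give rise to additional relationships among E1, E2 and E3.]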
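
Step 4 can be sketched as turning each documented foreign key into a relationship between the entities of the two tables it connects, unless those entities are already related. This is an assumed reading of the step; the table names, column names and the table-to-entity mapping are hypothetical.

```python
# Hypothetical mapping from tables to the entities extracted for them.
TABLE_TO_ENTITY = {"RESIDENT": "Individual", "BUILDING": "Property", "TOWN": "Place"}

# Hypothetical foreign keys: (referencing table, column, referenced table).
FOREIGN_KEYS = [
    ("BUILDING", "town_id", "TOWN"),
    ("BUILDING", "owner_id", "RESIDENT"),
    ("RESIDENT", "code_id", "CODES"),  # CODES has no entity, so it is skipped
]


def relationships_from_constraints(foreign_keys, table_to_entity, existing):
    """Add one relationship per foreign key linking two not yet connected entities."""
    connected = {frozenset(pair) for pair in existing}
    added = []
    for source, column, target in foreign_keys:
        a, b = table_to_entity.get(source), table_to_entity.get(target)
        if a is None or b is None or frozenset((a, b)) in connected:
            continue
        connected.add(frozenset((a, b)))
        added.append((a, b, column))
    return added


existing_pairs = [("Property", "Place")]  # assumed to come from Step 3
new_relationships = relationships_from_constraints(
    FOREIGN_KEYS, TABLE_TO_ENTITY, existing_pairs
)
```

Only the `owner_id` constraint produces a new relationship here. When constraints are undocumented there is nothing for this step to read, which matches the loss of completeness reported in the results.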

40

Experiments

41

Experiments on 9 databases in 3 areas

Domain        Type of administration (Region, Province, Municipality)
Territory     x x x
Business      x
Health        x

42

Relevant qualities of the process: correctness

• Correctness of the conceptual schema with respect to the “true” one, i.e. the schema that the domain expert could obtain directly through a traditional analysis or a reverse engineering activity.

• Correctness is measured with an approximate, indirect metric: the percentage of new/deleted concepts in the schema produced by the expert at the end of step 5 (the domain expert step), compared with the concepts produced in the semi-automatic steps 1-4.
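
The slides do not give a formula for this metric. One possible reading, sketched below, counts the concepts the expert added or deleted, divides by the number of concepts in the draft produced by steps 1-4, and reports the complement. The choice of denominator and the sample concept names are assumptions.

```python
def correctness(draft_concepts, final_concepts):
    """Return 1 minus the share of concepts the expert added or deleted.

    Assumes the draft schema contains at least one concept.
    """
    draft, final = set(draft_concepts), set(final_concepts)
    changed = len(final - draft) + len(draft - final)  # new plus deleted concepts
    return 1 - changed / len(draft)


# Hypothetical draft (steps 1-4) and final (after the expert step) concept sets.
draft = {"Individual", "Property", "Place", "Document", "Grant"}
final = {"Individual", "Property", "Place", "Document", "Rule"}
score = correctness(draft, final)
```

In this toy case one concept was deleted and one added out of five, so two fifths of the draft changed.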

43

Relevant qualities of the process: completeness

• Completeness of the conceptual schema with respect to the corresponding reengineered logical schema. Completeness is measured as the percentage of tables captured in steps 1-5 out of the total number of tables, after excluding tables that carry no relevant information, such as redundant tables, tables of codes, etc.
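
The completeness measure can be written down directly from the definition above. The sketch below is one straightforward reading of it; the table names are hypothetical.

```python
def completeness(captured_tables, all_tables, irrelevant_tables):
    """Share of relevant tables that ended up represented in the conceptual schema.

    Assumes at least one table remains after the irrelevant ones are excluded.
    """
    relevant = set(all_tables) - set(irrelevant_tables)
    return len(set(captured_tables) & relevant) / len(relevant)


# Hypothetical logical schema with one table of codes to be excluded.
all_tables = ["RESIDENT", "BUILDING", "TOWN", "PERMIT", "CODES"]
irrelevant = ["CODES"]
captured = ["RESIDENT", "BUILDING"]
ratio = completeness(captured, all_tables, irrelevant)
```

Excluding `CODES` leaves four relevant tables, of which two are captured.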

44

Results

• Correctness: more than 80%
• Completeness: only 50% of tables are captured.
• Completeness decreases significantly when the referential integrity constraints are not documented or only partially documented.
• Another cause of reduced completeness is the static nature of the generalization hierarchies used in step 1, and the unequal semantic richness in representing related top-level concepts.
• For instance, in the initial Subject hierarchy, 20 concepts represent individuals, while only 3 represent legal persons.
• An improvement we are applying is the incremental update of the hierarchies with abstract concepts generated by the domain expert during the process.

45

Resources

• For a basic/abstract schema of the central PA repository: ½ person-month

• For a basic schema of the local PA repository: 1 person-day
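
To compare the two figures they have to be put on the same unit. The short calculation below assumes 20 working days per person-month, a conversion that is not stated in the slides, and applies the local figure to the 450 relational schemas mentioned in the goals.

```python
WORKING_DAYS_PER_MONTH = 20  # assumption, not stated in the slides

central_days_per_schema = 0.5 * WORKING_DAYS_PER_MONTH  # 1/2 person-month
local_days_per_schema = 1.0                              # 1 person-day

# How many times cheaper a local schema is than a central one.
effort_ratio = central_days_per_schema / local_days_per_schema

LOCAL_SCHEMAS = 450  # relational schemas available for the local administrations
local_total_months = LOCAL_SCHEMAS * local_days_per_schema / WORKING_DAYS_PER_MONTH
```

Under this assumption a local schema costs about a tenth of a central one, and 450 local schemas come to 22.5 person-months, which is close to the 2 person-years listed as available.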

46

Present developments

47

Heuristics for abstract schemas

Level 1

Level 2

Level 3

Level 4

Initial schema

Enriched schema

48

Heuristics for abstract schemas - 1

Level 1

Level 2

Level 3

Level 4

Enriched schema

49

Heuristics for abstract schemas - 2

Level 1

Level 2

Level 3

Level 4

Enriched schema

50

Heuristics for abstract schemas - 3

Level 1

Level 2

Level 3

Level 4

Enriched schema

51

Heuristics for abstract schemas - 4

Level 1

Level 2

Level 3

Level 4

Enriched schema

52

Heuristics for abstract schemas - 5

Level 1

Level 2

Level 3

Level 4

Enriched schema

53

Heuristics for abstract schemas - 6

Level 1

Level 2

Level 3

Level 4

Enriched schema

54

Heuristics for abstract schemas - 7

Level 1

Level 2

Level 3

Level 4

Enriched schema

55

Abstract schemas obtained from the basic schema

[Figure: a basic schema and two successive abstractions of it.
Basic schema: Subject, Individual, Italian citizen, Legal person, Business, Document, Registry act, Grant, Canceled grant, Paid off grant, Awarded grant, Concession rule, Project budget, Procedure, Source.
First abstraction: Subject, Individual, Italian citizen, Legal person, Business, Document, Registry act, Rule.
Second abstraction: Subject, Individual, Italian citizen, Legal person, Business, Document, Rule.]

56

Strategies for building abstract local schemas

Strategy 1: Abstraction step followed by an integration step

Strategy 2: Abstraction/integration performed together

Actual LPA repository Step 1 Step 2

Actual LPA repository

57

Leftover

58

The structure of the cooperative architecture

[Figure: each administration has its own processes, internal applications and internal DBs, and exposes exported data and exported services. The administrations cooperate through a layer of basic services on top of transport services.]

59

Experiments results

Step                  # of tables extracted   % of tables extracted
Create entities       172                     30
Add constraints       219                     41
Domain expert check   275                     51
