
Information System Evolution

[Figure: number of update cycles (y-axis, 0 to 350) against number of entries modified (x-axis, 0 to 16000)]

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667

[Figure: BDII v5 update process. Diagram labels: Provider, Plugin, LDIF, Query, Merge, New LDIF, DIFF, Update LDIF, Update, LDAP_ADD, LDAP_MODIFY, LDAP (2170)]

The information system is a mission-critical component of the EGEE production infrastructure. It provides the detailed information about Grid services that is required to discover, select and use them during Grid-related activities such as job and data management. Information system components are found throughout the infrastructure and are especially sensitive to information volume and query rate. It must therefore be ensured that the current components can meet the scalability requirements imposed by the growth of the infrastructure. An improved Berkeley Database Information Index (BDII) [1] architecture is presented that has the potential to meet these future requirements.

Changes in the information system were monitored by recording the entries modified during each BDII update. Over a period of 9 days the changes for 1932 update cycles were recorded, which corresponds to approximately one update cycle every 7 minutes. A graph of the number of changes per cycle can be seen above. The average number of entries modified per update cycle was 12771, which corresponds to 21.8% of the total number of entries. A further investigation was conducted to find out how often each attribute type was changed; the results can be found in the table of attribute change frequencies. 97.8% of the changes are confined to 14 attributes, which is only 4% of the total attributes used. In the current implementation all entries are transported and updated during each cycle, which is inefficient.
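The figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a sanity check: the function names are illustrative, and the implied totals are derived from the rounded percentages in the text, so they are approximate.

```python
def minutes_per_cycle(days: float, cycles: int) -> float:
    """Average time between update cycles, in minutes."""
    return days * 24 * 60 / cycles


def implied_total(part: float, percent: float) -> float:
    """Total implied by a part that makes up `percent` percent of it."""
    return part * 100 / percent


if __name__ == "__main__":
    # 9 days of monitoring, 1932 update cycles recorded.
    print(f"minutes per cycle: {minutes_per_cycle(9, 1932):.1f}")
    # 12771 modified entries are 21.8% of all entries.
    print(f"implied total entries: {implied_total(12771, 21.8):.0f}")
    # 14 attributes are 4% of all attributes used.
    print(f"implied total attributes: {implied_total(14, 4):.0f}")
```

Under these figures the interval comes out at roughly 6.7 minutes, consistent with the quoted "approximately 7 minutes", and the totals come out at roughly 58,600 entries and 350 attributes.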

The new architecture for the BDII consists of a standard LDAP database that is updated by an external process. The update process obtains LDIF from a number of sources and merges it. It then compares the merged LDIF with the contents of the database and creates an LDIF file of the differences, which is used to update the database. The aim of this approach is to reduce complexity within the BDII and speed up the update cycle, thereby enabling more data to be handled in a given time period. The increased efficiency can be seen directly in the graph below, which shows the one-minute load average before and after upgrading from BDII v4 to BDII v5.
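The merge-and-diff step described above can be sketched in a few lines. This is a minimal illustration and not the BDII code: entries are modelled as plain dictionaries rather than parsed LDIF, the names `merge_sources` and `diff_directories` are hypothetical, and both the "later source wins" merge rule and the handling of deletions are assumptions, since the poster only names add and modify operations.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, List[str]]   # attribute name -> list of values
Directory = Dict[str, Entry]   # distinguished name (DN) -> entry


def merge_sources(sources: List[Directory]) -> Directory:
    """Merge entries from several sources (assumption: later sources win)."""
    merged: Directory = {}
    for source in sources:
        merged.update(source)
    return merged


def diff_directories(
    current: Directory, new: Directory
) -> Tuple[Directory, Directory, List[str]]:
    """Return (entries to add, attributes to modify per DN, DNs to delete)."""
    to_add = {dn: entry for dn, entry in new.items() if dn not in current}
    to_delete = [dn for dn in current if dn not in new]
    to_modify: Directory = {}
    for dn, new_entry in new.items():
        old_entry = current.get(dn)
        if old_entry is None:
            continue  # already covered by to_add
        # Attributes whose values differ, ignoring value order.
        changed = {
            attr: values
            for attr, values in new_entry.items()
            if sorted(old_entry.get(attr, [])) != sorted(values)
        }
        # Attributes that disappeared are recorded with an empty value list.
        changed.update({attr: [] for attr in old_entry if attr not in new_entry})
        if changed:
            to_modify[dn] = changed
    return to_add, to_modify, to_delete


if __name__ == "__main__":
    database = {"ce=a": {"GlueCEStateRunningJobs": ["4"]}}
    fresh = merge_sources([{"ce=a": {"GlueCEStateRunningJobs": ["7"]}},
                           {"ce=b": {"GlueCEStateRunningJobs": ["0"]}}])
    print(diff_directories(database, fresh))
```

Only the changed attributes are written back, which is the point of the design: an unchanged entry costs a comparison but no database update.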

Inserting the information into the resource BDIIs as modifications to the database opens up a number of possibilities. One is to use LDAP replication mechanisms to propagate these changes automatically to the higher levels of the system. This would be possible for the site-level BDIIs and would reduce the latency between the update of the resource BDII and that of the site-level BDII. At higher levels, the use of the Freedom of Choice for Resources (FCR) [4] mechanism may make it impossible to use LDAP replication technologies. To improve efficiency in that case, a compressed content exchange mechanism could be employed, or the FCR mechanism may need to be re-evaluated.

Version 2.0 of the Glue [2] information model is an official recommendation of the Open Grid Forum [3]. It consolidates over 4 years of production experience with the Glue 1.x series. A common information model is required to facilitate interoperation between Grid infrastructures, and defining version 2.0 in an open forum will increase its adoption by other infrastructures. The migration of the EGEE information system from Glue 1.3 to 2.0 will occur in three stages. First, the information system will be updated to support both versions. Second, the information providers will be updated to produce both 1.3 and 2.0 information. Finally, applications can start migrating from version 1.3 to 2.0. Glue 1.3 information will be removed only once applications have migrated to version 2.0.

[Figure: GLUE 2.0 information model. Entities: User Domain, Admin Domain, Service, Resource, Manager, Share, End Point, Activity, Access Policy, Mapping Policy. Relationship labels: Negotiates Share with, Provides, Manages, Runs, Defined on, Contacts, Maps User to, Has]

Attribute                            Share of changes
GlueCEStateTotalJobs                  9.41%
GlueCEStateFreeCpus                   9.52%
GlueSAStateUsedSpace                  5.38%
GlueCEStateFreeJobslots              19.36%
GlueCEStateWorstResponseTime         11.79%
GlueSAStateAvailableSpace             6.57%
GlueCEStateEstimatedResponseTime     12.50%
GlueCEStateRunningJobs                7.90%
GlueCEInfoTotalCpus                   4.67%
GlueCEStateWaitingJobs                6.37%
GlueCEPolicyAssignedJobSlots          0.90%
GlueServiceStartTime                  0.71%
GlueSAUsedOnlineSize                  1.34%
GlueSAFreeOnlineSize                  1.37%
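As a quick consistency check, the per-attribute shares listed above can be summed and ranked. The sketch below copies the values from the table; the helper names `total_share` and `most_volatile` are illustrative only.

```python
from typing import Dict, List

# Share of all recorded changes per attribute, in percent (from the table).
CHANGE_SHARE_PERCENT: Dict[str, float] = {
    "GlueCEStateTotalJobs": 9.41,
    "GlueCEStateFreeCpus": 9.52,
    "GlueSAStateUsedSpace": 5.38,
    "GlueCEStateFreeJobslots": 19.36,
    "GlueCEStateWorstResponseTime": 11.79,
    "GlueSAStateAvailableSpace": 6.57,
    "GlueCEStateEstimatedResponseTime": 12.50,
    "GlueCEStateRunningJobs": 7.90,
    "GlueCEInfoTotalCpus": 4.67,
    "GlueCEStateWaitingJobs": 6.37,
    "GlueCEPolicyAssignedJobSlots": 0.90,
    "GlueServiceStartTime": 0.71,
    "GlueSAUsedOnlineSize": 1.34,
    "GlueSAFreeOnlineSize": 1.37,
}


def total_share(shares: Dict[str, float]) -> float:
    """Sum of the listed shares, in percent."""
    return sum(shares.values())


def most_volatile(shares: Dict[str, float], n: int) -> List[str]:
    """Names of the n attributes with the largest share of changes."""
    return sorted(shares, key=lambda name: shares[name], reverse=True)[:n]


if __name__ == "__main__":
    print(f"{len(CHANGE_SHARE_PERCENT)} attributes, "
          f"{total_share(CHANGE_SHARE_PERCENT):.1f}% of all changes")
    print(most_volatile(CHANGE_SHARE_PERCENT, 3))
```

The 14 shares add up to the 97.8% quoted in the text, and the three attributes that change most often all describe the state of a computing element.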

[Figure: number of cores/jobs/sites (logarithmic y-axis, 1 to 1000000) from Sep 03 to Jun 08. Series: No. Cores, No. Sites, No. Jobs]

The graph above shows that the rate at which sites join the infrastructure is slowing, whereas the rate of increase in the number of cores and jobs per day is growing. Assuming a growth rate of 50 sites per year, there could be 550 sites by 2015. Each new site would contribute more fundamental services, users and resources. Assuming exponential growth in the number of cores and computing activities (jobs), by 2015 the EGEE infrastructure could reach 500,000 cores and 2 million jobs per day.
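The two growth models used in this projection can be written down explicitly. The sketch below is illustrative only: the function names are hypothetical, and taking 2008 as the baseline year is an assumption based on the last data point of the graph (Jun 08), not something the text states. The text gives no starting values or rates for cores and jobs, so the exponential helpers are shown without data.

```python
def linear_projection(start: float, per_year: float, years: float) -> float:
    """Value after `years` of constant absolute growth."""
    return start + per_year * years


def implied_baseline(target: float, per_year: float, years: float) -> float:
    """Starting value implied by a linear projection that reaches `target`."""
    return target - per_year * years


def exponential_projection(start: float, annual_factor: float, years: float) -> float:
    """Value after `years` of constant relative growth."""
    return start * annual_factor ** years


def required_annual_factor(start: float, target: float, years: float) -> float:
    """Annual growth factor needed to go from `start` to `target`."""
    return (target / start) ** (1 / years)


if __name__ == "__main__":
    years = 2015 - 2008  # assumption: projection starts in 2008
    # 550 sites in 2015 at 50 new sites per year implies this many sites today.
    print(implied_baseline(550, 50, years))
```

With the 2008 baseline assumption, the quoted 550 sites at 50 sites per year corresponds to a starting point of about 200 sites.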

Poster sections: Overview; Investigation into the frequency of changes; BDII v5; Future Directions; GLUE 2.0; Infrastructure Growth

Figure captions: One minute load average before and after upgrading (Improved Performance!); The growth of the number of sites, cores and jobs per day (Log Scale!)

Authors: M. W. Schulz and L. Field, CERN-IT

References:

[1] http://twiki.cern.ch/twiki//bin/view/EGEE/BDII

[2] http://forge.gridforum.org/sf/projects/glue-wg

[3] http://www.ogf.org

[4] https://lcg-fcr.cern.ch:8443/fcr/fcr.cgi