Deployment Session David Kelsey GridPP13, Durham 5 Jul 2005 [email protected]

5-Jul-05 Deployment Board 2

Introduction

• Not a formal DB meeting
  – DB last met on 1st/2nd June 2005 (Glasgow)
  – Will next meet on 14th/15th Sep 2005 (Dublin)

• BUT: a chance to discuss some of the current issues

• Many of the topics are also covered elsewhere on the GridPP13 agenda
  – Try not to duplicate the discussions there

Agenda

• Introductions
• Technical Documentation
• Security Policy and Procedures
• LCG and gLite releases and deployment
• Deployment Metrics – Get fit actions
• Storage issues
• Tier 2 deployment and operations
• Other issues

Documentation

• Strong requirement for good documentation
  – User guides
  – Sys Admin guides
  – Web pages
  – Etc.

• Progress to date not bad (but slow)
• We need someone to drive this
• Oversight Committee agrees
• Is anyone interested/able to do this?
• Or should we recruit?

Security Policy and Procedures

• User and VO AUP
• Incident Response
• Lessons from recent ssh key incident
• Security Vulnerability policy

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Update on LCG/EGEE Security Policy and Procedures
David Kelsey, CCLRC/RAL, [email protected]

LCG GDB Meeting, CERN, 22 June 2005

Overview

• Work of the Joint (LCG/EGEE) Security Policy Group
  – In collaboration with US Open Science Grid (OSG)
  – JSPG meeting, CERN, 13/14 June 2005

• User Acceptable Use Policy
  – Not yet in EDMS

• VO Security Policy (and AUP)
  – https://edms.cern.ch/document/573348/

• Incident Handling and Response Guide
  – https://edms.cern.ch/document/428035/

User AUP (Version: The Taipei Accord 29 April 2005)

USER AGREEMENT (accepted during registration with a VO)

1) You may only perform work, or transmit or store data consistent with the activities and policies of the Virtual Organizations of which you are a member, and only on resources authorized for use by those Virtual Organizations.

2) You will not attempt to circumvent administrative or security controls on the use of resources. If you are informed that some aspect of your grid usage is creating a problem, you will adjust your usage and investigate ways to resolve the complaint.

User AUP (2)

3) You will immediately report any suspected compromise of your grid credentials or suspected misuse of grid resources to incident reporting locations specified by the Virtual Organization(s) affected and credential issuing authorities as specified in their agreements and policy statements.

4) You are aware that resource providers have the right to regulate access as they deem necessary for either operational or security-related reasons and that your use of the Grid is also bound by the rules and policies of the organizations through which you obtain access, e.g. your home institute, your national network and/or your internet service provider(s).

User AUP – Discussion since 18 May

• Sent to GDB and ROC Managers for comment
• Approved by OSG Council on 31 May 2005
• Comments received (mainly on bullet 4)
  – What about resource provider policy?
  – What about Grid/Infrastructure policy?
  – Do we have the right to cut off users?
    • Legal status of “Service” providers?
  – What about Data Protection laws?
  – Style: I am/will versus You are/will

• Discussed issues at JSPG meeting on 14 June
  – Decided to consult some legal experts
  – Feedback received from one site and one network
  – Expecting feedback from another site soon

AUP feedback to date

• Site legal advice
  – Current draft text not sufficient
  – Rules not binding unless users are aware of them
    • Must be pointers to all rules
    • Doesn’t matter if too long to read, as long as they have the opportunity
  – Bullet 4 does not give us the right to control access
  – Data protection needs to be addressed if personal info is shared
  – Must state that users register every 12 months

• NREN response
  – Looks a good approach
  – Similar to work on location-independent networking project
  – Perhaps move towards a single AUP for a common “visiting user” policy?
  – Bound by home site and home network rules
    • Must respect others and cease activity when requested
  – Need to make clear what is allowed and what is not
    • Then can control access
    • Access can be limited to one application (tested in law)

AUP conclusion

• Not yet ready for GDB approval
  – BUT further comments very welcome

• Awaiting feedback from another site’s lawyer
• JSPG needs to discuss the way forward
  – Remembering that OSG has already approved

• Will come back to GDB and ROC managers asap

VO Security Policy

• Draft document, presented at last GDB
  – Author: Ian Neilson

• https://edms.cern.ch/document/573348/
• No comments received
  – Except for internal JSPG discussion

• Made clear that the security contact point must be a single e-mail address

• Recent discussion (not concluded) on VO AUP text
  – Binding users to Grid/Infrastructure Policy or not?
  – What do users need to read, be able to read, be aware of?

• Depends on final decision on User AUP
• So again, not ready for approval yet
• BUT… comments welcome!

New Incident Response

• Based on work by Open Science Grid
• We use the OSG document “as is”
  – But with a covering document explaining the differences

• https://edms.cern.ch/document/428035
• Presented to GDB for the first time today
  – Then a period of discussion
  – Ask for feedback from ROC Managers and OSCT

• Aim for approval at next GDB

OSG Document

• Describes policy and procedures
• Sites MUST report security incidents
  – In addition to normal reporting to CERT, CSIRT

• Handling of sensitive data
  – Public disclosure via PR offices
  – National/International coordination done by law enforcement

• Security Contacts must be registered for each site
  – Maintained by GOC

• Mail list is also a group of experts to provide advice
• Mail lists: Report and Discuss
• Defines the Incident Reporting process
  – Discovery, Analysis, Classification, Containment, Notification, Escalation, Response, Post-incident analysis

• Volunteer response team created if needed
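The OSG process names eight phases. As a rough sketch of how a site might track where an incident stands, the Python below encodes them in the order listed. The names `Phase` and `next_phase` are invented for illustration, and working through the phases strictly in sequence is an assumption: in practice some phases (Escalation in particular) may not apply to every incident.

```python
from __future__ import annotations

from enum import Enum


class Phase(Enum):
    """Phases of the incident reporting process, in the order the OSG document lists them."""
    DISCOVERY = "Discovery"
    ANALYSIS = "Analysis"
    CLASSIFICATION = "Classification"
    CONTAINMENT = "Containment"
    NOTIFICATION = "Notification"
    ESCALATION = "Escalation"
    RESPONSE = "Response"
    POST_INCIDENT_ANALYSIS = "Post-incident analysis"


def next_phase(done: set[Phase]) -> Phase | None:
    """Return the first listed phase not yet completed, or None once every phase is done.

    Assumption (not from the policy text): a site works through the phases in list order.
    """
    for phase in Phase:  # Enum iteration follows definition order
        if phase not in done:
            return phase
    return None


if __name__ == "__main__":
    done = {Phase.DISCOVERY, Phase.ANALYSIS}
    print(next_phase(done).value)  # Classification
```

Returning the first outstanding phase, rather than enforcing an order, keeps the sketch usable when a site records phases out of sequence.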

LCG/EGEE covering document

• Intended audience
  – Site Security Contacts and System Administrators

• Defines mail lists for LCG/EGEE
• Warns that Incident Response info may be shared with other Grids (where agreements exist)
• Team leader to coordinate response
  – Initially organised by the reporting site and its local ROC security contact
  – ROC contact responsible for making sure that the process happens

Sharing of Incident Info

• In many cases it will be important to share Incident information between Grids

• May happen informally via sites which belong to more than one Grid

• Formal agreements will be needed
  – Where Grids follow the same/similar policy and procedures
  – But only where there is a reciprocal agreement

• JSPG keen to arrange a reciprocal agreement with OSG
• Also need to consider national Grids
  – ROC responsibility, so a job here for OSCT?

Summary

• More work needed on AUP and VO Security Policy
  – Will come back to GDB when ready

• Inviting discussion on Incident Response document
  – Approval at next GDB?

Useful Links

• Meetings - Agenda, presentations, minutes etc.
  – http://agenda.cern.ch/displayLevel.php?fid=68

• JSPG Web site
  – http://proj-lcg-security.web.cern.ch/

• Policy documents at
  – http://cern.ch/proj-lcg-security/documents.html

LCG, gLite releases

• Deployment planning
• Ian Bird’s talk on Monday
  – More frequent releases

• Conflicts with SC3, SC4 etc.
• UK feedback to CERN deployment?

Metrics & Plans

• See JC talk on Monday

Lessons from recent ssh incident

• Dteam user trying to run MPI jobs copied ssh keys

• Lots of discussion on LCG-Rollout and TB-Support
  – How do I blacklist a user certificate?
  – How do we get the user’s certificate revoked?
  – Is this an incident?
  – Why doesn’t LCG security team urgently investigate?
  – Will we see a full report on lessons learned?
  – Need good advice for sys admins to avoid incidents?
  – There is no DTEAM AUP – why running MPI jobs?
  – How does sys admin contact the user?
  – There is no infrastructure for dealing with real incidents?

Incident response lessons

• How do I blacklist a user certificate?
  – Use Local Authorization
  – LCAS configuration
  – Or remove from the Grid mapfile
    • But an update will reinsert it until it is removed by the VO

• How do we get the user’s certificate revoked?
  – Contact the CA or RA
  – But in general they will not revoke unless allowed by their policy (e.g. proof of compromised private key)
  – Instead contact the VO (via the ROC) to remove Authorization
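Removing an entry from the grid-mapfile by hand only lasts until the next regeneration from VO membership. One possible stop-gap, sketched below under stated assumptions, is a site-local ban list that is re-applied after every regeneration until the VO removes the user. The sketch assumes the usual mapfile layout (a double-quoted certificate DN followed by the local account); the function names, DNs and `.dteam` pool mapping are hypothetical, and this covers only the mapfile route, not LCAS.

```python
from __future__ import annotations


def entry_dn(line: str) -> str | None:
    """Return the quoted certificate DN that starts a grid-mapfile entry, or None.

    Assumes the layout: "<DN in double quotes>" followed by the local account.
    Comment lines (#) and blank lines yield None.
    """
    stripped = line.strip()
    if not stripped.startswith('"'):
        return None
    end = stripped.find('"', 1)
    return stripped[1:end] if end != -1 else None


def apply_local_bans(mapfile_lines: list[str], banned_dns: set[str]) -> list[str]:
    """Drop every mapping whose DN is on the site's local ban list; keep all other lines."""
    return [line for line in mapfile_lines if entry_dn(line) not in banned_dns]


if __name__ == "__main__":
    # Hypothetical DNs and pool-account mapping, for illustration only.
    mapfile = [
        "# generated from VO membership",
        '"/C=UK/O=eScience/OU=Example/CN=alice" .dteam',
        '"/C=UK/O=eScience/OU=Example/CN=bob" .dteam',
    ]
    banned = {"/C=UK/O=eScience/OU=Example/CN=bob"}
    for line in apply_local_bans(mapfile, banned):
        print(line)
```

This only blocks the mapping at one site; it neither revokes the certificate nor removes the user's VO authorization.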

Lessons (2)

• Is this an incident?
• Current Incident Response Policy definition
  – Any security investigation that causes a site to interrupt service
    • i.e. disconnect a machine or bar a user
  – Any instance of suspected misuse of grid resources beyond the local site
  – There is a reasonable possibility that credentials have been stolen and those credentials will not expire or be revoked within 3 days of the possible theft.
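Read as a test, the current definition can be sketched as a predicate. The sketch assumes that any one of the three criteria is sufficient, models the “reasonable possibility” of theft as a yes/no judgement the site has already made, and treats credentials that stop working exactly 3 days after the theft as inside the window. `Observation` and `is_incident` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Window from the current definition: credentials still usable beyond this make it an incident.
CREDENTIAL_WINDOW = timedelta(days=3)


@dataclass
class Observation:
    """What a site knows about an event (field names are illustrative)."""
    interrupted_service: bool = False        # e.g. disconnected a machine or barred a user
    misuse_beyond_local_site: bool = False   # suspected misuse of grid resources elsewhere
    possible_theft_at: datetime | None = None    # when credentials may have been stolen
    credentials_end_at: datetime | None = None   # known expiry or revocation time, if any


def is_incident(obs: Observation) -> bool:
    """Apply the current definition, assuming any one criterion is sufficient."""
    if obs.interrupted_service or obs.misuse_beyond_local_site:
        return True
    if obs.possible_theft_at is None:
        return False
    if obs.credentials_end_at is None:
        # No known expiry or revocation: treat the stolen credentials as long-lived.
        return True
    # Ending exactly 3 days after the theft counts as "within 3 days", so not an incident.
    return obs.credentials_end_at - obs.possible_theft_at > CREDENTIAL_WINDOW


if __name__ == "__main__":
    theft = datetime(2005, 6, 1, 12, 0)
    short_lived = Observation(possible_theft_at=theft,
                              credentials_end_at=theft + timedelta(hours=12))
    print(is_incident(short_lived))  # False
```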

Lessons (3)

• New Incident Response Policy definition
  – An incident is any real or suspected event that poses a real or potential threat to the integrity of services, resources, infrastructure, or identities.

• Grid participants MUST report incidents that have known or potential impact or relationship to Grid resources, services, or identities.

• Grid participants MUST respond to incidents involving locally managed or operated resources, services, or identities.

Lessons (4)

Incident Classification (new policy)

High: (team leader required)
• The incident could lead to exploitation of the trust fabric, i.e. user and host identities, or the incident could lead to instability of the overall Grid, or a denial-of-service is in progress against all replicas of a given Grid service.

Medium: (team leader required if widespread)
• The incident affects an instance of a Grid service, but Grid stability is not at risk, or a denial-of-service affects one replica of a given Grid service, or a local attack compromised a privileged user account.

Low: (team leader probably not required)
• A local attack compromised individual user, non-privileged credentials, or a denial-of-service attack or compromise affects only local grid resources.
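The three levels can be read as a cascade: test the High criteria first, then Medium, and fall through to Low. The sketch below assumes the yes/no answers come from the earlier Analysis step. Treating everything that matches neither High nor Medium as Low is a simplification, and the field and function names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Assessment:
    """Yes/no answers from the initial analysis (field names are illustrative)."""
    threatens_trust_fabric: bool = False          # user or host identities could be exploited
    threatens_grid_stability: bool = False        # overall Grid could become unstable
    dos_on_all_replicas: bool = False             # DoS in progress against every replica
    affects_service_instance: bool = False        # one instance of a Grid service affected
    dos_on_one_replica: bool = False              # DoS against a single replica
    privileged_account_compromised: bool = False  # local attack on a privileged account


def classify(a: Assessment) -> Severity:
    """Return the highest level whose criteria match; anything else is treated as Low."""
    if a.threatens_trust_fabric or a.threatens_grid_stability or a.dos_on_all_replicas:
        return Severity.HIGH
    if a.affects_service_instance or a.dos_on_one_replica or a.privileged_account_compromised:
        return Severity.MEDIUM
    # Low: non-privileged user credentials, or only local grid resources affected.
    return Severity.LOW


def team_leader(severity: Severity, widespread: bool = False) -> str:
    """Team-leader guidance attached to each level in the new policy."""
    if severity is Severity.HIGH:
        return "required"
    if severity is Severity.MEDIUM:
        # Assumption: the policy only names the widespread case, so otherwise not required.
        return "required" if widespread else "not required"
    return "probably not required"


if __name__ == "__main__":
    level = classify(Assessment(privileged_account_compromised=True))
    print(level.value, team_leader(level))  # Medium not required
```

Checking High first means an incident matching criteria at several levels is always given the most serious classification.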

Lessons (5)

• Why doesn’t the LCG security team urgently investigate?
  – There is no such LCG team
  – Security is the responsibility of the ROCs
    • Coordinated by OSCT

• Will we see a full report on lessons learned?
  – Ian Neilson sent one to the list 3 days later
  – He concluded it should have been discussed on the “security contacts” list

• Need good advice for sys admins to avoid incidents?
  – Agreed
  – Romain W is leading an activity (RSS feeds)
  – Volunteers needed!

Lessons (6)

• There is no DTEAM AUP – why running MPI jobs?
  – Agreed
  – This is a requirement of the new policy

• How does a sys admin contact the user?
  – Via ROC to VO
  – E-mail notification of all user registrations
  – Requirement for read-only access to VO database
    • Must respect privacy

Lessons (7)

• There is no infrastructure for dealing with real incidents?
  – May not be perfect, but policy requires reporting to the “csirt” list (as well as local reporting)
  – New policy requires creation of a team to deal with the incident
    • Responsibility of ROC (OSCT)
  – Not appropriate to discuss on LCG-rollout or TB-Support

• OSCT needs to decide how best to communicate with Sys Admins
  – Current approach is via the “Security Contacts” list
  – My personal view: we need an emergency sysadmin mail list

Security Vulnerability

• Linda Cornwall will present tomorrow
• There has been lots of discussion re policy
  – Discuss these now

• Everyone agrees on the aim
  – To protect our sites, resources and data
  – Improve quality of middleware and deployment

• Concerns about
  – Legal liability
  – Do we “publish” vulnerabilities (if so, when?)
  – Developers will not fix unless we publish
  – How do we keep information private before publishing?

Current model

• An internal LCG/EGEE/GridPP activity
  – No responsibility to outside customers

• Legal responsibility (if any) lies with SA1 and JRA1
• Risk assessment done quickly
• Inform OSCT and developers quickly
  – OSCT/Deployment team can inform all sites if necessary
• Or should the group inform all sites quickly (on a closed list)?

• Allow time for fix (45 days?)
• When problem fixed or on timeout
  – Inform all sites but never fully publish
• JRA1 and/or SA1 can publish if they wish

• Status reports (stats) available on web (with access control)
  – Report regularly to Management (JRA1 and SA1)

• Mirror entries in JRA1 Savannah
• Any site admin can join group
  – But must abide by policy

Alternative model

• Proposed by Romain W (see slides)
  – Support from some sys admins

• More like an external activity
  – Similar to CERT/CC

• After quick risk analysis
  – Inform developers and/or deployment team
  – Do not inform sites (info will leak out)

• After timeout
  – Full publication (responsible disclosure)
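Both models start with a quick risk assessment and end with a timed trigger; they differ in what that trigger does. The sketch below compares the two. It assumes the tentative 45-day window from the current model also applies to the alternative model (the alternative slide gives no figure), and it reads “after timeout” literally, so a fix does not bring publication forward in that model. `Report`, `trigger_date` and `ON_TRIGGER` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

# The slide floats 45 days with a question mark, so treat it as a tunable parameter.
FIX_WINDOW = timedelta(days=45)

# What happens when the trigger fires, paraphrasing the two models on the slides.
ON_TRIGGER = {
    "current": "inform all sites; never fully publish (JRA1/SA1 may choose to publish)",
    "alternative": "full publication (responsible disclosure)",
}


@dataclass
class Report:
    reported: date
    fixed: date | None = None


def trigger_date(report: Report, model: str) -> date:
    """Date on which the model's post-fix or post-timeout step fires."""
    if model not in ON_TRIGGER:
        raise ValueError(f"unknown model: {model!r}")
    deadline = report.reported + FIX_WINDOW
    # Current model: fires when the problem is fixed or on timeout, whichever comes first.
    if model == "current" and report.fixed is not None:
        return min(report.fixed, deadline)
    # Alternative model (and an unfixed problem under the current model): the timeout.
    return deadline


if __name__ == "__main__":
    r = Report(reported=date(2005, 7, 1), fixed=date(2005, 7, 20))
    for model in ON_TRIGGER:
        print(model, trigger_date(r, model), ON_TRIGGER[model])
```

Keeping the window as a single constant makes it easy to see how much the choice of timeout, rather than the model, drives when sites hear about a problem.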

Decision?

• LCG, EGEE, GridPP management will decide the approach
  – PEB, PMB etc.

• General EGEE agreement (Athens and Brno meetings)
  – Internal activity so cannot fully publish
  – Responsibility for informing internal and external customers rests on SA1 (OSCT/ROCs) and JRA1

• Feedback welcome to inform this
  – When/how do sites need to be informed?

• If this approach does not produce results we should retain the right to change to
• Important to get this right
• BUT even more important to get on with fixing problems
  – Volunteers needed for Risk Analysis

Storage issues

• dCache and DPM
  – (Short) discussion later on agenda

• Open Source policy
  – dCache unlikely to be Open Source
  – Lots of discussion on TB-Support
  – GridPP policy is to write Open Source s/w
  – Arguments why we should only use Open Source
  – UK evaluation/review of dCache?

Tier 2 Deployment & operations

• 24*7 service
• Accounting
• VO support
• Resource delivery
• Permanent storage
• How do sites get back in if blacklisted?
• Communication
  – Sites & ROCs <-> Experiments