CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Luis FERNANDEZ ALVAREZ (IT-OIS)

HEPiX Spring 2013 Report



SECURITY AND NETWORKING


Overview

• Total of 7 talks in the Wednesday session
  – Talks in previous sessions: Spring 2011 (4), Spring 2012 (6), Fall 2011 (7), Fall 2012 (11)
• Topics:
  – IPv6
  – Network monitoring
  – Identity management
  – Security
  – DNS HA
• Institutes & Others:
  – INFN, Indiana University, CNAF, University of Michigan, BrightLite Information Security, STFC, CERN.


IPv6 (I)

Talks:
  – The HEPiX IPv6 working group
  – The last year of IPv6 HEPiX test bed operation

• IPv4 exhaustion will also hit CERN
  – Adoption of IPv6 for applications and WLCG
• Have IPv6-only WNs by 2014
  – Big challenge => more resources (people + equipment)
• HEPiX distributed test bed in progress
  – LHC experiments on board: CMS, LHCb, ALICE & ATLAS
  – Configuration validation: DNS entries, ping6,…
  – Simple data transfers between all nodes simultaneously
• 2013
  – Include all Tier 1, decide services running dual stack, testing (volunteers),…
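The "configuration validation: DNS entries" step can be illustrated with a minimal Python sketch. It is not the working group's tooling: the function names `classify_stack` and `resolve` are hypothetical, and the addresses in the usage note come from the documentation ranges. It only shows how a node's resolved addresses might be classified as dual stack, IPv4-only, or IPv6-only.

```python
import ipaddress
import socket


def classify_stack(addresses):
    """Classify a set of address strings by the IP versions present."""
    versions = {ipaddress.ip_address(a).version for a in addresses}
    if versions == {4, 6}:
        return "dual-stack"
    if versions == {6}:
        return "ipv6-only"
    if versions == {4}:
        return "ipv4-only"
    return "unresolved"


def resolve(host):
    """Return the addresses the local resolver reports for host (hypothetical helper)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # sockaddr[0] is the address; drop a zone index such as "%eth0" if present.
    return {info[4][0].split("%")[0] for info in infos}
```

For example, `classify_stack(["192.0.2.10", "2001:db8::10"])` classifies the node as dual stack, and `classify_stack(resolve(name))` would do the same for a real host name on a machine with a working resolver.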


IPv6 (II)

• Technologies used in the testbed:
  – Globus/GridFTP, UberFTP, glibc, DHCPv6, RAs…
• The working group highlights some issues:
  – IPv6 still does not seem to be taken seriously when it comes to writing/patching software.
  – Tracking and keeping traction on issue resolution has proved to be slow and expensive
  – Protocol specification mesh, minor infrastructure rethink


Network Monitoring

Talk: WLCG Network Monitoring using perfSONAR-PS

• perfSONAR network monitoring framework
  – Provide a single source of network metrics to WLCG
  – Targeting full deployment at all Tier-2 sites (150 locations)
• perfSONAR-PS deployment based on 2 instances
  – Central mesh configuration for nodes and tests to perform
  – Measurements between WLCG resources: latency, bandwidth, traceroute.
• Modular dashboard as a central source of info
• How to exploit all the metrics?
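The idea of a central mesh configuration, one list of nodes and one list of tests from which every pairwise measurement is derived, can be sketched as follows. This is an illustration only and does not reproduce the perfSONAR-PS mesh configuration format; the function name `full_mesh` and the dictionary keys are assumptions.

```python
from itertools import permutations


def full_mesh(hosts, tests=("latency", "bandwidth", "traceroute")):
    """Expand a host list into one entry per ordered host pair and test type."""
    return [
        {"source": src, "destination": dst, "test": test}
        for src, dst in permutations(hosts, 2)
        for test in tests
    ]
```

With n hosts and the three test types named on the slide, the expansion yields 3 · n · (n − 1) directed measurements, which is why a single central definition is easier to maintain than per-site test lists.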


Identity Management

Talks:
  – Identity Federation – HEP/WLCG
  – eXtreme Scale Identity Management (XSIM) for Scientific Collaborations

• Identity Federation (FIM4R), use cases
  – Web-based grid portals for job submission.
  – CLI-based job submission.
• Working group: proof-of-concept pilot service for CLI
  – WLCG resources through home-issued credentials
  – ECP, CILogon
  – CLI tool working, promising results
• Blocking issues
  – ECP profile is not widely supported among IdPs
  – No further work on current pilot based on ECP
• Discussion on trust, LoA, Web use case, ECP alternatives…


Identity Management (II)

• XSIM: trustworthy extreme-scale scientific collaborations
  – Understand and formalize a model of IdM
• Method
  – Understand the core elements of trust relationships
  – How would those relations be expressed in IdM?
  – Model validation, software and applied research
• Interviews with VOs and RPs
  – Constitution of VOs, relation with RPs and users, threats, policies, lessons learned…
• No results in raw format
  – Unstructured format, privacy
  – Next target: publication of results at CHEP/eScience


Security

Talk: Security Update

• Botnets & SSH attacks
  – Citadel (Man-in-the-browser) compromised 11730 hosts
  – Ebury sshd Trojan: compiled for each victim, stealth,…
• WLCG
  – Risk analysis: misused identities, attack propagation, OS vulnerabilities
  – Incidents: 10-12 per year (2012 quieter than usual)
• Paradigm shift
  – Control mechanisms: from local users to federation with sites of good reputation
  – Security perimeters: from ensuring attackers never run any kind of arbitrary code to allowing it in an isolated VM environment
• VM isolation impossible & traditional mechanisms obsolete
  – Traceability for containing, resolving and preventing issues
  – Store and analyze data


DNS HA

Talk: DNS multi master architecture for High Availability services

• Resilient DNS, guarantee full functionality
  – Modify the IP of a service during the downtime of a site hosting one authoritative DNS
• Nagios + BIND9-DLZ + MySQL (circular replication)
  – BIND9 limitations: reads data from text files, stores it in memory, changes imply reloading
• It allows IP addresses to be changed even if one of the sites is down
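The difference between a server that loads zone data into memory and one that reads records from a database at query time can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the architecture from the talk: `sqlite3` stands in for MySQL, the `records` table is a made-up schema rather than the BIND9-DLZ one, and replication between sites is not modelled.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical one-table record store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (name TEXT PRIMARY KEY, address TEXT)")
    return db


def set_address(db, name, address):
    """Insert or update the address of a service name."""
    db.execute(
        "INSERT OR REPLACE INTO records (name, address) VALUES (?, ?)",
        (name, address),
    )
    db.commit()


def lookup(db, name):
    """Database-backed lookup: reads the current row on every query."""
    row = db.execute(
        "SELECT address FROM records WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def load_zone(db):
    """File-style behaviour: snapshot all records into memory at load time."""
    return dict(db.execute("SELECT name, address FROM records"))
```

A snapshot taken with `load_zone` keeps answering with the old address after `set_address` is called, until it is reloaded, whereas `lookup` returns the new address immediately. That is the property that lets a service IP be changed without reloading the name server.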


GRID, CLOUD AND VIRTUALIZATION


Overview

• Total of 5 talks in the Thursday session
  – Talks in previous sessions: Spring 2011 (8), Spring 2012 (6), Fall 2011 (8), Fall 2012 (7)
• Topics:
  – IaaS
  – Federated Clouds
• Institutes & Others:
  – FermiLab, RAL, STFC, CERN.


IaaS (I)

Talks:
  – Virtualisation Cloud Computing at the RAL Tier 1
  – The CMS openstack, opportunistic, overlay, online-cluster Cloud
  – FermiGrid and FermiCloud Updates
  – Vmcaster and Vmcatcher

• Virtualization @ RAL (Hyper-V Platform)
  – ~200 VMs over 3 years (provisioning more responsive)
  – HA shared storage (minimize important data on VMs)
  – Not wedded to Hyper-V
• Prototype StratusLab private cloud (~300 cores)
  – WiP: Federation, Evaluating CEPH & Integrate the cloud into Tier 1
  – Rethink the platform itself
• WLCG: dynamically-provisioned worker nodes
  – Tested with: HTCondor & Slurm


IaaS (II): CMS

• Cloud Architecture
  – 1 Controller + 1 Image Store + 1300 Compute nodes
  – 1000 VMs running stable 3 weeks / 250 VMs in ~5 min
• Network virtualization
  – No physical reconfiguration / encapsulation overhead
  – Logical cloud network schema easier to understand
• Work in progress
  – Integration as a CERN GRID resource
  – Migration to Grizzly (HA)
  – Increase network bandwidth to CERN Tier0 (40 Gbit/s)
• Highlights
  – Zero impact on data taking
  – ~1.5 FTE for ~6 months


IaaS (III)

• FermiGrid
  – FermiGrid-HA(2)
  – WiP: IPv6, SHA-2, merge services into FermiCloud
• FermiCloud based on OpenNebula
  – Distributed fault tolerance highly desirable
  – VM acceptance process (security probes, sandbox,…)
  – Idle VM detection & suspension when needed
  – Virtualized MPI results: >96% of the bare metal performance
• Starting…
  – Interoperability and federation
  – Reevaluate OpenStack and other stacks


IaaS (IV)

• vmcaster and vmcatcher
  – Repository for VM images (rpm/deb style)
• Deployments
  – OpenStack @ CC-IN2P3
  – OpenNebula @ CESGA
  – EGI Federated Cloud Task Force
• Contextualisation based on archive overlays
  – Images are public, overlays are private
• Status
  – Vmcatcher work needed
  – Vmcaster production


Federated Cloud

Talk: EGI Federated Cloud Infrastructure status

• Create uniform access to the cloud
  – Federation system agnostic about a particular stack
• Standard interfaces:
  – OCCI, CDMI,…
• 30 active individuals in FedCloud VO


Thanks