Experiences with QUATTOR in the upgrade of the Bari’s Farm

Francesco Minafra
Physics Dept. & I.N.F.N. - University of Bari

ALICE offline week - CERN, 24 February 2005



Summary

● Problems concerning the administration of middle-sized farms

● Automated installation/configuration tools: quattor

● Practical Example: CDB+SWREP+AII+Templates

● Todo list


ALICE farm - BARI

● Existing machines (used in the ALICE PDC ‘04)

– 1 Disk Server + 4 Worker nodes (10 cpu)

– RH7.3

● New machines

– 2 Disk Servers + 5 Worker nodes (14 cpu)

– possible joint work with the Bari FINUDA group (1 Disk Server + 1 Worker Node)

● Maintenance & security updates

● A uniform setup on all these machines (old and new) with Scientific Linux CERN would be ideal for an efficient farm, for both local and GRID job execution.

● The administration workload is getting heavier!


QUATTOR

● QUattor is an Administration ToolkiT for Optimizing Resources

– http://quattor.web.cern.ch/quattor/

● multi-protocol, DB of configurations (actual/wanted), modular, Linux/Solaris platforms

● EGEE may choose this component for automating installations on testbeds (some configuration templates were ‘ported’ from LCFGng)

● Useful for maintenance of medium to large sites (used by the CERN CC for management of most Linux nodes)

● Alternatives for small to medium sites: kickstart, manual configuration

● Not only servers, but also profiles for workstations!


quattor components

● CDB – Configuration DataBase

– hosts the wanted/actual configuration of the nodes

● SWREP – Software Repository

– Operating System Packages

– Update Packages

– Local software packages (e.g. ROOT, AliRoot, GEANT)

● AII – Automated Installation Infrastructure

– deals with node installation ‘from scratch’

– uses SWREP to deliver installation packages

● Configuration Components act as ‘agents’ on the local nodes


Installation Example scenario

● Only two machines

– CDB+SWREP+AII server ‘alicegrid4’

    ● eth0: 193.206.185.207
    ● eth1: 192.168.1.10
    ● Web + DHCP + TFTP + PXELinux
    ● also acts as a gateway

– client node ‘alien9’

    ● two network cards with PXE boot
    ● only one used for installation
    ● sits on a ‘private’ network - eth0: 192.168.1.9 assigned by DHCP

● Only Operating System Installation


Installation Example: preliminary setup

● APT repository setup:

[alicegrid4]$ cat /etc/apt/sources.list.d/quattor.list
rpm http://quattorsw.web.cern.ch/quattorsw/software/quattor apt/1.0.0/i386 quattor_sl303

From the FAQ:

Q: "When installing quattor 1.0.0 with apt-get on the CERN version of Scientific Linux 3.0.3, I experience RPM conflicts."

A: Please check this note:

http://cern.ch/quattorsw/software/quattor/apt/1.0.0/NOTES-ScientificLinuxCERN-3.03.txt

● Remove the packages that cause the conflict (newer versions will be installed from the quattorsw repository as dependencies):

[root@alicegrid4 root]# rpm -e --nodeps edg-perl-LC edg-ccm edg-caf-perl ncm-ncd


Installation Example: preliminary setup

● Install dependency packages:

[root@alicegrid4 root]# apt-get install ncm-ncd=1.1.18-1
[root@alicegrid4 root]# apt-get install ncm-query=1.0.8-1
[root@alicegrid4 root]# apt-get install ncm-spma=1.2.7-1

● Install the quattor client:

[root@alicegrid4 root]# apt-get install quattor-client

● Make sure the Apache web server is correctly installed and active:

[root@alicegrid4 root]# apt-get install httpd mod_ssl
[root@alicegrid4 root]# chkconfig --add httpd
[root@alicegrid4 root]# /sbin/service httpd start

● The web server should be accessible only from the private network, so we modified the default configuration to read:

[root@alicegrid4 root]# vim /etc/httpd/conf/httpd.conf
...
# Listen: Allows you to bind Apache to specific IP addresses and/or
# ports, in addition to the default. See also the <VirtualHost>
# directive.
Listen 192.168.1.10:80
...
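An address given to Listen has to be one that is actually assigned to an interface of the server. The following is a minimal sketch, not part of quattor or Apache, that checks whether a candidate address is a host address on the private network. The /24 prefix is an assumption based on the 255.255.255.0 netmask used for this network elsewhere in the slides, and the function name is hypothetical.

```python
import ipaddress

# Private installation network (assumed /24, i.e. netmask 255.255.255.0).
PRIVATE_NET = ipaddress.ip_network("192.168.1.0/24")


def is_usable_listen_address(addr: str, network=PRIVATE_NET) -> bool:
    """Return True if addr is a host address inside network.

    The network address and the broadcast address are rejected because
    they cannot be assigned to an interface.
    """
    ip = ipaddress.ip_address(addr)
    reserved = (network.network_address, network.broadcast_address)
    return ip in network and ip not in reserved


# eth1 of alicegrid4 versus the network address and the public eth0 address.
for candidate in ("192.168.1.10", "192.168.1.0", "193.206.185.207"):
    print(candidate, is_usable_listen_address(candidate))
```

Only the eth1 address of alicegrid4 passes the check; the public eth0 address is rejected because it lies outside the private network.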


Installation Example: CDB (Configuration DataBase) setup

[root@alicegrid4 root]# apt-get install quattor-cdb

● This component creates a new user that we can use for performing configuration tasks on CDB

[root@alicegrid4 root]# cat /etc/passwd
...
fminafra:x:500:500:Francesco Minafra:/home/fminafra:/bin/bash
named:x:25:25:Named:/var/named:/sbin/nologin
cdb:x:501:501:quattor CDB user:/home/cdb:/bin/bash
[root@alicegrid4 root]#
[root@alicegrid4 root]# passwd cdb

● We configured the database to store its data and its log file in a non-default location:

[root@alicegrid4 root]# vim /etc/cdb.conf
...
# Configuration DataBase location
#top /var/lib/cdb/
top /home/cdb/data/
hld hld/
lld lld/
...
# log file name
#log_filename /var/log/cdb.log
log_filename /home/cdb/cdb.log
...


Installation Example: CDB (Configuration DataBase) setup

● then initialize the database:

[root@alicegrid4 root]# cdb-setup

● The profiles should be accessible from the web (in the private network), so we created a symbolic link:

[root@alicegrid4 data]# ll /var/www/html/
totale 0
lrwxrwxrwx 1 root root 22 28 gen 11:05 profiles -> /home/cdb/data/lld/xml

● We used the 'local' client for administering the CDB (it requires local access to the CDB server), although a remote administration tool (cdbop) is also available. But...

[root@alicegrid4 root]# su - cdb
[alicegrid4] /home/cdb > cdb-simple-cli --list
Uncaught exception!!! Calling stack is:
 LC::Exception::throw_error called at /usr/lib/perl5/site_perl/CAF/Log.pm line 156
 CAF::Log::_initialize called at /usr/lib/perl5/site_perl/CAF/Object.pm line 75
 CAF::Object::new called at /usr/lib/perl/EDG/WP4/CDB/Common.pm line 366
 (eval) called at /usr/lib/perl/EDG/WP4/CDB/Common.pm line 0
...
*** Open for append: /home/cdb/cdb.log
Uncaught exception!!! Calling stack is:
*** cannot instantiate class: CAF::Log


Installation Example: CDB (Configuration DataBase) setup

● Ouch! We got unexpected errors!

[root@alicegrid4 root]# cd /home/cdb/
[root@alicegrid4 cdb]# ll
totale 4
-rw-r--r-- 1 root root 0 28 gen 11:05 cdb.log
drwxrwxr-x 4 cdb cdb 4096 28 gen 11:05 data

● The cdb user could not write to the log file because it is owned by root! We fixed this manually and submitted a bug report to Savannah.

[root@alicegrid4 cdb]# chown cdb:cdb cdb.log

● Now it works:

[alicegrid4] /home/cdb > cdb-simple-cli --list
[INFO] listing templates...

● But the template Database is still empty!
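The log file problem above comes down to a file owned by one user and opened for append by another. As a minimal sketch (not a quattor tool; the function name is hypothetical), the same condition can be checked before running the client by comparing the owner of the file with the expected user ID and testing the owner-write bit:

```python
import os
import stat
import tempfile


def owner_can_write(path: str, uid: int) -> bool:
    """Return True if path is owned by uid and the owner-write bit is set."""
    st = os.stat(path)
    return st.st_uid == uid and bool(st.st_mode & stat.S_IWUSR)


# Demonstration on a temporary file created by the current user.
# On the server one would pass /home/cdb/cdb.log and the UID of the cdb
# user (501 in the passwd listing shown earlier).
with tempfile.NamedTemporaryFile() as tmp:
    print(owner_can_write(tmp.name, os.geteuid()))
```

For the listing shown above, the check would fail for the cdb user because cdb.log belongs to root, which is the situation the chown command repairs.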


Installation Example: SWREP (Software Repository) setup

● This component is used by a Software Package Management Agent, and provides access control and consistency checks for RPM repositories. If it sits on the same server used to automate the installation of nodes (the AII server), it also delivers the packages for OS installation.

● We installed it, again from the quattor APT repository:

[root@alicegrid4 root]# apt-get install quattor-swrep

● It adds a new user 'swrep', but without an interactive shell (its login shell is swrep-server):

[root@alicegrid4 root]# tail /etc/passwd
...
fminafra:x:500:500:Francesco Minafra:/home/fminafra:/bin/bash
named:x:25:25:Named:/var/named:/sbin/nologin
cdb:x:501:501:quattor CDB user:/home/cdb:/bin/bash
swrep:x:33475:33475::/var/swrep:/usr/sbin/swrep-server

● The physical location where the packages will be stored was chosen for convenience:

[root@alicegrid4 root]# mkdir /home/swrep
[root@alicegrid4 root]# chown swrep:swrep /home/swrep
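The passwd entries above can be read field by field. The sketch below, an illustration only, splits an entry into the seven standard passwd fields and shows that the last field of the swrep entry is the swrep-server program rather than a login shell. The set of interactive shells is an assumption made for the example.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one /etc/passwd line into its seven colon-separated fields."""
    name, password, uid, gid, gecos, home, shell = line.strip().split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


# Assumption for this example: shells regarded as interactive.
INTERACTIVE_SHELLS = {"/bin/bash", "/bin/sh"}

for line in (
    "cdb:x:501:501:quattor CDB user:/home/cdb:/bin/bash",
    "swrep:x:33475:33475::/var/swrep:/usr/sbin/swrep-server",
):
    entry = parse_passwd_line(line)
    print(entry.name, entry.shell, entry.shell in INTERACTIVE_SHELLS)
```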


Installation Example: SWREP (Software Repository) setup

● and it should be accessible from the web by the private network, so we created a symbolic link:

[root@alicegrid4 root]# ln -s /home/swrep /var/www/html/swrep

● We configured the component using the template that comes with the package itself:

[root@alicegrid4 root]# cp /usr/share/doc/swrep-server-1.2.31/swrep-server.conf /etc/swrep/
[root@alicegrid4 swrep]# vim /etc/swrep/swrep-server.conf
...
# * Name of the repository (BE CAREFUL! Used to generate a template name!)
name = 'alicegrid bari'
# * email of the owner of the repository (example: [email protected])
owner = [email protected]
# * list of repository access URL's
url = http://192.168.1.10/swrep
# * where to put the SWRep server log file
logfile = /home/cdb/swrep.log
# * root directory of the repository on the local file system
rootdir = /var/www/html/swrep
...
[root@alicegrid4 root]# cp /usr/share/doc/swrep-client-1.2.31/swrep-client.conf /etc/swrep/
[root@alicegrid4 root]# vim /etc/swrep/swrep-client.conf
...
# * repository <string>: repository location (in user@host format)
repository = [email protected]
...


Installation Example: SWREP (Software Repository) setup

● The repository has an authentication system based on ssh keys:

[alicegrid4] /home/fminafra > ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/fminafra/.ssh/id_dsa):
Enter passphrase (empty for no passphrase): *********
Enter same passphrase again: *********
Your identification has been saved in /home/fminafra/.ssh/id_dsa.
Your public key has been saved in /home/fminafra/.ssh/id_dsa.pub.
The key fingerprint is:
92:84:d7:af:4b:58:87:03:e2:8a:0e:8c:37:ae:ab:67 fminafra@alicegrid4

[alicegrid4] /home/fminafra > cat /home/fminafra/.ssh/id_dsa.pub >> authorized_keys

[root@alicegrid4 root]# vim /etc/ssh/sshd_config
...
PermitUserEnvironment yes
...
[root@alicegrid4 root]# vim /etc/swrep/swrep.acl
# Quattor SPM SWRep ACL file
#
# For more information, see the swrep-server (1) man page.
fminafra:/


Installation Example: SWREP (Software Repository) setup

● For administration it is convenient to start an ssh agent:

[alicegrid4] /home/fminafra > ssh-agent $SHELL
[alicegrid4] /home/fminafra > ssh-add
Enter passphrase for /home/fminafra/.ssh/id_dsa:
Identity added: /home/fminafra/.ssh/id_dsa (/home/fminafra/.ssh/id_dsa)

[alicegrid4] /home/fminafra > swrep-client listrights
You are fminafra, with rights to change packages with tags: /
You have repository administrator rights

[alicegrid4] /home/fminafra > swrep-client addplatform i386_sl3
Platform i386_sl3 successfully added

[alicegrid4] /home/fminafra > swrep-client addarea i386_sl3 /base
Area /base successfully created in platform i386_sl3

[alicegrid4] /home/fminafra > swrep-client addarea i386_sl3 /updates
Area /updates successfully created in platform i386_sl3

[alicegrid4] /home/fminafra > swrep-client addarea i386_sl3 /extra
Area /extra successfully created in platform i386_sl3


Installation Example: SWREP (Software Repository) setup

● We populated the software repository first with the content of the Scientific Linux installation CD-ROMs

[root@alicegrid4 root]# cd /var/www/html/swrep/i386_sl3/

[root@alicegrid4 root]# mount /mnt/cdrom

[root@alicegrid4 i386_sl3]# cp /mnt/cdrom/SL/RPMS/* .

● (same procedure for all 4 CDs)

[root@alicegrid4 i386_sl3]# rm TRANS.TBL

● (it is not a repository package and leads to errors!)

[alicegrid4] /home/fminafra > swrep-client bootstrap i386_sl3 /base
4Suite-0.11.1-14.i386.rpm /base
a2ps-4.13b-28_SL.i386.rpm /base
...
Package database for platform i386_sl3 successfully bootstrapped
[alicegrid4] /home/fminafra >
[alicegrid4] /home/fminafra > swrep-client listareas i386_sl3
Available areas for platform i386_sl3:
/base 1401
/extra 0
/updates 0


Installation Example: SWREP (Software Repository) setup

● Next we populated the /updates and /extra areas: the first should contain all the security updates issued for Scientific Linux since the CD-ROMs were released, and the second any non-OS software useful for client nodes, such as the quattor-client packages.

● First the slc3.0.3 updates were downloaded with wget to a temporary directory, and then all the packages were moved into the repository:

[root@alicegrid4 tmp]# mv * /home/swrep/i386_sl3

[alicegrid4] /home/fminafra > swrep-client bootstrap i386_sl3 /updates
...
Package database for platform i386_sl3 successfully bootstrapped

[alicegrid4] /home/fminafra > swrep-client listareas i386_sl3
Available areas for platform i386_sl3:
/base 1401
/extra 0
/updates 545


Installation Example: SWREP (Software Repository) setup

● The quattor-client packages were also downloaded with wget to a temporary directory:

[alicegrid4] /home/fminafra/quattor_client > ll
total 580
-rw-rw-r-- 1 fminafra fminafra 66645 Nov 26 15:19 ccm-1.4.0-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 12532 Nov 26 15:20 cdp-listend-1.0.0-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 14439 Nov 26 15:21 ncm-accounts-2.0.4-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 16196 Nov 26 15:20 ncm-cdispd-1.0.1-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 9224 Nov 26 15:21 ncm-cron-1.0.7-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 8750 Nov 26 15:21 ncm-grub-1.2.8-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 7978 Nov 26 15:21 ncm-interactivelimits-1.0.2-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 41990 Nov 26 15:20 ncm-ncd-1.1.18-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 8000 Nov 26 15:21 ncm-ntpd-1.0.3-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 9997 Nov 26 15:20 ncm-query-1.0.8-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 14639 Nov 26 15:21 ncm-spma-1.2.7-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 37012 Nov 26 15:20 perl-AppConfig-caf-1.3.7-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 63956 Nov 26 15:20 perl-CAF-1.3.7-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 38782 Nov 23 11:04 perl-Compress-Zlib-1.16-12.i386.rpm
-rw-rw-r-- 1 fminafra fminafra 104882 Nov 26 15:20 perl-LC-1.0.6-1.noarch.rpm
-rw-rw-r-- 1 fminafra fminafra 15669 Feb 9 18:59 rpmt-2.0.2-1.i386.rpm
-rw-rw-r-- 1 fminafra fminafra 70895 Nov 26 15:20 spma-1.9.16-1.noarch.rpm


Installation Example: SWREP (Software Repository) setup

● These packages were copied into the repository and the bootstrap procedure was repeated:

[alicegrid4] /home/fminafra/quattor_client > su
Password:
[alicegrid4] /home/fminafra/quattor_client > cp * /home/swrep/i386_sl3/

[alicegrid4] /home/fminafra/quattor_client > swrep-client bootstrap i386_sl3 /extra
...
Package database for platform i386_sl3 successfully bootstrapped

[alicegrid4] /home/fminafra/quattor_client > swrep-client listareas i386_sl3
Available areas for platform i386_sl3:
/base 1401
/extra 16
/updates 545

WARNING! Running bootstrap frequently is dangerous. The quattor team is working on a safe mechanism that avoids errors that may corrupt the SWREP database.
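Since bootstrap is risky, it can help to record the per-area package counts after each run and compare them with the previous values. The following is a minimal sketch that extracts the counts from the listareas output shown above. It assumes only that each area appears as a path followed by a number, and the function name is hypothetical.

```python
import re

# Output of "swrep-client listareas i386_sl3" as shown in the slides.
LISTAREAS_OUTPUT = """\
Available areas for platform i386_sl3:
/base 1401
/extra 16
/updates 545
"""


def area_counts(output: str) -> dict:
    """Map each area path to its package count."""
    return {area: int(count) for area, count in re.findall(r"(/\S+)\s+(\d+)", output)}


counts = area_counts(LISTAREAS_OUTPUT)
print(counts)
print("total packages:", sum(counts.values()))
```

An unexpected drop in any of these counts after a bootstrap would be a reason to inspect the repository before installing nodes from it.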


Installation Example: AII (Automated Installation Infrastructure) setup

● This component needs a DHCP server installed with a basic configuration already set up, and a TFTP server. AII uses the TFTP protocol to download the kernel loader (PXELinux) through the network. PXELinux (which is part of syslinux) should also be installed already:

[root@alicegrid4 root]# rpm -qa | grep syslinux
syslinux-2.06-0.3E

● First we note that our server machine was already configured as a gateway between two networks (two network interfaces):

[root@alicegrid4 root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref Use Iface
193.206.185.0   0.0.0.0         255.255.255.0   U     0      0   0   eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0   0   eth1
0.0.0.0         193.206.185.1   0.0.0.0         UG    0      0   0   eth0

● dhcpd is installed but not started:

[root@alicegrid4 root]# rpm -qa | grep dhcp
dhcp-3.0.1-10_EL3

[root@alicegrid4 root]# chkconfig --list | grep dhcp
dhcpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off


Installation Example: AII (Automated Installation Infrastructure) setup

● So we installed the aii component and configured dhcpd:

[root@alicegrid4 root]# apt-get install quattor-aii
[root@alicegrid4 root]# cp /usr/share/doc/aii-1.0.11/eg/dhcpd.conf /etc
[root@alicegrid4 root]# vim /etc/dhcpd.conf
# dhcpd.conf ISC DHCP 2.0 configuration file
# Uncomment this line if ISC DHCP ver. 3
ddns-update-style ad-hoc;
# write here your network name
shared-network netcic {
# deny unknown-clients;
  not authoritative;
  # Write here your domain name
  option domain-name "netcic.cic";
  # Parameters for the installation via PXE using pxelinux
  filename "pxelinux.0";
  # Uncomment this line if ISC DHCP ver. 3
  option vendor-class-identifier "PXEClient";
  option vendor-encapsulated-options 01:04:00:00:00:00:ff;
  # Complete with (at least) the gateway + DNS.
  # Hosts entries will be inserted
  # automatically by AII in this section
  subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.10;
    option domain-name-servers 192.168.1.10;
    option subnet-mask 255.255.255.0;
  }
}


Installation Example: AII (Automated Installation Infrastructure) setup

● We set up dhcpd to listen for broadcasts only on the eth1 interface (the one on the private network) and started the daemon:

[root@alicegrid4 root]# vim /etc/sysconfig/dhcpd
# Command line options here
DHCPDARGS=eth1

[root@alicegrid4 root]# chkconfig --level 345 dhcpd on
[root@alicegrid4 root]# service dhcpd start

● Next we configured the tftp-server, which is controlled by the xinetd superserver (both packages were installed):

[alicegrid4] /home/fminafra > /usr/sbin/in.tftpd -V
tftp-hpa 0.39, with remap, with tcpwrappers

● We noted that tftpd was compiled with tcpwrappers enabled, so it checks each host that tries to connect against the /etc/hosts.allow and /etc/hosts.deny files; we therefore customized these files:


Installation Example: AII (Automated Installation Infrastructure) setup

[root@alicegrid4 root]# vim /etc/hosts.deny
#
# hosts.deny This file describes the names of the hosts which are
# *not* allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.
#
# The portmap line is redundant, but it is left to remind you that
# the new secure portmap uses hosts.deny and hosts.allow. In particular
# you should know that NFS uses portmap!

in.tftpd : ALL

[root@alicegrid4 root]# vim /etc/hosts.allow
#
# hosts.allow This file describes the names of the hosts which are
# allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.
#

in.tftpd : 192.168.1.

[root@alicegrid4 root]# service xinetd restart
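The effect of the two files can be illustrated with a small model of the tcpwrappers decision order: a match in hosts.allow grants access, otherwise a match in hosts.deny refuses it, otherwise access is granted. The sketch below is a simplified illustration, not the real library. It supports only the ALL wildcard and address prefixes ending in a dot, and all function names are hypothetical.

```python
def pattern_matches(pattern: str, client_ip: str) -> bool:
    """Match a client address against one client pattern."""
    if pattern == "ALL":
        return True
    if pattern.endswith("."):
        # A pattern ending in "." matches addresses that start with it.
        return client_ip.startswith(pattern)
    return client_ip == pattern


def rule_matches(rule: str, daemon: str, client_ip: str) -> bool:
    """Check one 'daemon_list : client_list' rule."""
    daemons, clients = (part.strip() for part in rule.split(":", 1))
    daemon_ok = any(d in ("ALL", daemon) for d in daemons.replace(",", " ").split())
    client_ok = any(pattern_matches(p, client_ip) for p in clients.replace(",", " ").split())
    return daemon_ok and client_ok


def access_granted(daemon, client_ip, allow_rules, deny_rules) -> bool:
    """hosts.allow is consulted first, then hosts.deny; the default is to allow."""
    if any(rule_matches(r, daemon, client_ip) for r in allow_rules):
        return True
    if any(rule_matches(r, daemon, client_ip) for r in deny_rules):
        return False
    return True


ALLOW = ["in.tftpd : 192.168.1."]
DENY = ["in.tftpd : ALL"]

print(access_granted("in.tftpd", "192.168.1.9", ALLOW, DENY))     # client node alien9
print(access_granted("in.tftpd", "193.206.185.50", ALLOW, DENY))  # a host on the public network
```

With the rules from the slides, TFTP requests are accepted only from the private 192.168.1.x network, while other services are unaffected by these two lines.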


Installation Example: AII (Automated Installation Infrastructure) setup

● Also, the quattor-aii package modifies the default working directory of the tftp-server:

[alicegrid4] /home/fminafra > cat /etc/xinetd.d/tftp
...
 server_args = -s /osinstall/nbp
...

● So by default, the directory from which TFTP serves the kernel loader is /osinstall/nbp.

● PXELinux is configured by copying the Linux boot kernel and the initrd image from the OS installation disk into /osinstall/nbp/sl3:

[root@alicegrid4 root]# mkdir /osinstall/nbp/sl3

[root@alicegrid4 root]# mount /mnt/cdrom

[root@alicegrid4 root]# cp /mnt/cdrom/images/pxeboot/vmlinuz /osinstall/nbp/sl3
[root@alicegrid4 root]# cp /mnt/cdrom/images/pxeboot/initrd.img /osinstall/nbp/sl3


Installation Example: AII (Automated Installation Infrastructure) setup

● KS files generated by AII will be stored in /osinstall/ks, and this area should also be available to the web server:

[root@alicegrid4 root]# ln -s /osinstall/ks /var/www/html/ks

● The contents of the slc3.0.3 installation CD-ROM must be accessible using the HTTP protocol as well.

● The AII server is the same machine as the SWRep server, so we can reuse the SWREP packages by creating a symbolic link:

[root@alicegrid4 root]# mkdir -p /var/www/html/sl3/SL
[root@alicegrid4 root]# ln -s /var/www/html/swrep/i386_sl3/ /var/www/html/sl3/SL/RPMS

[root@alicegrid4 root]# mount /mnt/cdrom
[root@alicegrid4 root]# cp -r /mnt/cdrom/SL/base /var/www/html/sl3/SL/
[root@alicegrid4 root]# cp -r /mnt/cdrom/SL/build /var/www/html/sl3/SL/


Installation Example: AII (Automated Installation Infrastructure) setup

● When a client node finishes installing itself, it sends a notification message to the AII server. The AII server can then change the configuration of the node so that it boots from the local disk the next time it reboots. If the client fails to send the notification message, it will be reinstalled at the next reboot. To avoid endless reinstalls we have to copy the acknowledgment CGI script to the cgi-bin directory of our Apache web server:

[root@alicegrid4 root]# cp /usr/sbin/aii-installack.cgi /var/www/cgi-bin

● Configure the sudo utility so that the apache user can run AII commands without providing a password:

[root@alicegrid4 root]# visudo
# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#
apache alicegrid4=(ALL) NOPASSWD: /usr/sbin/aii-shellfe
# this reads: the apache user on the host alicegrid4 can run aii-shellfe with the
# privileges of ALL users without being asked for his or any other password.


Installation Example: AII (Automated Installation Infrastructure) setup

● AII was managed with the aii-shellfe utility, which was configured to know where the XML installation profiles are available:

[root@alicegrid4 root]# cp /usr/share/doc/aii-1.0.11/eg/aii-shellfe.conf /etc
[root@alicegrid4 root]# vim /etc/aii-shellfe.conf
#
# aii-shellfe.conf configuration file
#
# URL where XML profiles are available
cdburl = http://192.168.1.10/profiles

● The AII component uses a pair of templates to generate the PXE and KickStart configuration files. For Scientific Linux they should be:

/usr/lib/aii/nbp/sl303_pxe.conf
/usr/lib/aii/osinstall/sl303_ks.conf

● but the former is missing, so we created it by adapting the rh73_pxe.conf template which was in /usr/lib/aii/osinstall/


Installation Example: AII (Automated Installation Infrastructure) setup

[root@alicegrid4 root]# vim /usr/lib/aii/nbp/sl303_pxe.conf

#
# sl303_pxe.conf PXElinux configuration file
#
# This is a PXElinux configuration file template for SLC 3.0.3 installation
# via HTTP used by the aii-nbp utility of the
# Automated Installation Infrastructure, quattor toolkit (quattor.org)

# ramdisk kernel option is needed to download packages via HTTP
# ksdevice kernel option is for avoiding to be asked which card
# should be used for installation

default </software/components/aii/nbp/options/label>
label </software/components/aii/nbp/options/label>
    kernel </software/components/aii/nbp/options/kernel>
    append ksdevice=eth0 ramdisk=32768 initrd=</software/components/aii/nbp/options/initrd> ks=http://</software/components/aii/osinstall/options/server>/ks/</system/network/hostname>.ks
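The angle-bracket placeholders in this template are configuration paths that get replaced by values from the node profile. The sketch below illustrates that substitution in Python. It is not the aii-nbp implementation, and the profile values (label, kernel and initrd paths) are hypothetical examples chosen to be consistent with the files copied into /osinstall/nbp/sl3.

```python
import re

PXE_TEMPLATE = """\
default </software/components/aii/nbp/options/label>
label </software/components/aii/nbp/options/label>
    kernel </software/components/aii/nbp/options/kernel>
    append ksdevice=eth0 ramdisk=32768 initrd=</software/components/aii/nbp/options/initrd> ks=http://</software/components/aii/osinstall/options/server>/ks/</system/network/hostname>.ks
"""

# Hypothetical profile values for the client node alien9.
PROFILE = {
    "/software/components/aii/nbp/options/label": "sl3",
    "/software/components/aii/nbp/options/kernel": "sl3/vmlinuz",
    "/software/components/aii/nbp/options/initrd": "sl3/initrd.img",
    "/software/components/aii/osinstall/options/server": "192.168.1.10",
    "/system/network/hostname": "alien9",
}


def fill_template(template: str, profile: dict) -> str:
    """Replace every </config/path> placeholder with its profile value."""
    def lookup(match):
        # A missing path raises KeyError, so incomplete profiles are noticed.
        return profile[match.group(1)]
    return re.sub(r"<(/[^<>\s]+)>", lookup, template)


print(fill_template(PXE_TEMPLATE, PROFILE))
```

With these example values the generated append line points the installer at http://192.168.1.10/ks/alien9.ks, which matches the /osinstall/ks area exported through the web server.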


Installation Example: AII (Automated Installation Infrastructure) setup

● We also adapted the KickStart template:

[root@alicegrid4 root]# vim /usr/lib/aii/osinstall/sl303_ks.conf
# only changes to the default template are shown
...
# Installation type
url --url http://</software/components/aii/osinstall/options/server>/sl3
...
# TODO: This section will be reorganized after the new template
# generator is available.
# In the meanwhile, you must provide a KS configuration file for
# each partition schema used.
clearpart --all
part / --size 1024 --ondisk hda --fstype ext3
part swap --size 2048 --ondisk hda --fstype swap
part /usr --size 5120 --ondisk hda --fstype ext3
part /tmp --size 1024 --ondisk hda --fstype ext3
part /var --size 1024 --ondisk hda --fstype ext3
part /home --size 5120 --ondisk hda --fstype ext3 --grow
...
# Packages groups/list
%packages --resolvedeps --ignoremissing
openssh
openssh-server
wget
perl-libnet # rh73
perl-MIME-Base64 # rh73
perl-URI
perl-Digest-MD5 # rh73
perl-libwww-perl
perl-XML-Parser
...
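Because a separate KickStart file is needed for each partition scheme, it is worth checking that the requested sizes fit the target disk. The following sketch adds up the --size values (in megabytes) of the part lines shown above. The parsing is deliberately simple and the function name is hypothetical.

```python
KS_PART_LINES = """\
part / --size 1024 --ondisk hda --fstype ext3
part swap --size 2048 --ondisk hda --fstype swap
part /usr --size 5120 --ondisk hda --fstype ext3
part /tmp --size 1024 --ondisk hda --fstype ext3
part /var --size 1024 --ondisk hda --fstype ext3
part /home --size 5120 --ondisk hda --fstype ext3 --grow
"""


def minimum_sizes_mb(kickstart_text: str) -> dict:
    """Map each mount point to the --size value of its 'part' line."""
    sizes = {}
    for line in kickstart_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "part" and "--size" in tokens:
            sizes[tokens[1]] = int(tokens[tokens.index("--size") + 1])
    return sizes


sizes = minimum_sizes_mb(KS_PART_LINES)
total_mb = sum(sizes.values())
print(sizes)
print(f"minimum space required: {total_mb} MB ({total_mb / 1024:.0f} GB)")
```

The fixed sizes add up to 15360 MB (15 GB); /home carries the --grow option, so it expands into whatever space remains on the disk.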


Installation Example: Configuration Profile Templates

● We made a working copy of the example templates that come with the quattor distribution:

[alicegrid4] /home/fminafra > mkdir my_pan_tpl

[alicegrid4] /home/fminafra > cp -r /usr/share/doc/pan-templates-1.1.4/standard my_pan_tpl
[alicegrid4] /home/fminafra > cp -r /usr/share/doc/pan-templates-1.1.4/site_specific my_pan_tpl
[alicegrid4] /home/fminafra > cp -r /usr/share/doc/pan-templates-1.1.4/components my_pan_tpl

● The templates under the standard directory were left untouched.

● The templates under the site_specific directory were adapted to our site:

[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > ll
total 296
-r--r--r-- 1 fminafra fminafra 1100 Feb 23 10:13 profile_alien9.tpl
-r--r--r-- 1 fminafra fminafra 645 Feb 17 15:41 pro_hardware_card_nic_intel_e1000.tpl
-r--r--r-- 1 fminafra fminafra 648 Feb 17 15:41 pro_hardware_cpu_GenuineIntel_Xeon_2660.tpl
-r--r--r-- 1 fminafra fminafra 704 Feb 17 15:34 pro_hardware_harddisk_STD_80.tpl
-r--r--r-- 1 fminafra fminafra 524 Feb 17 15:11 pro_hardware_ram_512.tpl
-r--r--r-- 1 fminafra fminafra 1251 Feb 19 18:06 pro_hardware_supermicro.tpl
-r--r--r-- 1 fminafra fminafra 1294 Feb 22 11:28 pro_software_alicebari_linux_1_1_0.tpl
-r--r--r-- 1 fminafra fminafra 1553 Feb 22 11:36 pro_software_packages_quattor_sl.tpl
-rw-r--r-- 1 fminafra fminafra 44844 Feb 22 11:34 pro_software_packages_scientificlinux_3_03.tpl
-r--r--r-- 1 fminafra fminafra 3897 Feb 22 16:44 pro_system_alicebari.tpl
-r--r--r-- 1 fminafra fminafra 2979 Feb 23 10:43 pro_system_base.tpl
-rw-rw-r-- 1 fminafra fminafra 209310 Feb 23 16:40 repository_alicegrid_bari_i386_sl3.tpl


Installation Example: Configuration Profile Templates

● The templates under the components directory were left untouched apart from pro_software_component_aii.tpl

● We generated the repository template automatically with:

[alicegrid4] /home/fminafra > swrep-client template i386_sl3 >repository_alicegrid_bari_i386_sl3.tpl

● We chose a file name that matches the template name found in the header of that file:

[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > head -30 repository_alicegrid_bari_i386_sl3.tpl
# SWRep inventory for i386_sl3
# This is an automatically generated template.
# DO NOT EDIT.
structure template repository_alicegrid_bari_i386_sl3;
"name" = "alicegrid bari";
"owner" = "[email protected]";
"protocols" = list(
  nlist("name","http","url","http://192.168.1.10/swrep/i386_sl3/"),
);
"contents" = nlist(
  escape("4Suite-0.11.1-14-i386"),nlist("name","4Suite","version","0.11.1-14","arch","i386"),
...
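The match between file name and declared template name can be verified mechanically before the templates are loaded into CDB. The sketch below, a helper written for illustration and not part of quattor, extracts the name from the template declaration and compares it with the file name without its .tpl suffix. The regular expression is a simplification that accepts any leading keywords such as structure.

```python
import re
from pathlib import Path

# A declaration looks like "template NAME;" optionally preceded by keywords
# such as "structure". Comment lines starting with "#" never match.
TEMPLATE_DECL = re.compile(
    r"^[ \t]*(?:\w+[ \t]+)*template[ \t]+(\w+)[ \t]*;", re.MULTILINE
)


def declared_template_name(source: str):
    """Return the template name declared in source, or None if absent."""
    match = TEMPLATE_DECL.search(source)
    return match.group(1) if match else None


def name_matches_file(filename: str, source: str) -> bool:
    """True if the declared name equals the file name without its suffix."""
    return declared_template_name(source) == Path(filename).stem


HEADER = """\
# SWRep inventory for i386_sl3
# This is an automatically generated template.
# DO NOT EDIT.
structure template repository_alicegrid_bari_i386_sl3;
"""

print(declared_template_name(HEADER))
print(name_matches_file("repository_alicegrid_bari_i386_sl3.tpl", HEADER))
```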


Installation Example: Configuration Profile Templates

● The template pro_software_packages_scientificlinux_3_03.tpl was generated from a node that was already installed and configured (manually) by using the following command:

[pcfluent] /home/fminafra > rpm -qa --queryformat '["/software/packages"=pkg_add("%{NAME}","%{VERSION}-%{RELEASE}","%{ARCH}");\n]' > pro_software_packages_scientificlinux_3_03.tpl

● The resulting file was then prefixed with the following lines, which declare the template name as shown in the example template:

#######################################################################
# Scientific Linux 303 package list
#######################################################################
template pro_software_packages_scientificlinux_3_03;

● Note that the template name must always match the file name (without the .tpl suffix)

● The other templates are largely self-explanatory (see the quattor manual and the bari_tpl.tgz)
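The rpm query format above emits one pkg_add line per installed package. As an illustration of the resulting file layout, the sketch below builds the same kind of lines from (name, version, release, architecture) tuples and prepends the template declaration. The two package tuples are taken from names that appear in the slides, and the function names are hypothetical.

```python
def pkg_add_line(name: str, version: str, release: str, arch: str) -> str:
    """Format one package entry the way the rpm --queryformat string does."""
    return f'"/software/packages"=pkg_add("{name}","{version}-{release}","{arch}");'


def package_template(template_name: str, packages) -> str:
    """Return the template declaration followed by one pkg_add line per package."""
    lines = [f"template {template_name};"]
    lines += [pkg_add_line(*pkg) for pkg in packages]
    return "\n".join(lines) + "\n"


PACKAGES = [
    ("4Suite", "0.11.1", "14", "i386"),
    ("a2ps", "4.13b", "28_SL", "i386"),
]

print(package_template("pro_software_packages_scientificlinux_3_03", PACKAGES))
```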


Installation Example: Configuration Profile Templates

● pro_software_packages_quattor_sl.tpl (quattor packages for SLC 3.0.3 clients)

● pro_software_alicebari_linux_1_1_0.tpl (higher level template that comprises the previous two)

● pro_system_base.tpl (per site settings)

● pro_system_alicebari.tpl (per cluster settings)

● profile_alien9.tpl (per node settings)

● Each of these templates is one step further down the hierarchy, from general software lists to per-node settings.
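To make the hierarchy concrete, the include relations that appear in these slides can be written down as a small table and expanded top-down. This is a rough sketch, not a description of how the pan compiler works: the depth-first, in-order traversal is an assumption made for illustration, and the order of the two package templates inside pro_software_alicebari_linux_1_1_0 is also assumed.

```python
# Include relations as they appear in the templates shown in these slides.
INCLUDES = {
    "profile_alien9": [
        "pro_declaration_profile_base",
        "pro_hardware_supermicro",
        "pro_system_base",
        "pro_system_alicebari",
    ],
    "pro_system_alicebari": ["pro_software_alicebari_linux_1_1_0"],
    "pro_software_alicebari_linux_1_1_0": [
        "pro_software_packages_scientificlinux_3_03",  # order assumed
        "pro_software_packages_quattor_sl",
    ],
}


def expand(name, includes=INCLUDES, depth=0):
    """Return (depth, template) pairs, visiting includes depth first."""
    rows = [(depth, name)]
    for child in includes.get(name, []):
        rows.extend(expand(child, includes, depth + 1))
    return rows


if __name__ == "__main__":
    for depth, template in expand("profile_alien9"):
        print("  " * depth + template)
```

Printing the result as an indented tree gives a quick view of which template a given setting is likely to come from when a node profile needs debugging.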

Installation Example: Configuration Profile Templates

[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > cat pro_system_base.tpl
...
# network
"/system/network/domainname" = default( "netcic.cic" );
"/system/network/nameserver" = default(list( "192.135.10.4", "192.135.10.18", "131.154.1.3"));
# "/system/network/timeserver" = default(list(
#     "ntp1.ien.it"));
"/system/network/interfaces" = set_interface_defaults(nlist(
    "netmask", "255.255.255.0",
    "broadcast","192.168.1.255",
    "gateway", "192.168.1.10"));

# mail address of the overall responsible admin
"/system/rootmail" = default( "[email protected]" );

...

Installation Example: Configuration Profile Templates

[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > cat pro_system_alicebari.tpl
...
# cluster wide default settings
#
"/system/cluster/name" = default( "alicegrid bari" );
"/system/cluster/type" = default( "interactive" );
"/system/state" = default( "production" );
"/system/siterelease" = default( "Scientific Linux 3.0.3" );

# software configuration
include pro_software_alicebari_linux_1_1_0;

# kernel version
# "/system/kernel/version" = "2.4.21-27.0.2.EL.cernsmp";
# specified in the node profile template

#
# partition information (for now this information is unused!
# change it in the kickstart config template)
#
"/system/filesystems" = set_partitions( nlist(
    "hda1", nlist("size",1*GB, "type","ext3", "mountpoint","/"),
    "hda2", nlist("size",5*GB, "type","ext3", "mountpoint","/usr"),
    "hda3", nlist("size",2*GB, "type","swap", "mountpoint","none"),
    "hda4", nlist("size",70*GB, "type","extended", "logical_partitions",nlist(
        "hda5", nlist("size",1*GB, "type","ext3", "mountpoint","/tmp"),
        "hda6", nlist("size",1*GB, "type","ext3", "mountpoint","/var"),
        "hda7", nlist("size",16*GB, "type","ext3", "mountpoint","/home")))
));
# ...

Installation Example: Configuration Profile Templates

[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > cat profile_alien9.tpl
object template profile_alien9;

#
# structure of the profile
#
include pro_declaration_profile_base;

#
# hardware information
#
include pro_hardware_supermicro;
"/hardware/cards/nic/eth0/hwaddr" = "00:30:48:73:A9:5D";
"/hardware/cards/nic/eth1/hwaddr" = "00:30:48:73:A9:5C";

#
# network settings
#
"/system/network/hostname" = "alien9";
"/system/network/interfaces/eth0/ip" = "192.168.1.9";
"/system/network/interfaces/eth0/netmask" = "255.255.255.0";
"/system/network/interfaces/eth0/gateway" = "192.168.1.10";

#
# the rest
#
include pro_system_base;
include pro_system_alicebari;

"/hardware/serialnumber" = "0009";
"/system/kernel/version" = "2.4.21-27.0.2.EL.cernsmp";

Installation Example: Configuration Profile Templates

[alicegrid4] /home/fminafra/my_pan_tpl/components > cat pro_software_component_aii.tpl
template pro_software_component_aii;
include pro_declaration_component_aii;

# NBP configuration
# default directory defined in /etc/aii-nbp.conf
"/software/components/aii/nbp/template" = "sl303_pxe.conf";
"/software/components/aii/nbp/options" = nlist (
    "label" , "Scientific Linux CERN 3.0.3",          # label for boot loader
    "kernel", "sl3/vmlinuz",                          # kernel relative path
    "initrd", "sl3/initrd.img"                        # ram disk relative path
);

# OS Installer configuration
# default directory defined in /etc/aii-osinstall.conf
"/software/components/aii/osinstall/template" = "sl303_ks.conf";
"/software/components/aii/osinstall/options" = nlist (
    "server", "192.168.1.10",                         # OS installation server
    "cdb", "192.168.1.10",                            # CDB server
    "lang", "en_US",                                  # language during installation
    "langsupp", "en_US",                              # language installed
    "keyboard", "it",                                 # keyboard layout
    "mouse", "genericwheelps/2 --device psaux",       # mouse type
    "timezone", "Europe/Rome",                        # time zone
    "rootpw", "$1$KD1Ab8VL$MMSNOwUXh6Cw8il6ckB7c1",   # encrypted root password (eg. aii)
    "auth", "--enableshadow --enablemd5",             # options for authentication
    "firewall", "--disabled"                          # firewall
);

# standard component settings
"/software/components/aii/active" = false;            # aii is no real component, thus
"/software/components/aii/dispatch" = false;          # it is not active

● In the components directory, the only template we modified was pro_software_component_aii.tpl.

Installation Example: Configuration Profile Templates

● The root password in the previous template is an MD5-hashed password. You can obtain a password's hashed form in several ways, such as copying an existing entry from /etc/shadow or using OpenSSL's passwd command:

[alicegrid4] /home/fminafra > openssl passwd -1 MySecret
$1$sXiKzkus$haDZ9JpVrRHBznY5OxB82

● (The '-1' option selects the MD5-based BSD password algorithm.)
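Before pasting a hash into the "rootpw" option it can help to check that it has the shape of an MD5-crypt string: the "$1$" prefix, a salt of up to 8 characters, another "$", and a 22-character digest. The sketch below only validates that shape and does not compute or verify any password; the function name split_md5_crypt is hypothetical, and the sample value is the rootpw hash from pro_software_component_aii.tpl.

```python
import re

# $1$<salt, up to 8 chars>$<22-char digest>, using the crypt(3) base-64 alphabet
MD5_CRYPT = re.compile(r"^\$1\$([./0-9A-Za-z]{0,8})\$([./0-9A-Za-z]{22})$")


def split_md5_crypt(hashed):
    """Return (salt, digest) for a well-formed MD5-crypt string, else None."""
    match = MD5_CRYPT.match(hashed)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_md5_crypt("$1$KD1Ab8VL$MMSNOwUXh6Cw8il6ckB7c1"))
    print(split_md5_crypt("MySecret"))
```

A string that does not match, for example a plain-text password or a hash that lost characters when it was copied, returns None, which is a cheap way to catch a mistake before a node is installed with an unusable root password.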

Installation Example: Loading of Templates in CDB

● Now that all the templates are tailored for our site, we can add them to the CDB:

[alicegrid4] /home > chmod go+rx fminafra
[alicegrid4] /home > ll
total 28
drwx--x--x    5 cdb      cdb       4096 Feb  9 16:41 cdb
drwxr-xr-x   10 fminafra fminafra  4096 Feb 19 17:25 fminafra
drwx------    2 root     root     16384 Jan  7 18:45 lost+found
drwxr-xr-x    3 swrep    swrep     4096 Feb  9 17:35 swrep

[alicegrid4] /home > su - cdb
Password:
[alicegrid4] /home/cdb > cd /home/fminafra/my_pan_tpl/standard/
[alicegrid4] /home/fminafra/my_pan_tpl/standard > cdb-simple-cli --add *
[alicegrid4] /home/fminafra/my_pan_tpl/standard > cd /home/fminafra/my_pan_tpl/components
[alicegrid4] /home/fminafra/my_pan_tpl/components > cdb-simple-cli --add *
[alicegrid4] /home/fminafra/my_pan_tpl/components > cd /home/fminafra/my_pan_tpl/site_specific
[alicegrid4] /home/fminafra/my_pan_tpl/site_specific > cdb-simple-cli --add *

[alicegrid4] /home/cdb > cdb-simple-cli --list
[INFO] listing templates...
pro_declaration_component_accounts
pro_declaration_component_aii
...
pro_system_base
profile_alien9
repository_alicegrid_bari_i386_sl3

Installation Example: Automated Configuration/Installation

● Finally we use the aii-shellfe utility to configure nodes and schedule them for installation:

[root@alicegrid4 root]# aii-shellfe --configure alien9
[WARN] aii-shellfe: invalid hostname (alien9) for operation 'configure'
[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 0

● This warning appeared because no name server could resolve the 'alien9' hostname, so we added an entry to /etc/hosts:

[root@alicegrid4 root]# vim /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
193.206.185.207 alicegrid4.ba.infn.it

192.168.1.9 alien9 alien9
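Since aii-shellfe rejects host names that cannot be resolved, it may be worth checking the hosts file for every node before running --configure. The following is a minimal sketch that parses hosts-file text and maps each name to its address; the function name hosts_entries is hypothetical, and the sample text simply reproduces the file shown above.

```python
SAMPLE_HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
193.206.185.207 alicegrid4.ba.infn.it

192.168.1.9 alien9 alien9
"""


def hosts_entries(text):
    """Map every host name or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first address seen for a name
    return table


if __name__ == "__main__":
    table = hosts_entries(SAMPLE_HOSTS)
    for node in ("alien9", "alien8"):
        print(node, table.get(node, "NOT RESOLVABLE via hosts file"))
```

On the installation server the same function could be fed the contents of /etc/hosts; this only covers the hosts file and says nothing about names that a DNS server would resolve.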

Installation Example: Automated Configuration/Installation

[root@alicegrid4 root]# aii-shellfe --status .+
[INFO] aii-nb: alien9 status: undefined
[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 0 install

[root@alicegrid4 root]# aii-shellfe --configure alien9
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl/EDG/WP4/CCM/CCfg.pm line 163.
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl/EDG/WP4/CCM/CCfg.pm line 163.
Use of uninitialized value in concatenation (.) or string at /usr/lib/perl/EDG/WP4/CCM/CCfg.pm line 163.
[INFO] aii-shellfe: changes: 1 added 0 removed 0 hdboot 0 install
[root@alicegrid4 root]# aii-shellfe --configure alien9
[WARN] aii-shellfe: invalid hostname (alien9) for operation 'configure'
[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 0

● (The three "uninitialized value" messages are due to a bug that is fixed in the forthcoming release.)

Installation Example: Automated Configuration/Installation

● After having configured the node, we mark it for installation at next reboot:

[root@alicegrid4 root]# aii-shellfe --install alien9
[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 1 install

● Finally, once the node has been successfully installed, its status appears as follows:

[root@alicegrid4 root]# aii-shellfe --status .+
[INFO] aii-nb: alien9 status: boot
[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 0 install

● This means that the node now boots from its own disk.
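With more than a handful of nodes, reading the status output by eye becomes tedious. The sketch below extracts the per-node status from captured output of the status command; it assumes the "[INFO] aii-nb: <node> status: <value>" line format seen in the two transcripts in these slides, and the function name node_status is hypothetical. Only the values "undefined" and "boot" appear in the slides, so other values are passed through uninterpreted.

```python
import re

# Line format taken from the status transcripts shown in these slides.
STATUS_LINE = re.compile(r"^\[INFO\] aii-nb: (\S+) status: (\S+)$")


def node_status(output):
    """Return {node: status} for every aii-nb status line in the output."""
    result = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2)
    return result


if __name__ == "__main__":
    captured = (
        "[INFO] aii-nb: alien9 status: boot\n"
        "[INFO] aii-shellfe: changes: 0 added 0 removed 0 hdboot 0 install\n"
    )
    for node, status in node_status(captured).items():
        print(f"{node}: {status}")
```

The summary line from aii-shellfe itself is ignored because it does not carry the aii-nb prefix, so the result contains one entry per node.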

TODO

● This was only a test to evaluate the behaviour of the tool

● Prepare templates for a production system and for at least two kinds of machines (Disk Server, Worker Node, but also PBS server and gateway)

● Consolidating our skills with quattor may save time in the future

● New releases are expected, with improvements

REFERENCES

● http://quattor.web.cern.ch/quattor/

● http://linux.web.cern.ch/linux/scientific3/docs/

● https://svn.lal.in2p3.fr/LCG/QWG/web/index.html

[email protected]

● http://savannah.cern.ch/projects/elfms/