OraMonPlans 08/04
02 August 2004



Topics

Enhancements
– OraMon DB redundancy layer
– Compare and fix OraMon configurations
– Expiry of historical data
– Saving disk space

OraMonArch

Bugs

Others
– OraMon OO development with Together
– OraMon changes for Maciej’s alarm interfacing system?




OraMon DB redundancy layer

Requirements:

1. OraMon should retry connecting after losing the DB connection
   Currently (as of OraMon 0.0.3), upon DB connection failure, OraMon issues a [FATAL] log (+ failure kind) and stops

2. OraMon should support a ‘Do(Not)InsertSamples’ command
   Currently, OraMon inserts or does not insert samples, according to the value of the environment variable MR_READONLY

3. OraMon should have a ‘HeartBeat’ command
   Currently, one may check whether an OraMon instance is alive by issuing an MR API query to it (via lemon-utils/lemon-cli.pl)

Approaches to satisfy ‘retry connect’ and ‘Do(Not)InsertSamples’:
– ‘External’: (set some variables and) start OraMon
  Pros: Simple to implement, no internal changes to OraMon
  Con: A few minutes of downtime
– ‘Internal’: Change OraMon to satisfy the requirement by adding specific code
  Pros and cons are the opposite of those of ‘External’


OraMon DB redundancy layer: ‘External’ solutions

1. Retry connect after losing the DB connection
   A simple (restart-oramon like) service that issues:
     /etc/rc.d/init.d/OraMon start
   after OraMon stops, if the ‘failure kind’ belongs to a TBD failure set.

2. ‘InsertSamples’ command to OraMon
   Restart OraMon after un/setting MR_READONLY:
   • Do insert: unset MR_READONLY ; /etc/rc.d/init.d/OraMon restart
   • Do not insert: set MR_READONLY=yes ; /etc/rc.d/init.d/OraMon restart

3. OraMon ‘HeartBeat’
   Check for a sane response to a lemon-cli.pl query
   Should not get: Failed to MRs_getSamples() : #-1 : Connection refused
   Example: perl lemon-utils/lemon-cli.pl --metrics="10002" --nodes="lcgmon002d" --remote-server="http://ccs002d:12510"
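A minimal Python sketch of the two ‘External’ decisions above: whether to restart after a stop, and whether a heartbeat response looks sane. The failure-kind name `DB_CONNECTION_LOST` is a hypothetical placeholder (the failure set is still TBD), and "sane" is assumed here to mean non-empty output that lacks the quoted failure message. The lemon-cli.pl flags are only those shown in the example above.

```python
# Hypothetical stand-in: the slides leave the restartable failure set TBD.
RESTARTABLE_FAILURES = frozenset({"DB_CONNECTION_LOST"})

# Failure message quoted on the slide.
FAILURE_MARKER = "Failed to MRs_getSamples()"


def should_restart(failure_kind, restartable=RESTARTABLE_FAILURES):
    """Restart OraMon only when the logged failure kind is in the agreed set."""
    return failure_kind in restartable


def heartbeat_command(metric, node, server):
    """Build the lemon-cli.pl query from the slide's example as an argv list."""
    return [
        "perl", "lemon-utils/lemon-cli.pl",
        "--metrics=" + metric,
        "--nodes=" + node,
        "--remote-server=" + server,
    ]


def heartbeat_ok(cli_output):
    """Assumed meaning of 'sane': non-empty output without the failure marker."""
    text = cli_output.strip()
    return bool(text) and FAILURE_MARKER not in text
```

The argv list could be passed to `subprocess.run(..., capture_output=True, text=True)` and its `stdout` handed to `heartbeat_ok`; the actual output format of lemon-cli.pl is not assumed here.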


OraMon DB redundancy layer: ‘Internal’ solutions

1. Retry connect after losing the DB connection
   Change OraMon code: when an SQL command fails with a failure that belongs to a TBD failure set, do not fail, but rather try to connect again first (a few times, sleeping between tries)

2. ‘InsertSamples’ command to OraMon
   Reuse and extend the existing proprietary ‘insert samples’ protocol:
   • Define a ‘pseudo’ metricId (set) that OraMon interprets as commands rather than as metrics to be inserted
   • Commands arrive from a specific port or from the samples port
   • Commands may be added to the ‘metrics configuration’ (like) configuration

3. OraMon ‘HeartBeat’: the same as in the ‘External’ solutions
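A minimal Python sketch of the two ‘Internal’ ideas above, not OraMon's actual code. `TransientDbError` stands in for the TBD failure set, `execute` and `connect` are caller-supplied callables, and the pseudo metricIds `-1` and `-2` are hypothetical values chosen only for illustration.

```python
import time


class TransientDbError(Exception):
    """Stand-in for a failure that belongs to the (TBD) retryable set."""


def run_with_reconnect(execute, connect, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Run execute(); on a retryable failure, reconnect and try again a few times."""
    try:
        return execute()
    except TransientDbError:
        pass  # fall through to the reconnect loop
    for _ in range(attempts):
        sleep(delay_s)  # sleep between tries, as the slide suggests
        try:
            connect()
            return execute()
        except TransientDbError:
            continue
    raise TransientDbError("giving up after %d reconnect attempts" % attempts)


# Hypothetical pseudo metricIds; the slides do not assign actual values.
COMMANDS = {-1: "DoInsertSamples", -2: "DoNotInsertSamples"}


def apply_sample(metric_id, state):
    """Interpret pseudo metricIds as commands; return True if the sample should be inserted."""
    command = COMMANDS.get(metric_id)
    if command == "DoInsertSamples":
        state["insert"] = True
        return False  # a command is never stored as a sample
    if command == "DoNotInsertSamples":
        state["insert"] = False
        return False
    return state["insert"]
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays.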


Changing metrics configuration

Related OraMon documentation: Changing metrics configuration

German’s email 19/7 [Lemon] changes in metric data fields:

- changes (adding/removing/changing data fields) to latestOnly metrics: ok
  David: - ok.
  - When applying a new configuration, all (TBD changed) latest tables and views will be automatically dropped

- changes to latestOnly metrics which have a historical table defined, but no longer used (reconfigured from 'latestOnly=false' to true): drop historical table altogether.
  David: - ok.
  - Also, drop tables of removed metrics?
    (- Also, is archiving of tables to be dropped required?)


Changing metrics configuration (cont.)

- changes to 'historical' metrics (not latestOnly):
  - added data fields: OK
    David: TBD: ok iff adding fields does not complicate restoring old data that lack the new fields
  - removed and changed data fields: drop historical values in DB, or refuse (global OraMon configuration Boolean parameter).
    David: I doubt that dropping historical data will resolve potential problems while restoring older data. Assuming this is correct, ‘refuse’ will always be applied.

- changes where historical values should be preserved: define a new metric ID. I don't think any conversion magic is appropriate, and to be consistent, it would also have to be applied to all historical data already archived into CASTOR, which is far from trivial.
  David: As a rule of thumb, I suggest avoiding changes to archived data
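The policy discussed on these two slides can be sketched as a small classifier. This is only an illustration under assumptions: a metric configuration is modelled as a dict with a `latest_only` flag and a `fields` mapping (name to type), the action names are made up, and ‘refuse’ is taken as the outcome for removed or changed fields, following David's comment.

```python
def classify_change(old, new):
    """Map a metric configuration change to the action discussed on the slides."""
    if old == new:
        return "no-op"
    if new["latest_only"]:
        if not old["latest_only"]:
            # Reconfigured from latestOnly=false to true: historical table is unused.
            return "drop-historical-table"
        # Any field change to a latestOnly metric is ok; latest tables/views are rebuilt.
        return "recreate-latest-tables"
    if old["latest_only"]:
        # latestOnly -> historical is not covered by the slides.
        return "not-covered"
    removed_or_changed = any(
        name not in new["fields"] or new["fields"][name] != ftype
        for name, ftype in old["fields"].items()
    )
    if removed_or_changed:
        # To keep historical values, a new metric ID should be defined instead.
        return "refuse"
    return "add-fields"
```

For example, adding a field to a historical metric yields `"add-fields"`, while changing a field's type yields `"refuse"`.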


Changing metrics configuration: David’s suggestions

- Observation: The OraMon level of complexity to add a field is similar to that of applying other ‘compatible’ changes: remove field, change length

- In order to avoid clashes between existing OraMon data schemas and previously archived data, I suggest that:
  - Each change to a metricClass will have new metricIds
  - Previous metricIds will be marked ‘obsolete’ by a new metadata field
  - Previous metricIds may have a ‘replaced by metricId’ metadata field

- In order to preserve older data and allow data schema changes, I suggest that when a ‘compatible’ change is applied to a metricClass, its existing historical table will be renamed to the new name, and automatic fixes will be applied by OraMon.
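A minimal sketch of the suggested ‘obsolete’ and ‘replaced by metricId’ metadata, assuming an in-memory registry keyed by metricId. The field names `obsolete` and `replaced_by`, and the helper names, are illustrative and not part of the existing OraMon metadata.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MetricMeta:
    metric_id: int
    metric_class: str
    obsolete: bool = False              # suggested new metadata field
    replaced_by: Optional[int] = None   # suggested 'replaced by metricId' field


def supersede(registry, old_id, new_id):
    """Return a new registry where old_id is obsolete and points to new_id."""
    if new_id in registry:
        raise ValueError("metricId %d already exists" % new_id)
    old = registry[old_id]
    updated = dict(registry)
    updated[old_id] = replace(old, obsolete=True, replaced_by=new_id)
    updated[new_id] = MetricMeta(new_id, old.metric_class)
    return updated


def current_id(registry, metric_id):
    """Follow 'replaced by' links to the metricId currently in use."""
    seen = set()
    while registry[metric_id].replaced_by is not None:
        if metric_id in seen:
            raise ValueError("cycle in 'replaced by' links")
        seen.add(metric_id)
        metric_id = registry[metric_id].replaced_by
    return metric_id
```

Keeping the old entry (rather than deleting it) is what lets previously archived data still be interpreted under its original metricId.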


Expiry of historical data

Item 4162: expiry of historical data. To be discussed at CERN. Submitted 2004-Jul-19 12:14 by jveldik


Saving disk space

Compress partitions
– Howto: OraMon partitions thread to compress partitions that are at least one day old
– TBD: May cause unexpected complications
– Saving space is important, but not urgent

Make numbers (and strings) smaller
– May be applied after applying all ‘Changing metrics configuration’ items
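The ‘at least one day old’ rule for the partitions thread could be sketched as below. Assumptions: each partition is described by a hypothetical `(name, upper_bound, compressed)` tuple, and age is measured from the partition's upper time bound; the compression step itself is left out.

```python
from datetime import datetime, timedelta


def partitions_to_compress(partitions, now, min_age=timedelta(days=1)):
    """Return names of uncompressed partitions whose upper bound is at least min_age old."""
    return [
        name
        for name, upper_bound, compressed in partitions
        if not compressed and now - upper_bound >= min_age
    ]
```

Skipping already compressed partitions keeps the thread's work proportional to new data only.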


OraMonArch

OraMonArch documentation
• If ‘archive and not drop’ is required, the implementation should be enhanced, since the current implementation drops and returns data
• Two OraMonArch instances, continuous and non-continuous: non-continuous requests cannot be queued
• OraMonArch transaction error when stopping/crashing after a DDL command and before updating the relevant checkpoint


Bug reports

| Item ID | Summary | Submitted on | Submitted by |
|---|---|---|---|
| 4000 | OraMon packaging issues, broken restart-oramon. Minor: understand a minor rpm mistake: restart-oramon is installed by the OraMon non-config rpm | 2004-Jul-05 07:33 | gcancio |
| 4001 | LSB compliance for OraMon. Minor | 2004-Jul-05 07:40 | gcancio |
| 4002 | OraMon should continue running with old metadata if incompatibility is found. Medium: see ‘Compare and fix OraMon configurations’ | 2004-Jul-05 07:58 | gcancio |
| 4004 | Floating point exception error using OraMonAdmin. Small: fix a bug | 2004-Jul-05 08:35 | gcancio |
| 4015 | define/document policy for valid / invalid configuration changes. Small: OraMon should also check for valid characters and keywords for e.g. metric field descriptions. This should be part of the documentation as well. Add: OraMon and/or the script that creates metrics configuration may be enhanced to check against using Oracle reserved words as identifiers (http://www-rohan.sdsu.edu/doc/oracle/server803/A54661_01/ares.htm). Make sure that OraMon will not fail with fieldNames that consist of more than one word + strange chars (see email from 19/7) | 2004-Jul-05 12:43 | gcancio |
| 4074 | OraMon - Validation Failures. Minor | 2004-Jul-08 10:22 | waldron |
| 4097 | Add OraMon possible errors to its documentation. Small | 2004-Jul-12 12:28 | dfront |
| 4162 | expiry of historical data. To be discussed at CERN | 2004-Jul-19 12:14 | jveldik |
| 4180 | OraMon should support number sizes and a boolean type. Small. Add: learn if OraMon and agent can use the same code for metric validation | 2004-Jul-21 05:58 | dfront |
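For item 4015, a field-name check along the following lines might be added to OraMon or to the script that creates the metrics configuration. This is a sketch only: the reserved-word set below is a small illustrative subset (the full list is in the Oracle documentation linked in the item), and the 30-character limit and allowed characters follow the usual rules for non-quoted Oracle identifiers of that era.

```python
import re

# Illustrative subset only; the complete list is in the Oracle documentation.
RESERVED_WORDS = frozenset({
    "COMMENT", "DATE", "FROM", "GROUP", "LEVEL", "MODE",
    "NUMBER", "ORDER", "SELECT", "SIZE", "TABLE", "WHERE",
})

# Non-quoted identifier: starts with a letter, then letters, digits, _ $ #, max 30 chars.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")

NOT_IDENTIFIER = "not a plain Oracle identifier"
IS_RESERVED = "Oracle reserved word"


def field_name_problems(name):
    """Return a list of reasons why name is unsafe as a column identifier."""
    problems = []
    if not _IDENTIFIER.fullmatch(name):
        problems.append(NOT_IDENTIFIER)
    if name.upper() in RESERVED_WORDS:
        problems.append(IS_RESERVED)
    return problems
```

An empty list means the name passed both checks; multi-word names or names with unusual characters are reported rather than silently accepted.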


Bugs found while installing OraMon 0.0.3

1) OraMon views indicate time that is later by one hour than the real time

2) OraMonArch/Cont service script (/etc/rc.d/init.d/OraMonArchContCtl): Returns only after completing the work. Should return immediately. May cause the computer to get stuck at reboot.

3) Probable problem: metric validation errors at lcgmon002d differ from those at ccs002d

4) To be addressed to German: recognizing metric configuration change according to date causes rpm update to fail by mistake. Suggested fix: A hard coded date attribute.

5) To be checked: I suspect that logrotate does not work at ccs002d for /var/log/OraMon.log, because it had grown to 66M as of 27/7

6) OraMonArch transaction error when stopping/crashing after a DDL command and before updating the relevant checkpoint (see the OraMonArch slide above)