MIDDLEWARE RENOVATION – TECHNICAL OVERVIEW AND
PLANS FOR MIGRATION
CMW@GSI, 25th April 2013
Wojciech Sliwinski, BE-CO-IN, for the Middleware team:
Felix Ehm, Kris Kostro, Joel Lauener, Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak]
Special thanks to: Vito Baggiolini and Pierre Charrue
Wojciech Sliwinski, Middleware Renovation 2
Agenda
Context & Motivation for Renovation
Middleware Review process
Technical evaluation of the transport layer
Changes in the MW Architecture in LS1
MW Upgrade milestones in 2013
Risk assessment and mitigation
Conclusions
Context & Motivation for Renovation
MW Mandate & Scope
• Standard set of MW solutions
• Centrally managed services
• Track & optimize runtime parameters
• Well-defined feedback channel for users
• Provide support & follow up issues
Scope: CERN Accelerator Complex
• Operational 24*7*365
• Must be Reliable & High Quality
• 73’000 HW devices, 3’150 servers
• In all Eqp. groups (4 dpts: BE, EN, GS, TE)
[Figure labels: GUI Applications, Control Logic, Middleware, Control System]
CMW in the Controls System
[Figure: CMW in the three-tier controls system (LHC machine). Tiers: presentation tier, middle tier, resource tier.]
Networks and links: general purpose network; CERN Gigabit Ethernet technical network (TCP/IP communication services); WorldFIP segment (1, 2.5 Mbit/s); PROFIBUS; FIP/IO; optical fibres; timing generation; direct I/O.
Hosts: operator consoles, fixed displays; file servers, application servers, SCADA servers; RT LynxOS VME front ends; WorldFIP front ends; PLCs.
Equipment: actuators and sensors (cryogenics, vacuum, etc.); quench protection agents, power converters, function generators, cryo temperature sensors; beam position monitors, beam loss monitors, beam interlocks, RF systems, etc.
CMW and JMS usage shown in the figure:
• CMW client (C++/Java), JAPC: GUIs, LabView, RADE
• CMW client (Java), JAPC: Logging, LSA, InCA, SIS
• CMW client/server (C++/Java): Proxy, DIP, AlarmMon, AQ
• CMW server (C++): FESA, FGC, GM
• CMW server (C++): PVSS (Cryo, Vacuum)
• JMS client (Java): GUIs
• JMS client (Java): Servers (Logging, InCA, SIS)
Motivations for MW Renovation
Current CORBA-based CMW-RDA
• Integrated in the Control system
• Used to operate all CERN accelerators
• Provides widely accepted Device/Property model
• > 10 years old
Why review & upgrade MW?
• CORBA was chosen 15 years ago
• Technical limitations of CORBA-based transport
• Functional limitations of the current CMW-RDA
• Codebase with a long history: difficult to maintain, needs architecture review
• Major issue of long-term support & future evolution
• Evolution of technology over the last 10 years: HW, OS, middleware, 3rd party libraries
• Human factor: less & less CORBA expertise on the market
Technical limitations of CORBA transport
Became legacy, not actively supported: maintenance issue
• Shrinking community, slow response time
• omniORB (C++) – 1 developer/maintainer, last release mid-2011
• JacORB (Java) – few developers, small community
Major technical limitations
• Lack of fully asynchronous processing channel
• Blocking communication: the infamous JacORB blocking issue
• Lack of low-level control of IO resources (sockets, request queues)
Development issues
• Difficult to extend the wire protocol
• Backward compatibility issue
• Complex, error-prone API
• Heavy in memory usage
Summary: Why change CORBA?
• CORBA was chosen 15 years ago
• Not actively maintained: big risk for the MW project
• Better solutions exist on the market
• Invest in a future solution rather than maintaining the old one
Functional limitations of CMW-RDA
Several pending operational issues
• Difficult (or hardly possible) to resolve with current library
• Any major change very difficult to introduce
  ○ Technical Stops & Xmas breaks too short for massive deployment
  ○ High risk: major impact on front-end frameworks and applications
No protection against ’slow/bad’ client applications
• Misbehaving application may destabilise front-end server
• Affects reliability of the subscription channel
• Workaround: introduction of Proxy
Poor scalability when many clients subscribed
• Stability issues observed when >200 clients subscribed (even for Proxy)
• Threading model doesn’t scale well with many clients
Missing support for priority clients (e.g. SIS, PM, InCA, Logging)
• Non-critical clients (e.g. GUIs) have the same communication priority
+ others …
Summary: Why change CMW-RDA?
• With current CORBA-based middleware we can’t solve the pending operational issues
• We can’t provide better scalability & reliability
• CMW-RDA is difficult to evolve & extend
Middleware Review process
Middleware Renovation process
MW Renovation = MW Review + MW Upgrade
• MW Review aims to provide the most appropriate technical solution satisfying the user requirements
• MW Upgrade establishes the plan & strategy for introduction of the new MW
• Objective: LS1, the unique opportunity for the major MW upgrade
Middleware Review Process
• Gathering of users feedback and requirements (2010-11)
• Review of communication and serialization libraries (2011-12)
• Prototyping using selected communication products (2012)
• Design & impl. of new RDA3: Data, Client & Server (2012-13)
• Testing & validation of core MW infrastructure (summer’13)
• Upgrade of all dependent MW libraries & services (2013-14)
  ○ JAPC, Directory Service, Proxy, DIP Gateway
Review of users requirements
2010-11 – series of interviews with major users
• Lars Jensen, Stephen Jackson (BI)
• Andy Butterworth, Frode Weierud, Roman Sorokoletov (RF)
• Brice Copy, Clara Gaspar (DIP, DIM)
• Frederic Bernard, Herve Milcent, Alexander Egorov (PVSS)
• Alexey Dubrovskiy (CTF), Kris Kostro (DIP gateways)
• Marine Gourber-Pace, Nicolas Hoibian (Logging)
• Nicolas De Metz-Noblat (Front-Ends), Alastair Bland (Infrastructure)
• Michel Arruat (FESA), Stephen Page (FGC)
• Niall Stapley, Mark Buttner, Marek Misiowiec (LASER & DIAMON)
• Nicolas Magnin, Christophe Chanavat (ABT)
• Stephane Deghaye, Jakub Wozniak (InCA, SIS)
• Vito Baggiolini, Roman Gorbonosov (JAPC & DA systems)
+ regular feedback from OP
+ internal team input
http://wikis/display/MW/Interviews+with+Experts
New RDA3: Accepted requirements
General
• Java & C++ API, Win (64-bit) & Linux (SLC5 32-bit & SLC6 64-bit)
• Accelerator Device Model (i.e. Device/Property)
• Get, Set, Async-Get, Async-Set, Subscribe
• Early detection of communication failures
• Improve error reporting in all the layers: client, server, gateways
• Admin interface & runtime diagnostics & statistics
Data support
• Data object: primitives, n-dim arrays, data structures
Subscription mechanism
• Subscription behaviour the same regardless of the condition of the server (active, down)
• Several client subscription policies (default: continuous)
• Provide subscription notification ordering
• First-Update enforced via CMW on server-side
  ○ Provide callback to front-end framework for the server-side Get
• Drop support for on-change flag
• Standardise use of subscription filters and update flags (e.g. immediate update)
• Add header for acquired Data: common metadata (e.g. acq. stamp, cycle name)
• All loss of data (dropped updates) must be notified to clients
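The Device/Property model with Get, Set and Subscribe, the metadata header and the First-Update rule can be illustrated with a minimal in-process sketch. This is not the RDA3 API: the class names `AcquiredData` and `DeviceServer`, the device name `BPM.1`, the property `Position` and the cycle name `CYCLE.A` are all hypothetical, and no network transport is involved.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AcquiredData:
    """A value plus a common metadata header (acquisition stamp, cycle name)."""
    value: Any
    acq_stamp: float
    cycle_name: str


# Hypothetical listener signature: (device, property, data)
Listener = Callable[[str, str, AcquiredData], None]


class DeviceServer:
    """Toy stand-in for a server exposing Device/Property pairs (assumption)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], AcquiredData] = {}
        self._listeners: Dict[Tuple[str, str], List[Listener]] = {}

    def set(self, device: str, prop: str, value: Any, cycle_name: str = "") -> None:
        data = AcquiredData(value, time.time(), cycle_name)
        self._store[(device, prop)] = data
        # Notify subscribers in subscription order.
        for listener in self._listeners.get((device, prop), []):
            listener(device, prop, data)

    def get(self, device: str, prop: str) -> AcquiredData:
        return self._store[(device, prop)]

    def subscribe(self, device: str, prop: str, listener: Listener) -> None:
        key = (device, prop)
        self._listeners.setdefault(key, []).append(listener)
        # First-Update: the middleware performs a server-side Get for the
        # new subscriber, so it starts from the current value.
        if key in self._store:
            listener(device, prop, self._store[key])


server = DeviceServer()
server.set("BPM.1", "Position", 0.12, cycle_name="CYCLE.A")

received = []
server.subscribe("BPM.1", "Position", lambda d, p, data: received.append(data.value))
server.set("BPM.1", "Position", 0.15, cycle_name="CYCLE.A")
```

In this sketch the subscriber receives the stored value immediately on subscription and then every later update, which is one way to read the First-Update requirement.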
New RDA3: Accepted requirements
Client side
• RDA3 client API connects with both: RDA2 (old) & RDA3 (new) servers
• Efficient mechanism for: connection, disconnection & reconnection
• Must be able to recover from any interruption of communication with the server
  ○ Server restarts, IP address change, rename/move of a device to another server
• Improved semantics of Array Calls, i.e. handling of individual parameters
• Enhanced diagnostics & collection of statistics
Server side
• Policies for discarding notifications, i.e. deal with overflows and ’bad clients’
  ○ Instrument with counters & timings to diagnose notification delivery
• Prioritisation of Get/Set requests for high-priority clients
• Server-side subscription tree fully managed by CMW
  ○ Server does not need to manage client subscriptions any more
• Manage the client connections, e.g. forced disconnect of a client
• Client lifetime callbacks (i.e. connected, disconnected)
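A discard policy with counters, as listed for the server side, might look like the following sketch. It assumes a "drop oldest" policy and a bounded queue per client; the slides do not say which policies RDA3 implements, and the class `ClientNotificationQueue` is hypothetical. The drop count returned with each update shows one way a client could be told about lost data.

```python
from collections import deque


class ClientNotificationQueue:
    """Bounded per-client queue; when full, the oldest update is discarded."""

    def __init__(self, capacity: int) -> None:
        self._queue = deque()
        self._capacity = capacity
        self.delivered = 0          # updates handed over to the client
        self.dropped = 0            # updates discarded because of overflow
        self._unreported_drops = 0  # drops not yet signalled to the client

    def publish(self, update) -> None:
        if len(self._queue) >= self._capacity:
            self._queue.popleft()   # assumed policy: discard the oldest update
            self.dropped += 1
            self._unreported_drops += 1
        self._queue.append(update)

    def poll(self):
        """Return (update, drops_since_last_poll), or None when the queue is empty."""
        if not self._queue:
            return None
        update = self._queue.popleft()
        drops, self._unreported_drops = self._unreported_drops, 0
        self.delivered += 1
        return update, drops


# A slow client that consumes nothing while five updates arrive.
slow_client = ClientNotificationQueue(capacity=3)
for i in range(5):
    slow_client.publish(f"update-{i}")

first = slow_client.poll()
second = slow_client.poll()
```

Because the queue is bounded, a slow client can only cost the server a fixed amount of memory, and the counters give the diagnostics the requirement asks for.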
New RDA3: Accepted requirements
Server side (cont.)
• Client discovery for diagnostics purposes (i.e. connected clients with payload)
• Enhanced diagnostics & collection of statistics
Ongoing discussions (not accepted yet)
• Prioritisation of subscription notifications for high-priority clients
Technical notes
• Invest in asynchronous & non-blocking communication
• Prefer 0-copy & lock-free data structures, message queues
http://wikis/display/MW/Design+of+New+RDA
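Prioritisation of requests for high-priority clients, which appears in the server-side requirements, can be sketched with a priority queue. This is only an illustration under assumptions: the two levels `HIGH` and `NORMAL`, the class `PriorityRequestQueue` and the request strings are hypothetical, and the real scheduling inside RDA3 is not described in the slides.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1  # hypothetical levels; the lower number is served first


class PriorityRequestQueue:
    """Serves high-priority requests first, FIFO within the same level."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, priority: int, client: str, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), client, request))

    def next_request(self):
        _priority, _seq, client, request = heapq.heappop(self._heap)
        return client, request


queue = PriorityRequestQueue()
queue.submit(NORMAL, "GUI", "Get A")
queue.submit(HIGH, "SIS", "Get B")
queue.submit(NORMAL, "GUI", "Get C")
queue.submit(HIGH, "InCA", "Set D")

served = [queue.next_request() for _ in range(4)]
```

Requests from SIS and InCA are served before the GUI requests even though they arrived later, while requests of equal priority keep their arrival order.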
New RDA3: Summary of requirements
Unchanged
• Device/Property model
• Set of basic operations (Get, Set, Subscribe)
Fixes & improvements
• Subscription mechanism
• Connection management
• Diagnostics & statistics
New functionality
• Policies for subscription management (client & server)
• Client priorities
• Server-side subscription tree
• Extended Data support
• Standardise First-Update concept
Technical evaluation of the transport layer
Middleware transport requirements
[Figure: requirements pyramid with three levels (Desirable, Mandatory, Fundamental). Requirements shown:]
• Lightweight
• Friendly API, documentation
• Request/reply & pub/sub patterns
• Open source license
• Asynchronous
• Active community
• Stability, Maturity & Longevity
• Performance & Scalability
• C++/Java
• Linux/Windows
• Over TCP/IP LAN
Andrzej Dworak, ICALEPCS 2011
Evaluation process -> our criteria
Criteria
• QoS
• Resources, binary size, memory
• Performance
• Communications patterns
• API, look & feel, documentation
• Community, maturity
Appearance
• Creators: specification, documentation
• Users: forums, bug reports
• Internet
Simple usage
• Download: licensing
• Compile: Linux & gcc
• Run examples
Testing
• Communication patterns
• Performance
• Exceptional situations
• QoS
• Configuration
Evaluated middleware products
Ice, Thrift, omniORB, JacORB, YAMI, OpenSplice DDS, CoreDX, RTI DDS, OpenAMQ, ZeroMQ, Qpid, RabbitMQ, MQTT RSMB, Mosquitto
All opinions are based only on our knowledge and evaluation. Each of the products, depending on the requirements, may constitute a good solution.
Andrzej Dworak, ICALEPCS 2011
Products comparison (according to the criteria)
Criteria: Sync, async & msg patterns; QoS; Dependencies & memory footprint; Performance; Look & feel, API, docs; Community & maturity
Product | Score
ZeroMQ | 6
Ice | 5
YAMI4 | 4
RTI | 3
Qpid | 3
CORBA | 2
Thrift | 2
Andrzej Dworak, ICALEPCS 2011
Conclusions
• Several good middleware solutions available
• The choice is dictated by the most critical requirements
• Not easy: performance matters but also ease of use, community, …
• Prototyping was done with the most promising candidates: ZeroMQ, Ice & YAMI
Finally we decided to choose ZeroMQ (http://www.zeromq.org/)
• Asynchronous & non-blocking communication
• 0-copy & lock-free data structures, message queues
• Nice API, good documentation & active community
New RDA3 Java – Sync Get round-trip time
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
[Figure: Sync Get round-trip (1kB message payload). X axis: number of clients (0 to 1000). Y axis: round-trip (ms), 0 to 18. Series: max, average.]
New RDA3 Java – subscription notification latency
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
[Figure: Subscription notification latency (1kB message payload). X axis: number of clients (0 to 1000). Y axis: latency (ms), 0 to 250. Series: min, max, average.]
New RDA3 Java – subscription notification latency
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
[Figure: Subscription notification latency (a closer look). X axis: number of clients (0 to 200). Y axis: latency (ms), 0 to 6. Series: min, max, average.]
Changes in the MW Architecture in LS1
Current MW Architecture
[Figure: layered architecture diagram. Legend: User written, Middleware, Central services.]
Clients: Java Control Programs; VB, Excel, LabView; C++ Programs; Administration console
Client side: JAPC API; Passerelle C++; RDA Client API (C++/Java), Device/Property Model
CMW Infrastructure: CORBA-IIOP
Server side: RDA Server API (C++/Java), Device/Property Model; CMW integration in each server
Servers: Virtual Devices (Java), PS-GM Server, FESA Server, FGC Server, PVSS Gateway, more servers
Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)
Central services: Directory Service, Configuration Database (CCDB), RBAC A1 Service, RBAC Service
Changes in MW Architecture in LS1
[Figure: the same layered architecture diagram, with the CMW infrastructure based on ZeroMQ and a label "Upgrade in LS1". Legend: User written, Middleware, Central services.]
Clients: Java Control Programs; VB, Excel, LabView; C++ Programs; Administration console
Client side: JAPC API; Passerelle C++; RDA Client API (C++/Java), Device/Property Model
CMW Infrastructure: ZeroMQ
Server side: RDA Server API (C++/Java), Device/Property Model; CMW integration in each server
Servers: Virtual Devices (Java), PS-GM Server, FESA Server, FGC Server, PVSS Gateway, more servers
Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)
Central services: Directory Service, Configuration Database (CCDB), RBAC A1 Service, RBAC Service
MW Upgrade milestones in 2013
MW Upgrade Milestones in 2013
Milestone | Completed by
RDA3 Java (client/server) (alpha) | June’13
RDA3 C++ server (alpha) | July’13
RDA3 integration with: FESA, FGC, PVSS | July-Oct’13
RDA3 C++/Java (client/server) validated | September’13
New JAPC release with RDA3 Java | September’13
New FESA3.2 release with RDA3 | December’13
[Figure: timeline. RDA3 C++ (July’13); integration with FESA, FGC, PVSS (July-Oct’13); RDA3 validated, new JAPC (September’13); new FESA3.2 (December’13); tests with eqp. (Winter’13/14); end of LS1 (August’14).]
End-of-Life for RDA2: LS2
MW Upgrade strategy in LS1 and towards LS2
• No BIG-BANG migration but gradual
• Backward compatible (connection-wise) new RDA3 client library
  ○ New RDA3 clients can communicate with RDA2 & RDA3 servers
• FESA3 will exist with both: old RDA2 (FESA3.1) and new RDA3 (FESA3.2)
[Figure: migration diagram. Clients: Old JAPC with old RDA2 client; New JAPC with new RDA3 client. Servers: old RDA2 server (FESA2.10), old RDA2 server (FESA3.1), new RDA3 server (FESA3.2). Also shown: RDA2 RDA3 Gateway.]
Notes on the figure:
• Client apps will migrate during LS1
• Only for justified, exceptional cases
• FEC developers should migrate to FESA3.2 ASAP
LS1: Changes in JAPC
New major JAPC version upgrade for RDA3 (September’13)
• Public API backward compatible
• Possible API extensions, but always compatible
• Announcement via accsoft-java-announce list
Required Actions for JAPC Users
• Update JAPC jars (via CommonBuild)
• Re-release your product (via CommonBuild)
• New JAPC will support communication with RDA2 & RDA3 servers
LS1: Changes in RDA
New major version: RDA3 (June’13 – alpha version)
• Public API NOT backward compatible
• New protocol, new architecture, new design
• Same Device/Property model & Get/Set/Subscribe calls
• Announcement via cmw-news & accsoft-java-announce lists
Required Actions for RDA Users
• For Java: Use new version of JAPC (API unchanged)
• For Java: New JAPC will support communication with RDA2 & RDA3 servers
• For C++: Upgrade user code to new RDA3 API
• For C++: RDA3 will support communication with RDA2 & RDA3 servers
Consequences if NO Action (staying with old RDA2)
• NOT possible to communicate with new RDA3 servers (FESA3, FGC, etc.)
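The connection rules stated on the migration slides (RDA3 clients reach both RDA2 and RDA3 servers, RDA2 clients cannot reach RDA3 servers) can be written down as a small compatibility check. The function `can_communicate` is hypothetical and only encodes those rules for direct connections; it does not model the gateway.

```python
def can_communicate(client: str, server: str) -> bool:
    """Direct-connection compatibility between client and server library versions."""
    versions = ("RDA2", "RDA3")
    if client not in versions or server not in versions:
        raise ValueError(f"unknown version: {client!r} / {server!r}")
    if client == "RDA3":
        return True              # new client library talks to old and new servers
    return server == "RDA2"      # old client library only talks to old servers


# Full matrix of (client, server) combinations.
matrix = {
    (client, server): can_communicate(client, server)
    for client in ("RDA2", "RDA3")
    for server in ("RDA2", "RDA3")
}
```

The only unsupported combination is an old RDA2 client with a new RDA3 server, which is why client applications are asked to migrate first.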
Risk assessment and mitigation
Risk assessment and mitigation
Risk | Mitigation
Wrong product developed (wrong requirements) | Early and continuous involvement of clients & experts
Product is (too) late | Careful planning and follow-up; fall-back to less ambitious goals
Product has bugs or incompatibilities | Early, continuous testing (unit and functional tests)
Bugs affect operations | Gradual migration; fast deployment of bugfixes
Risk: Wrong product developed (wrong requirements)
Mitigation: Early and continuous involvement of clients & experts
We have involved clients and experts since 2010
• Requirements review with all major clients
• Technical discussions with eqp. experts
Iterative development involving the Review team
• Design meetings (API and internals) since January 2013
• Alpha versions will be available for feedback and validation several months before the final release
• Feedback is continuously integrated in development (= iterative)
Risk: Product is (too) late
Mitigation: Careful planning and follow-up; fall-back to less ambitious goals
Planning prepared and followed by the MW team
• Taking into account needs and priorities of other CO projects and clients
Regular follow-up
• In CO internally by TEC coordinator
• In informal meetings with the MW experts (as done so far)
Fall-back to less ambitious goals
• Plan priorities of functionality
• Drop (postpone) work with lower priority
Risk: Product has bugs or incompatibilities
Mitigation: Early, continuous testing (unit, functional & integration tests)
Unit tests to assess quality inside the MW project
• Required dev. phase in the MW team
Functionality tests in CO Testbed
• Functionality of CMW only
Integration tests to check interoperability
• Integration with FESA in CO Testbed
• Integration with FGC in FGC Lab
Risk: Bugs affect operations
Mitigation: Gradual Migration (1)
• No BIG-BANG migration but gradual
• Backward compatible (connection-wise) new RDA3 client library
  ○ New RDA3 clients can talk to old RDA2 servers
• FESA3 will exist with both: old RDA2 and new RDA3
[Figure: migration diagram. Clients: Old JAPC with old RDA2 client; New JAPC with new RDA3 client. Servers: old RDA2 server (FESA2), old RDA2 server (FESA3), new RDA3 server (FESA3).]
Risk: Bugs affect operations
Mitigation: Gradual Migration (2)
Deploy first on systems controlled by the MW team
• E.g. Proxies, Gateways
Gain experience and confidence
• Start deployment with less critical systems first
Risk: Bugs affect operations
Mitigation: Fast deployment of bugfixes
If (in spite of all) something goes wrong in operations
• Fast reaction from the MW team
In CO, we will study the need and mechanisms to also upgrade servers quickly
Conclusions
We have to replace CORBA with a new solution
We collected updated users requirements
MW upgrade will be performed during LS1
Interoperability between RDA2 & RDA3
Gradual control system migration until LS2
End-of-Life for RDA2: LS2