1
DAAC Technology Snapshot
Tom Kalvelage, Land Processes DAAC Manager
For the DAACs
ESISS, February 17, 2004
2
Objective of the presentation:
• Present lessons learned from the collective experience of the DAACs that may be relevant with respect to the Data and Information Management Plan.
• Present experience with technology and its impacts on the DAACs.
3
ESE Data Centers
• SEDAC: Human Interactions in Global Change
• GES DAAC-GSFC: Upper Atmosphere, Atmospheric Dynamics, Ocean Color, Hydrology, Global Biosphere, Radiance Data
• ASDC-LaRC: Radiation Budget, Clouds, Aerosols, Tropospheric Chemistry
• ORNL: Biogeochemical Dynamics, EOS Land Validation
• ASF: SAR Products, Sea Ice, Polar Processes
• NSIDC: Cryosphere, Polar Processes
• LP DAAC-EDC: Land Processes & Features
• PODAAC-JPL: Ocean Circulation, Air-Sea Interactions
• GHRC: Hydrologic Cycle & Severe Weather
4
Background
The DAACs are located at host institutions that care about the user communities their DAAC serves. Each DAAC is unique, but together we are fully operational on a large scale, serving all users.

The DAACs use a variety of systems to help users:
• Cross-DAAC systems like the EOSDIS Core System (ECS) and EOS Data Gateway.
• Local DAAC systems like LaTIS, TRMM Support System, “Version 0” systems, ASTER Browse Tool, Mercury, etc.
• Many ‘add-ons’ to existing systems (ECS and non-ECS).
• Tools and scripts, both online and delivered to users.

In addition to systems, we also work with the science and applications communities to ensure their needs are met:
• Users via User Working Groups, Science Investigator-led Processing Systems (SIPS), REASoN CAN members, field campaigns, universities, the applications community, education, national and international research projects, and so on.
5
Lessons Learned
While the DAACs are very diverse and each has its own particular work, experience, and viewpoint, collectively there are some lessons learned across the DAACs that we believe may be relevant.
6
Developer Interaction with Operations
Having DAAC Operations work closely with developers has contributed to data system success.
• User Services made sure new capability would satisfy users.
• Operators provided realistic ops concepts and requirements.
• Systems were more likely to be usable and/or efficiently operable as delivered.
• Sustaining engineers provided installation and maintenance advice and requirements.
• Integration and test took less time and required less rework (i.e., “measure twice, cut once”) to get from integration, through test, to operations.
• Ensured that the architecture and interfaces are flexible enough to be maintained and upgraded in the future.
On-site developers from an off-site development contractor can help, but care must be taken to avoid impacting operations or development.
7
DAAC Engineering/Development Capability
A DAAC on-site engineering and development capability has significantly contributed to mission success. It has:
• enabled the DAACs to evaluate and/or implement new information technology, helping keep data systems current and improving services for users.
• provided needed capability via scripts, tools, and subsystems when delivered systems (COTS or GFE) fall short. This includes tools for users (e.g., subsetting, reformatting) and for operations (e.g., data management).
• helped successfully integrate multiple delivered and developed systems (e.g., EOSDIS Core System, EOS Data Gateway, Product Distribution System, etc.).
• provided improved capability to deploy advanced information technology into an operational or near-operational environment for more rapid feedback on operations readiness.
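The user-facing subsetting tools mentioned above can be illustrated with a minimal sketch. This is a hypothetical example, not any actual DAAC subsetter (real ones operate on formats such as HDF-EOS); the function name, grid layout, and parameters are illustrative assumptions.

```python
def subset(grid, lat_range, lon_range, lat0, lon0, cell_size):
    """Clip a 2-D grid (list of rows) to a lat/lon bounding box.

    grid[i][j] holds the value at latitude lat0 - i*cell_size and
    longitude lon0 + j*cell_size (rows run north to south, columns
    west to east). Hypothetical layout for illustration only.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    rows = []
    for i, row in enumerate(grid):
        lat = lat0 - i * cell_size
        if not (lat_min <= lat <= lat_max):
            continue  # row falls outside the requested latitude band
        rows.append([v for j, v in enumerate(row)
                     if lon_min <= lon0 + j * cell_size <= lon_max])
    return rows

# 4x4 grid anchored at 40N, 100W on a 1-degree cell size
grid = [[10 * i + j for j in range(4)] for i in range(4)]
clip = subset(grid, (38, 39), (-100, -99), lat0=40, lon0=-100, cell_size=1)
print(clip)  # [[10, 11], [20, 21]]
```

A production subsetter would add format-aware I/O and metadata handling, but the clipping logic is essentially this.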
8
User Services
User Services (including order service and technical support) has significantly contributed to mission success.
• User Services has worked best when it is an integral part of the DAAC, able to:
  • influence the engineering and operations of the DAAC on behalf of the users, and
  • educate the users about the DAAC’s engineering and operations.
• User Services has been essential to explain to users the search and order interfaces, data formats, data content, algorithms, applicability of the data, and potential ways of using the data.
• Informal outreach performed by User Services has been instrumental in making the wider user community aware that the data are even available.
9
Big Systems vs. Little Systems
Capability can be implemented in a wide spectrum of ways, from doing everything in one large system (one-size-fits-all) to doing each small thing in its own small system (one-size-fits-one).
• Experience shows that doing 100% in one system doesn’t work, due to significant inertia and cost.
• Anything less than 100% (one-size-fits-most) will work, but collective experience does not tell us the best solution or equilibrium point – it’s different for each situation.

Examples:
• After evaluation, half of the DAACs did not use the big ECS system; that turned out to be a good choice.
• A user interface providing some functionality to all data sets allowed unique work to be done by unique user interfaces.
• Moving unique processing out of ECS to smaller systems reduced the effort required to address unique requirements.
• Supporting multiple missions with a ‘one-size-fits-most’ big system (e.g., ECS) allowed us to have more mature systems and processes, and less risk, when new missions started up.
10
Systems In General
Having worked with a number of systems, in general we have some lessons learned about them.
• Systems designed with the maximum amount of COTS software (e.g., ECS) do have a lot of functionality. However, complex COTS and hardware interdependencies, combined with planned obsolescence, cause significant integration work.
• Systems that are not designed to be automated are very difficult to automate later.
• Systems are more operable, and less costly to operate, if operated at or below specified performance levels (margin).
• Regarding standards and formats, in general:
  • It’s best to use existing standards and formats.
    – Data in existing standards and formats were used, and did not require tool development.
    – Data in new standards and formats were not used until tools were developed.
  • Translators are generally preferable to new tools.
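The preference for translators over new tools can be sketched in a few lines: map one convention’s metadata field names onto another’s instead of building a new tool per format. The field names below are illustrative assumptions, not any actual standard’s vocabulary.

```python
# Hypothetical mapping from a legacy metadata convention to a target
# convention; unknown fields pass through so nothing is silently lost.
FIELD_MAP = {
    "GranuleID": "id",
    "BeginDate": "start_time",
    "EndDate": "end_time",
    "SensorName": "instrument",
}

def translate(record):
    """Rename keys per FIELD_MAP; keep unmapped keys unchanged."""
    return {FIELD_MAP.get(k, k): v for k, v in record.items()}

legacy = {"GranuleID": "A2004048", "BeginDate": "2004-02-17",
          "SensorName": "MODIS", "Extra": "kept"}
print(translate(legacy))
# {'id': 'A2004048', 'start_time': '2004-02-17', 'instrument': 'MODIS', 'Extra': 'kept'}
```

The design point is that the mapping table is data, not code: supporting another format pair means writing another table, not another tool.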
11
Data Distribution on Hard Media
Users continue to demand data distributed on hard media.
• While hard media distribution is expected to decline, at least partially due to improved networks, several types of users have resisted this trend:
  • Users with poor or expensive network connections (such as international users).
  • Users with limited local storage capacity.
  • Users who just want the data on media.
12
Communications
Maintaining good communication and working closely together amongst the DAACs has been productive.
• The DAACs began meeting together in the early 1990s.
• The DAACs were advised to form an alliance by a 1998 National Research Council review, and did so.
• The DAACs have balanced working independently and together:
  • The DAACs have very different user communities, measurements, products, environments, and cultures.
  • However, user service, data stewardship, and information technology are common to all DAACs.
• Agreement is by consensus; difficult sometimes, but rewarding and worthwhile.
13
Information Technology Observations
Hardware
• The price/performance trend for utilizing commodity CPUs shows favorable results. A real transitional jump will come when idle desktop PCs and disk can be used for science processing.
• With disk and tape price/performance improving, full archive on-line disk storage, with necessary backup, will become more feasible.
• WAN performance increases will further enable a distributed EOSDIS data system architecture.

Software
• Database usage continues to be limited by throughput performance.
• Grid computing is certainly worth pursuing to determine its real applicability.
• Affordable mass storage will facilitate a proliferation of ‘personally’ customized datasets.
• Automated agents, already being utilized, will find additional areas for implementation (e.g., ordering data and performing data services).

Translators versus standards
• Standard formats are necessary but not always enforceable.
• Translators between standards need to be implemented.
14
Information Technology Observations
Database Management
• Methods of managing large quantities of both metadata and data need to be examined with long-term archiving in mind.

Data Migration
• Data need to be migrated regularly to be preserved.
• Plans for migration need to be an integral part of all data management schemes, not an afterthought.

Longer-term observations
• We will need to better understand the relationship between data ownership, intellectual property rights, and truly long-term preservation.
• The latter is likely to require much more geographically dispersed replication of data than we have now, and will require automated migration from technology to technology in order to deal with technological obsolescence.
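The core of any migration scheme like those above is verification: a new copy must be proven bit-identical before the old one can be retired. A minimal sketch, assuming a plain file-per-granule layout and SHA-256 checksums (both illustrative choices, not anything a specific DAAC used):

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path):
    """Stream a file through SHA-256 in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir; refuse to declare success until a
    checksum round-trip confirms the copy is bit-identical."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)          # preserves timestamps/metadata
    if sha256(src) != sha256(dst):
        dst.unlink()                # discard the suspect copy
        raise IOError(f"verification failed for {src.name}")
    return dst

# Demonstration on a throwaway file
import tempfile
with tempfile.TemporaryDirectory() as d:
    old = Path(d) / "old"; old.mkdir()
    f = old / "granule.dat"; f.write_bytes(b"science bits")
    new = migrate(f, Path(d) / "new")
    print(new.read_bytes() == f.read_bytes())  # True
```

Scheduling this per media generation, with the checksum stored alongside the data, is what turns migration from an afterthought into part of the data management scheme.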
15
Some Things That Keep Us Up At Night
Examples:
• Technology fads - reaching the right and affordable balance between innovation and experience.
• Additional infrastructure requirements (e.g., security, statistics) that strain fixed resources.
• Maintaining a highly competent IT staff.
• Timely receipt of data and documentation from field campaigns.
• Achieving metadata management and ‘harmony’ across datasets and user communities, including conversion between the many metadata standards used across the sciences.
• Long term data stewardship and orphaned data sets.
16
Some Things That Get Us Excited

Examples:
• The reuse/adaptability of new technologies, including new web-based visualization tools that enhance the use and usefulness of data.
• Completing an implementation that the user community gets excited about (service, data access tools) because they find it useful.
• Developing and implementing concepts and technologies, including open standards, that enable users to extract information from data, and knowledge out of information.
• Collaborations to develop new products and services which employ maturing data management and advancing information technology.
17
Backup Slides
18
Staff Functions
Definitions
• Operations: data system operators, some on shift.
• Sustaining Engineering: installs and tests software and, in some cases, hardware; maintains and tunes the data system (e.g., database administration, systems administration of hardware and systems software resources).
• Development Engineering: designs, plans, and develops replacement systems and subsystems.
• User Services: interface to users, support in filling orders, answering user questions, DAAC outreach to users.
• Mission (Science Data) Support: interface to science data providers, DAAC-internal data set coordination, maintenance of data set documentation.
• Management: overall coordination and resource planning.
19
Functional Mix: FY2000 and FY2004
FY2000: Operations 30%, Sustaining Engineering 19%, Development Engineering 7%, User Services 12%, Mission (Science Data) Support 25%, Management 7%

FY2004: Operations 24%, Sustaining Engineering 24%, Development Engineering 8%, User Services 13%, Mission (Science Data) Support 23%, Management 8%
20
Functional Mix: Overall 2000 vs 2004
YEAR         | OPERATIONS | SUS ENGR | DEV ENGR | USER SERVICES | MISSION (SCIENCE DATA) SUPPORT | MANAGEMENT
2000 AVERAGE |    29%     |   19%    |    7%    |      12%      |              25%               |     7%
2000 MAX     |    39%     |   29%    |   17%    |      23%      |              55%               |    13%
2000 MIN     |    12%     |    5%    |    1%    |       4%      |               6%               |     3%
2004 AVERAGE |    24%     |   24%    |    8%    |      13%      |              23%               |     8%
2004 MAX     |    40%     |   33%    |   15%    |      25%      |              41%               |    13%
2004 MIN     |    11%     |    8%    |    3%    |       4%      |               8%               |     4%
Each DAAC provided percentages for their FTE by functional group.
21
Networks
Observations
• DAACs have at least 100 Mb/s connectivity to their users, with most in the gigabit range.
• ASF currently has 10 Mb/s connectivity.
• Internal DAAC networks are 100 Mb/s or faster, with most at gigabit speeds.

DAAC       | Internal network FY2000 | Internal network FY2004            | Link to external users FY2000 | Link to external users FY2004
ASF.DAAC   | 10-1000                 | 10                                 |                               |
GES.DAAC   | HIPPI, Fast Ethernet    | Fast Ethernet, Gigabit, SAN        | Fast Ethernet                 | Abilene, CNE
GHRC       | Fast Ethernet           | 100/1000 GigE                      |                               |
LARC.DAAC  | FDDI                    | Gigabit                            | FDDI                          | Gigabit
LP.DAAC    | FDDI, Fast Ethernet     | Gigabit                            | DS-3                          | DS-3
NSIDC.DAAC | Fast Ethernet           | Gigabit & Fast Ethernet            | Fast Ethernet                 | OC12, Abilene
ORNL.DAAC  | Fast Ethernet           | Gigabit & Fast Ethernet            | dual OC12                     | dual OC45
PO.DAAC    | 100 Mb/s                | 1000 Mb/s                          | OC-12 (622 Mb/s)              | 1000 Mb/s
SEDAC      | Ethernet (10 Mbps)      | Gigabit & Fast Ethernet (100 Mbps) | T3 (45 Mbps) & OC5 (155 Mbps) | Microwave Link (100 Mbps), T3 (45 Mbps) & OC5 (155 Mbps)
22
COTS in FY2004
Examples of COTS packages used at DAACs. This list is not comprehensive.
Archive SW: AMASS, LTO Driver, ADIC Tape Robot
Data analysis: ENVI, gs, IDL, IDL/ION, Imagine, SAS, SPSS
DBMS: Gdb, gdbm, Informix, Ingres, Ingres Java Database Driver, MetaStar, MySQL, Oracle, PL/SQL Developer, SMMS, Sybase
Development tools: Autoconf, automake, binutils, bison, ClearCase, ClearDDTS, cvs, depot, diffutils, ERWIN CASE Tool, filter, gcc, ginstall, hdf, ImageMagick, Java, libpng, logdaemon, lsof, make, patch, Perl, rdist, Sun Workshop, tcl, tcsh, tiff, tk, top
Distribution Support: CD/DVD Studio, fastcopy, gzip, rimage, tar, unzip, wu-ftpd, zip
Document Formatting: Acrobat, Illustrator, Dreamweaver, emacs, enscript, Flash, flex, HomeSite, ispell, jpeg, xemacs
GIS Software: ArcIMS, ArcIMS/ArcSDE/ArcGIS, ArcServe, ArcView, RedSpider
Help Desk: Remedy, RightNow, RightNow eService Center
IT Security: Checkpoint VPN-1, F-Secure, InterScan, SecureCRT, Tripwire, ZoneAlarm
Operating Systems: IRIX, Linux, OS-XX, Solaris, Win98, Win2K, Win2003, Unix
Storage Management: Networker, UniTree, Veritas NetBackup, Veritas Volume Manager
Web Server: Apache, Tomcat, WebTrends Reporting Center
23
Acronyms
ASDC: Atmospheric Sciences Data Center
ASF: Alaska SAR Facility
ASTER: Advanced Spaceborne Thermal Emission and Reflection Radiometer
CAN: Cooperative Agreement Notice
COTS: Commercial Off the Shelf
CPU: Central Processing Unit
DAAC: Distributed Active Archive Center
DBMS: Data Base Management System
ECS: EOSDIS Core System
EDC: EROS Data Center
EOS: Earth Observing System
EOSDIS: EOS Data and Information System
ESE: Earth Science Enterprise
FDDI: Fiber Distributed Data Interface
FTE: Full Time Equivalent
FY: Fiscal Year
GES DAAC: GSFC Earth Sciences DAAC
GFE: Government Furnished Equipment
24
Acronyms, continued
GHRC: Global Hydrology Resource Center
GIS: Geographic Information System
GSFC: Goddard Space Flight Center
HIPPI: High-Performance Parallel Interface
IT: Information Technology
LaRC: Langley Research Center
LaTIS: Langley TRMM Information System
LP DAAC: Land Processes DAAC
NSIDC: National Snow and Ice Data Center DAAC
ORNL: Oak Ridge National Laboratory DAAC
PC: Personal Computer
PO DAAC: Physical Oceanography DAAC
REASoN: Research, Education, and Applications Solutions Network
SEDAC: Socioeconomic Data and Applications Center
SIPS: Science Investigator-led Processing System
SW: Software
TRMM: Tropical Rainfall Measuring Mission
WAN: Wide Area Network