

Advances in Data Distribution Systems, High-Level Product Generation, and the Measurement of Data Quality Metrics at the IRIS Data Management Center [T5-P15]

Tim Ahern, IRIS Data Management Center, 1408 NE 45th St., Seattle, WA 98105, USA

Contributions and implementations from DMC Operations Group (Rick Benson, Anh Ngo), Software Engineering Group (Rob Casey, Sandy Stromme, Sue Schoch), Products and Services Group (Chad Trabant, Bruce Weertman, Rich Karstens, Yazan Suleiman, Manoch Bahavar, Alex Hutko)

Science and Technology 2011 Conference • 8-10 June 2011 • Vienna, Austria

Real Time Seismic Data Ingestion at the IRIS DMC. The IRIS DMC supports most of the real time communication protocols in common use. In some cases we use plug-ins to support specific protocols. At the present time we support Reftek RTPD, GSN IDA System Interface (ISI), Guralp, BRTT ORB-to-ORB, CD1.0 and CD1.1, LISS, Nanometrics NAQS, Nanometrics Apollo Server, NEIC’s Ring Replicator Protocol (RRP), SeedLink, Earthworm, and Antelope data streams. All of these data streams result in miniSEED data records being stored in the Buffer of Uniform Data (BUD) directory structure on servers at the IRIS DMC.

The BUD to Archive Transfer System (BATS) moves copies of the BUD data to the large (presently 200 terabyte) Isilon RAID Storage Cluster where data are available through all of the DMC’s access tools. Data remain in the BUD for periods of time from weeks to months depending on the specific network. Data are available perpetually in the Isilon Storage Cluster (the DMC Archive).

Data in the BUD are, in general, made available to anyone through our real time SeedLink Server. This server can be installed very simply as a system that supports real time feeds. It is a turnkey system that IRIS will make available to anyone requesting it once beta testing is complete.

[Figure: real time data flow at the IRIS DMC. Labels: RRP, Antelope ORB, SeedLink, Earthworm, CD1.0, CD1.1, RTPD, ISI, Guralp, ORB, LISS, Nanometrics (nmxptool, Apollo), BUD, BATS, Archive, QUACK, Turnkey SeedLink Server.]

Real Time Data Reception and the Data Archive. On a typical day the IRIS DMC receives data from about 2,100 seismic stations. About 72% of all data the DMC receives arrive via real time methods. The remainder typically come from temporary deployments such as those supported by IRIS PASSCAL, the British SEIS-UK initiative, the French FOSFORE effort, or similar experiment-based initiatives. The total volume of data now managed at the DMC is approximately 138 terabytes of primary observational data. The map above shows the real time stations sending data to the DMC on May 21, 2011. The chart on the right shows the data in the DMC archive, including data from the GSN, FDSN, US Regional Networks, Strong Motion Engineering data, Portable Deployments such as PASSCAL, and finally EarthScope.

Quality Assurance at the IRIS DMC - QUACK. As data are received in BUD, a variety of metrics are calculated for most seismic time series. The QUACK framework treats real time data entering BUD much like a conveyor belt carrying data. Configured with Delay and Duration parameters for each seismic channel, QUACK determines when quality control metrics get measured and for what time window. The system runs continuously, calculating roughly one dozen QC metrics, so that time series data quality is continuously monitored at the DMC. The results are made available through the web site, through customized reports, and through a customizable Query Interface that allows complex searches of data with specified quality parameters.
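One way to picture the Delay and Duration parameters is as a trailing window on the conveyor belt. The following is a minimal sketch, assuming that Delay is how far behind the current time a measurement is made and Duration is the length of the window measured. The function name and these exact semantics are assumptions for illustration, not taken from the QUACK implementation.

```python
from datetime import datetime, timedelta


def measurement_window(now, delay, duration):
    """Return (start, end) of the window to measure at time `now`.

    Assumed (hypothetical) semantics: the window ends `delay` behind
    `now` and covers `duration` of data before that point.
    """
    end = now - delay
    start = end - duration
    return start, end


# Example: a channel configured with a 2 hour delay and a 1 day duration.
start, end = measurement_window(
    datetime(2011, 5, 21, 12, 0, 0),
    delay=timedelta(hours=2),
    duration=timedelta(days=1),
)
print(start.isoformat(), end.isoformat())
```

A longer delay would suit channels whose data arrive late, at the cost of metrics that lag further behind real time.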

QC Metrics being measured at the present time include:
• Daily Signal RMS
• Daily Signal Mean
• Daily Percent Available
• Number of Gaps per Day
• Number of Overlaps per Day
• Overall Timing Quality
• Max Overlap
• STA/LTA Plot
• Percent above High Noise Model
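The amplitude-based metrics in this list can be illustrated on a plain list of samples. The sketch below assumes RMS is computed without removing the mean and that STA/LTA uses averages of absolute amplitude over trailing windows. QUACK's actual definitions are not given here, so these choices, the function names, and the sample values are all illustrative.

```python
import math


def signal_mean(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


def signal_rms(samples):
    """Root mean square of the samples (mean not removed; an assumption)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def sta_lta(samples, n_sta, n_lta):
    """Short-term over long-term average of absolute amplitude.

    One ratio is returned for each position where the long window is full.
    """
    ratios = []
    for i in range(n_lta, len(samples) + 1):
        sta = sum(abs(x) for x in samples[i - n_sta:i]) / n_sta
        lta = sum(abs(x) for x in samples[i - n_lta:i]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


# Made-up samples: quiet background followed by a large arrival.
samples = [1, -1, 1, -1, 1, -1, 1, -1, 10, -10]
mean_value = signal_mean(samples)
rms_value = signal_rms(samples)
ratios = sta_lta(samples, n_sta=2, n_lta=10)
print(mean_value, rms_value, ratios)
```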

A metric to measure longest continuous time spans is likely to be added in the near future. Additionally the NEIC Probability Density Function Displays are always available.
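The continuity metrics (gaps, overlaps, maximum overlap, percent available, and the proposed longest continuous span) can all be derived from the start and end times of the data segments received for a day. The sketch below is one possible formulation, assuming segments are given as (start, end) pairs in seconds since the start of the day. The function name, the tolerance parameter, and the sample segments are hypothetical.

```python
DAY_SECONDS = 86400.0


def continuity_metrics(segments, tolerance=0.0):
    """Summarize continuity of (start, end) segments within one day.

    Missing data before the first or after the last segment lowers the
    percent available but is not counted as a gap in this sketch.
    """
    gaps = overlaps = 0
    max_overlap = covered = longest = 0.0
    span_start = cursor = None  # cursor is the latest end time seen so far
    for start, end in sorted(segments):
        if cursor is None or start - cursor > tolerance:
            if cursor is not None:  # a gap closes the current span
                gaps += 1
                longest = max(longest, cursor - span_start)
            span_start = start
            covered += end - start
            cursor = end
            continue
        if cursor - start > tolerance:  # segment begins before cursor
            overlaps += 1
            max_overlap = max(max_overlap, min(cursor, end) - start)
        if end > cursor:  # count only newly covered time
            covered += end - max(cursor, start)
            cursor = end
    if cursor is not None:
        longest = max(longest, cursor - span_start)
    return {
        "gaps": gaps,
        "overlaps": overlaps,
        "max_overlap_s": max_overlap,
        "percent_available": 100.0 * covered / DAY_SECONDS,
        "longest_span_s": longest,
    }


# Made-up day: a 10 second overlap and a 600 second gap.
day = continuity_metrics([(0, 30000), (29990, 50000), (50600, 86400)])
print(day)
```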

[Figure: QUACK (QUality Assurance Control Kit) conveyor belt diagram, showing the BUD with the Delay and Duration parameters.]

Examples of QUACK QC Metrics. Examples of QUACK Metrics are shown above for the IC station BJT near Beijing. The colorful figures on the left show the results for an entire year (2010) of Probability Density Functions (PDFs) for the broadband vertical channel at BJT. PDFs have proven to be a key diagnostic tool that allows station quality to be easily assessed. IRIS has PDFs beginning in 2004 and continuing to the present. Without continuous data, noise metrics indicative of station quality obviously cannot be measured. The figures on the right side show scatter plots of three individual quality metrics for station BJT for the year 2010. Again, these plots allow quick and simple assessment of data quality for a given station.

Reference: McNamara, D.E. and R.P. Buland, Ambient Noise Levels in the Continental United States, Bull. Seism. Soc. Am., 94, 4, 1517-1527, 2004

[Figure: scatter plots of three quality metrics for station BJT in 2010. Panels: Gaps per Day; Percent Data Availability; Maximum Gap in msec.]

Overview: The IRIS Data Management Center (DMC) manages the largest concentration of broadband seismological data in the world. DMC data and services are free and open. A rich variety of request tools are currently being extended using Representational State Transfer (REST) web services. These new methods of providing access to data will greatly simplify a research or monitoring group’s ability to retrieve information quickly, reliably, and in a form that is readily usable.

In addition to raw time series data, the IRIS DMC is developing higher-order products intended to raise the level from which subsequent research begins. These products include ground motion visualizations, a variety of event plot displays, synthetic seismograms, management of tomographic models along with visualization tools, GCMTs, calibration information, and receiver functions. These products are available through web services from the IRIS DMC’s new product management system, SPUD.

A final project, being built using web services, is an entirely new implementation of the IRIS DMC’s quality assurance system, QUACK. The new system will use web services in a manner that allows attributes of data quality to be calculated for any time series channel (seismic, infrasound, hydroacoustic, etc.). The modular system will include several improvements over the existing system, providing better scalability, flexibility, and usability. This poster provides an overview of these new services (data access, products, and quality metrics) to the CTBTO community.

A New IRIS Quality Initiative. The QUACK system has shown the importance of automated quality assurance procedures when dealing with real time data. However, since QUACK is now 10 years old, IRIS is undertaking a refactoring of this software system. The new system, MUSTANG, will be developed around a service oriented architecture (SOA) using REST web services. MUSTANG will be capable of monitoring the quality of data in near real time as well as any data in any IRIS DMC data repository. This will extend QA procedures to data not received in real time, such as data from temporary experiments or data that are embargoed for a period of time. The modular approach that will be used in MUSTANG will allow:

• new metric calculators to be added more easily,
• external client software to generate reports meeting individual needs,
• future data requests to be based upon data quality metrics, and
• performance to be optimized using an internal caching mechanism in place at the DMC.

MUSTANG will initially include all the current QUACK metric calculators as well as newer metrics calculated in several frequency bands, cross-correlation techniques between multiple sensors at a given station, and comparisons of observed data with synthetics. We anticipate considerable improvements in reporting capabilities as well.
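The modular idea, in which a new metric calculator can be added without touching the rest of the system and repeated requests are answered from a cache, can be sketched as a small registry. This is an illustration of the design pattern only, not MUSTANG's architecture. The decorator, the cache key, the loader callback, and the sample values are all hypothetical.

```python
import math

CALCULATORS = {}  # metric name -> function(samples) -> float
_CACHE = {}       # (metric name, channel, day) -> previously computed value


def metric(name):
    """Decorator that registers a metric calculator under `name`."""
    def register(func):
        CALCULATORS[name] = func
        return func
    return register


@metric("mean")
def _mean(samples):
    return sum(samples) / len(samples)


@metric("rms")
def _rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def measure(name, channel, day, load_samples):
    """Return a metric value, loading data only on a cache miss."""
    key = (name, channel, day)
    if key not in _CACHE:
        _CACHE[key] = CALCULATORS[name](load_samples(channel, day))
    return _CACHE[key]


# Stand-in data loader that records how often it is called.
calls = []


def fake_loader(channel, day):
    calls.append((channel, day))
    return [3.0, -3.0, 3.0, -3.0]


first = measure("rms", "IU.ANMO.00.BHZ", "2011-03-11", fake_loader)
second = measure("rms", "IU.ANMO.00.BHZ", "2011-03-11", fake_loader)
print(first, second, len(calls))
```

Adding a metric here means writing one decorated function. The second request for the same channel and day does not reload the data.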

MUSTANG: Modular Utility for STAtistical kNowledge Gathering

- Using Web Services for Quality Assurance -

The BEST Quality Assurance: PEOPLE. The IRIS DMC takes pride in the number of researchers worldwide to whom it sends data. Researchers use and evaluate the data in many different ways, and this ultimately improves the overall quality of data at the IRIS DMC. While the QUACK system looks at real time data systematically, it examines the data only through the algorithms that are implemented. By releasing data to the broader community, problems, characteristics, and related quality issues are more readily identified. The DMC operates a problem reporting system that facilitates communication with network operators, allowing user-reported problems to be tracked and resolved. The top graph to the left shows that the DMC anticipates shipping more than 140 terabytes to the research community in 2011. Of these data, 58% resulted from requests to the archive, about 28% from real time feeds through the DMC SeedLink server, and the other 14% was sent through the Data Handling Interface or via our new web services. The graph on the bottom shows that the number of first-time users of DMC services continues to grow and will likely exceed 1,000 new users in 2011.

[Figure: DMC Shipments by Request Type, in gigabytes per year, 2001 to 2011 (projected), as of April 2011. Vertical axis: 0 to 160,000 gigabytes. Series: DHI:Archive, DHI:IRISDC, DHI:POND, DHI:BUD, WS:timeseries, WS:dataselect, WILBER, Wiggles, SeedLink, LISS, AutoDRM, Archive.]

Web Services at the IRIS DMC: The World Wide Web Consortium defines a web service as a software system designed to support interoperable machine-to-machine interactions over a network. Communication uses the HTTP protocol, and this generally solves the “firewall problem” since the services run over port 80. In practice these look like a URI of the form:

http://www.iris.edu/ws/timeseries/query?net=IU&sta=ANMO&loc=00&cha=BHZ&start=2011-03-11T05.56.00&end=2011-03-11T06.56.00&scale=AUTO&antialiasplot=true&output=plot&ref=direct

IRIS has implemented two types of web services: Data Access Services, which provide access to waveforms, metadata, events, and products, and Processing Services, which include Digital Signal Processing Services implementing things such as filtering, instrument correction, and rotation of horizontal components.
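A request URI of this form can be assembled from its query parameters rather than typed by hand. The sketch below uses only the Python standard library and the parameter names and values from the example URI. The helper name is hypothetical, and the service address is the one given on this poster, which may have changed since.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.iris.edu/ws/timeseries/query"


def timeseries_url(**params):
    """Build a query URL; parameters keep the order in which they are given."""
    return BASE_URL + "?" + urlencode(params)


url = timeseries_url(
    net="IU", sta="ANMO", loc="00", cha="BHZ",
    start="2011-03-11T05.56.00", end="2011-03-11T06.56.00",
    scale="AUTO", antialiasplot="true", output="plot", ref="direct",
)
print(url)
```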

IRIS web services can be accessed at http://www.iris.edu/ws. Extensive documentation for each web service can be found at this URL. Simple clients written in Java, Perl, or other scripting languages using wget can be developed to access a single time series or thousands of time series, depending on one’s needs. IRIS has 4 clients written in Perl available at http://www.iris.edu/ws/wsclients/.
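A simple client of the kind described is little more than a loop over channels. The following is a minimal sketch, not one of the IRIS-provided clients. It assumes the timeseries query parameters shown on this poster, and the function names and the injectable `fetch` argument are hypothetical conveniences so the loop can be exercised without network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.iris.edu/ws/timeseries/query"


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def fetch_channels(channels, start, end, output, fetch=http_get):
    """Request one time series per (net, sta, loc, cha) tuple.

    Returns a dict mapping each channel tuple to whatever `fetch` returns.
    """
    results = {}
    for net, sta, loc, cha in channels:
        query = urlencode({
            "net": net, "sta": sta, "loc": loc, "cha": cha,
            "start": start, "end": end, "output": output,
        })
        results[(net, sta, loc, cha)] = fetch(BASE_URL + "?" + query)
    return results
```

Calling `fetch_channels([("IU", "ANMO", "00", "BHZ")], "2011-03-11T05.56.00", "2011-03-11T06.56.00", "plot")` would issue one HTTP request per channel. Passing `fetch=lambda url: url` instead returns the URLs that would be requested.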

Product Developments at the IRIS DMC: Traditionally the IRIS DMC has focused on the distribution of raw (Level 0) and quality assured (Level 1) waveforms. Motivated by EarthScope, the DMC now produces the higher level products shown below. These include Level 2 products produced from Level 0/1 data using accepted methods, Level 3 products requiring technical analysis and interpretation to produce, and Level 4 products that integrate cross-disciplinary data. The Product Level is identified by the number within a circle in the following examples.

[Figure: Tohoku Earthquake recorded at Albuquerque. Panels: Gain Corrected; Gain Corrected and Low Pass Filtered at 0.02 hertz.]
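The two processing steps named in these panels, gain correction and low pass filtering, can be illustrated with a few lines of code. This sketch is not the filter used by the IRIS processing services: it assumes gain correction is division by a scalar sensitivity and substitutes a basic single-pole recursive low pass filter. The function names and the numbers in the example are illustrative.

```python
import math


def gain_correct(counts, sensitivity):
    """Convert raw counts to physical units by dividing by a scalar gain."""
    return [c / sensitivity for c in counts]


def low_pass(samples, sample_rate_hz, corner_hz):
    """Single-pole recursive low pass filter (illustrative only)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * corner_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    out = []
    previous = samples[0]
    for x in samples:
        previous = previous + alpha * (x - previous)
        out.append(previous)
    return out


# A rapidly alternating signal sampled at 1 Hz is strongly attenuated
# by a 0.02 Hz corner, while a constant level passes unchanged.
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
smoothed = low_pass(alternating, sample_rate_hz=1.0, corner_hz=0.02)
print(abs(smoothed[-1]))
```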

Web Services are consistently documented in a manner that clearly shows query parameter usage and calling conventions. Individual web services each have a GUI builder that shows how to form a correct URL for the service. GUI Builders can be found near the bottom of each web service documentation page.

Using Web Services: IRIS believes that we can significantly lower the barrier for other communities to make use of IRIS data. Using web services, we provide data in a form that can be readily used. The figures to the right show a plot of barometric pressure data from Transportable Array station 142A. The pressure is shown in pascals. Actual values can also be returned in the form of an ASCII file containing time-value pairs or a list of values only. A one-line header provides the metadata necessary to understand the data more fully.
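ASCII output of this kind is easy to consume in a script. The sketch below assumes a layout of one header line followed by whitespace-separated time and value columns. The exact format returned by the service is not specified here, so the layout, the function name, and the sample text (including its values) are made up for illustration.

```python
def parse_ascii_timeseries(text):
    """Split a one-line header from whitespace-separated time-value pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    pairs = []
    for row in rows:
        time_string, value = row.split()
        pairs.append((time_string, float(value)))
    return header, pairs


# Hypothetical sample text; the header wording and values are invented.
sample = """HEADER station=142A units=Pa (hypothetical)
2011-03-11T05:56:00 98012.5
2011-03-11T05:56:01 98012.7
"""
header, pairs = parse_ascii_timeseries(sample)
print(header)
print(pairs)
```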

Earth Model Collaboration: Tomographic Models from the research community are managed in a model repository. After conversion to NetCDF format, the models can be visualized through a variety of tools.

Source Time Functions: For earthquakes larger than M6.0 the IRIS DMC will generate either body wave or surface wave Source Time Functions.

Back Projections: Standardized, fully automated, simple, back projection results can be used as a quick reference for further studies such as Finite-Fault Modeling. IRIS will generate these for events larger than M7.0.
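The two magnitude thresholds above amount to a simple rule for which automated products an event triggers. A minimal sketch of that rule follows. The function name and product labels are hypothetical, and "larger than" is read as a strict inequality.

```python
STF_MIN_MAGNITUDE = 6.0              # Source Time Functions: larger than M6.0
BACK_PROJECTION_MIN_MAGNITUDE = 7.0  # Back Projections: larger than M7.0


def products_for_event(magnitude):
    """List the automated products an event of this magnitude would trigger."""
    products = []
    if magnitude > STF_MIN_MAGNITUDE:
        products.append("source_time_function")
    if magnitude > BACK_PROJECTION_MIN_MAGNITUDE:
        products.append("back_projection")
    return products


print(products_for_event(6.5))
print(products_for_event(7.1))
```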

Ground Motion Visualizations

Receiver Functions

Event Plot Suite

Events & Phases

Global CMTs

Solutions from the Global CMT project, now based at Lamont-Doherty Earth Observatory of Columbia University.

Film Chip Scans

Scanned images of WWSSN Film Chips in TIFF format have been produced for significant earthquakes. These images are available through the SPUD product management system.

Magnetotelluric Transfer Functions

New products that will be available soon: The IRIS DMC is continually developing new products. Those that will become available over the next few months include: 1) management of synthetic seismograms (Level 3), 2) Earth Model Collaboration (tomographic models and tools), 3) Source Time Function estimations, and 4) Back Projections.

The Searchable ProdUct Depository (SPUD). Most IRIS DMC products are managed in SPUD. Built using web services, SPUD makes all products available through its query and search capabilities. For instance, SPUD will allow the discovery of all products within a geographic bounding box and a time range that also meet other event parameters such as magnitude and depth range. SPUD can be accessed at http://www.iris.edu/spud. The use of SPUD should be quite intuitive for any user.
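The kind of search described for SPUD (bounding box, time range, magnitude, depth range) can be sketched as a filter over product records. This is an in-memory illustration of the query logic only, not the SPUD interface. The record fields, function name, and sample records are hypothetical, and the bounding box test does not handle boxes that cross the 180° meridian.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Product:
    kind: str
    time: datetime
    latitude: float
    longitude: float
    depth_km: float
    magnitude: float


def search(products, bbox, time_range, min_magnitude=None, depth_range=None):
    """Filter products; bbox = (min_lat, max_lat, min_lon, max_lon).

    All ranges are inclusive. Optional criteria are skipped when None.
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    t0, t1 = time_range
    hits = []
    for p in products:
        if not (min_lat <= p.latitude <= max_lat):
            continue
        if not (min_lon <= p.longitude <= max_lon):
            continue
        if not (t0 <= p.time <= t1):
            continue
        if min_magnitude is not None and p.magnitude < min_magnitude:
            continue
        if depth_range is not None and not (
            depth_range[0] <= p.depth_km <= depth_range[1]
        ):
            continue
        hits.append(p)
    return hits


# Invented records for illustration.
catalog = [
    Product("receiver_function", datetime(2011, 1, 5), 10.0, 20.0, 15.0, 6.2),
    Product("event_plot", datetime(2011, 2, 9), 12.0, 22.0, 300.0, 7.1),
    Product("event_plot", datetime(2010, 6, 1), 50.0, 60.0, 10.0, 6.8),
]
hits = search(
    catalog,
    bbox=(0.0, 30.0, 0.0, 40.0),
    time_range=(datetime(2011, 1, 1), datetime(2011, 12, 31)),
    min_magnitude=6.0,
    depth_range=(0.0, 100.0),
)
print([p.kind for p in hits])
```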
