58
pandaSDMX Documentation Release 0.6.1 Dr. Leo Mar 16, 2017

pandaSDMX Documentation - media.readthedocs.org · CHAPTER 1 Main features •support for many SDMX features including – generic data sets in SDMXML format – compact data sets

  • Upload
    lamtruc

  • View
    250

  • Download
    0

Embed Size (px)

Citation preview

pandaSDMX DocumentationRelease 0.6.1

Dr. Leo

Mar 16, 2017

Contents

1 Main features 3

2 Example 5

3 Quick install 7

4 pandaSDMX Links 9

5 Table of contents 115.1 What’s new? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115.2 FAQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145.3 Getting started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155.4 A very short introduction to SDMX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165.5 Basic usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205.6 Dealign with agencies (data providers) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.7 Advanced topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295.8 pandasdmx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.9 Contributing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455.10 License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6 Indices and tables 47

Python Module Index 49

i

ii

pandaSDMX Documentation, Release 0.6.1

pandaSDMX is an Apache 2.0-licensed Python package aimed at becoming the most intuitive and versatile tool toretrieve and acquire statistical data and metadata disseminated in SDMX format. It supports out of the box the SDMXservices of the European statistics office (Eurostat), the European Central Bank (ECB), the French National Institutefor statistics (INSEE), the Australian Bureau of Statistics, and the OECD (JSON only). pandaSDMX can export dataand metadata as pandas DataFrames, the gold-standard of data analysis in Python. From pandas you can export dataand metadata to Excel, R and friends. As from version 0.4, pandaSDMX can export data to many other file formatsand database backends via Odo.

Contents 1

pandaSDMX Documentation, Release 0.6.1

2 Contents

CHAPTER 1

Main features

• support for many SDMX features including

– generic data sets in SDMXML format

– compact data sets in SDMXJSON format (OECD only)

– data structure definitions, code lists and concept schemes

– dataflow definitions and content-constraints

– categorisations and category schemes

• pythonic representation of the SDMX information model

• When requesting datasets, validate column selections against code lists and content-constraints if available

• export data and structural metadata such as code lists as multi-indexed pandas DataFrames or Series, and manyother formats and database backends via Odo

• read and write SDMX messages to and from local files

• configurable HTTP connections

• support for requests-cache allowing to cache SDMX messages in memory, MongoDB, Redis or SQLite

• extensible through custom readers and writers for alternative input and output formats of data and metadata

• growing test suite

3

pandaSDMX Documentation, Release 0.6.1

4 Chapter 1. Main features

CHAPTER 2

Example

Suppose we want to analyze annual unemployment data for some European countries. All we need to know in advanceis the data provider, eurostat. pandaSDMX makes it super easy to search the directory of dataflows, and the completestructural metadata about the datasets available through the selected dataflow. We will skip this step here. The impa-tient reader may directly jump to Basic usage. The dataflow with the ID ‘une_rt_a’ contains the data we want. Thedataflow definition references a datastructure definition with the ID ‘DSD_une_rt_a’. It contains or references all themetadata describing data sets available through this dataflow: the dimensions, concept schemes, and correspondingcode lists.

In [1]: from pandasdmx import Request

In [2]: estat = Request('ESTAT')

# Download the metadata and expose it as a dict mapping resource names to pandas→˓DataFramesIn [3]: metadata = estat.datastructure('DSD_une_rt_a').write()

# Show some code listsIn [4]: metadata.codelist.ix[['AGE', 'UNIT']]Out[4]:

dim_or_attr nameAGE AGE D AGE

TOTAL D TotalY25-74 D From 25 to 74 yearsY_LT25 D Less than 25 years

UNIT UNIT D UNITPC_ACT D Percentage of active populationPC_POP D Percentage of total populationTHS_PER D Thousand persons

Next we download a data set. We use codes from the code list ‘GEO’ to obtain data on Greece, Ireland and Spain only.

In [5]: resp = estat.data('une_rt_a', key={'GEO': 'EL+ES+IE'}, params={'startPeriod':→˓'2007'})

5

pandaSDMX Documentation, Release 0.6.1

# We use a generator expression to narrow down the column selection# and write these columns to a pandas DataFrameIn [6]: data = resp.write(s for s in resp.data.series if s.key.AGE == 'TOTAL')

# Explore the data set. First, show dimension namesIn [7]: data.columns.namesOut[7]: FrozenList(['UNIT', 'AGE', 'SEX', 'GEO', 'FREQ'])

# and corresponding dimension valuesIn [8]: data.columns.levels\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\Out[8]: FrozenList([['PC_ACT→˓', 'PC_POP', 'THS_PER'], ['TOTAL'], ['F', 'M', 'T'], ['EL', 'ES', 'IE'], ['A']])

# Show aggregate unemployment rates across ages and sexes as# percentage of active populationIn [9]: data.loc[:, ('PC_ACT', 'TOTAL', 'T')]\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\Out[9]:→˓

GEO EL ES IEFREQ A A ATIME_PERIOD2016 23.5 19.6 7.92015 24.9 22.1 9.42014 26.5 24.5 11.32013 27.5 26.1 13.12012 24.5 24.8 14.72011 17.9 21.4 14.72010 12.7 19.9 13.92009 9.6 17.9 12.02008 7.8 11.3 6.42007 8.4 8.2 4.7

6 Chapter 2. Example

CHAPTER 3

Quick install

• conda install -c alcibiade pandasdmx # for Anaconda users

• pip install pandasdmx # for all others

7

pandaSDMX Documentation, Release 0.6.1

8 Chapter 3. Quick install

CHAPTER 4

pandaSDMX Links

• Download the latest stable version from the Python package index

• Mailing list

• github

• Official SDMX website

9

pandaSDMX Documentation, Release 0.6.1

10 Chapter 4. pandaSDMX Links

CHAPTER 5

Table of contents

What’s new?

v0.6.1 (2017-02-03)

• fix 2to3 issue which caused crashes on Python 2.7

v0.7 (2017-03.xx)

• add Australian Bureau of Statistics as new data provider

• load metadata on data providers from json file; allow the user to add new agencies on the fly by specifying anappropriate JSON file using the pandasdmx.api.Request.load_agency_profile().

v0.6 (2017-01-07)

This release contains some important stability improvements.

Bug fixes

• JSON data from OECD is now properly downloaded

• The data writer tries to gleen a frequency value for a time series from its attributes. This is helpful whenexporting data sets, e.g., from INSEE (‘Issue 41 <https://github.com/dr-leo/pandaSDMX/issues/41>‘_).

Known issues

A data set which lacks a FREQ dimension or attribute can be exported as pandas DataFrame only whenparse_time=False?, i.e. no DateTime index is generated. The resulting DataFrame has a string index. Use pandasmagic to create a DateTimeIndex from there.

11

pandaSDMX Documentation, Release 0.6.1

v0.5 (2016-10-30)

New features

• new reader module for SDMX JSON data messages

• add OECD as data provider (data messages only)

• pandasdmx.model.Category is now an iterator over categorised objects. This greatly simplifies cate-gory usage. Besides, categories with the same ID while belonging to multiple category schemes are no longerconflated.

API changes

• Request constructor: make agency ID case-insensitive

• As Category is now an iterator over categorised objects, Categorisations is no longer considered partof the public API.

Bug fixes

• sdmxml reader: fix AttributeError in write_source method, thanks to Topas

• correctly distinguish between categories with same ID while belonging to different category schemes

v0.4 (2016-04-11)

New features

• add new provider INSEE, the French statistics office (thanks to Stéphan Rault)

• register ‘.sdmx’ files with Odo if available

• logging of http requests and file operations.

• new structure2pd writer to export codelists, dataflow-definitions and other structural metadata from structuremessages as multi-indexed pandas DataFrames. Desired attributes can be specified and are represented bycolumns.

API changes

• pandasdmx.api.Request constructor accepts a log_level keyword argument which can be set to alog-level for the pandasdmx logger and its children (currently only pandasdmx.api)

• pandasdmx.api.Request now has a timeout property to set the timeout for http requests

• extend api.Request._agencies configuration to specify agency- and resource-specific settings such as headers.Future versions may exploit this to provide reader selection information.

• api.Request.get: specify http_headers per request. Defaults are set according to agency configuration

• Response instances expose Message attributes to make application code more succinct

• rename pandasdmx.api.Message attributes to singular form Old names are deprecated and will be re-moved in the future.

12 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

• pandasdmx.api.Request exposes resource names such as data, datastructure, dataflow etc. as descriptorscalling ‘get’ without specifying the resource type as string. In interactive environments, this saves typing andenables code completion.

• data2pd writer: return attributes as namedtuples rather than dict

• use patched version of namedtuple that accepts non-identifier strings as field names and makes all fields acces-sible through dict syntax.

• remove GenericDataSet and GenericDataMessage. Use DataSet and DataMessage instead

• sdmxml reader: return strings or unicode strings instead of LXML smart strings

• sdmxml reader: remove most of the specialized read methods. Adapt model to use generalized methods. Thismakes code more maintainable.

• pandasdmx.model.Representation for DSD attributes and dimensions now supports text not justcodelists.

Other changes and enhancements

• documentation has been overhauled. Code examples are now much simpler thanks to the new structure2pdwriter

• testing: switch from nose to py.test

• improve packaging. Include tests in sdist only

• numerous bug fixes

v0.3.1 (2015-10-04)

This release fixes a few bugs which caused crashes in some situations.

v0.3.0 (2015-09-22)

• support for requests-cache allowing to cache SDMX messages in memory, MongoDB, Redis or SQLite

• pythonic selection of series when requesting a dataset: Request.get allows the key keyword argument in a datarequest to be a dict mapping dimension names to values. In this case, the dataflow definition and datastructuredefinition, and content-constraint are downloaded on the fly, cached in memory and used to validate the keys.The dotted key string needed to construct the URL will be generated automatically.

• The Response.write method takes a parse_time keyword arg. Set it to False to avoid parsing of dates, timesand time periods as exotic formats may cause crashes.

• The Request.get method takes a memcache keyward argument. If set to a string, the received Response instancewill be stored in the dict Request.cache for later use. This is useful when, e.g., a DSD is needed multipletimes to validate keys.

• fixed base URL for Eurostat

• major refactorings to enhance code maintainability

5.1. What’s new? 13

pandaSDMX Documentation, Release 0.6.1

v0.2.2

• Make HTTP connections configurable by exposing the requests.get API through the pandasdmx.api.Request constructor. Hence, proxy servers, authorisation information and other HTTP-related parametersconsumed by requests.get can be specified for each Request instance and used in subsequent requests.The configuration is exposed as a dict through a new Request.client.config attribute.

• Responses have a new http_headers attribute containing the HTTP headers returned by the SDMX server

v0.2.1

• Request.get: allow fromfile to be a file-like object

• extract SDMX messages from zip archives if given. Important for large datasets from Eurostat

• automatically get a resource at an URL given in the footer of the received message. This allows to automaticallyget large datasets from Eurostat that have been made available at the given URL. The number of attempts andthe time to wait before each request are configurable via the get_footer_url argument.

v0.2 (2015-04-13)

This version is a quantum leap. The whole project has been redesigned and rewritten from scratch to provide robustsupport for many SDMX features. The new architecture is centered around a pythonic representation of the SDMXinformation model. It is extensible through readers and writers for alternative input and output formats. Export topandas has been dramatically improved. Sphinx documentation has been added.

v0.1 (2014-09)

Initial release

FAQ

Can pandaSDMX connect to SDMX providers other than INSEE, ECB and Eurostat?

Any SDMX provider can be supported that generates SDMX 2.1-compliant messages. INSEE, ECB and Eurostatare hard-coded. Others may be added in a few lines. Alternatively, a custom base URL can be provided to thepandasdmx.api.Request.get() method. See the docstring. Support for SDMX 2.0 messages could be addedas a new reader module. Perhaps the model would have to be tweaked a bit as well.

Writing large datasets to pandas DataFrames is slow. What can I do?

The main performance hit comes from parsing the time or time period strings. In case of regular data such as monthly(not trading day!), call the write method with fromfreq set to True so that only the first string will be parsedand the rest inferred from the frequency of the series. Caution: If the series is stored in the XML document in reversechronological order, the reverse_obs argument must be set to True as well to prevent the resulting dataframe indexfrom extending into a remote future.

14 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

Getting started

Installation

Prerequisites

pandaSDMX is a pure Python package. As such it should run on any platform. It requires Python 2.7, 3.4 or higher.

It is recommended to use one of the common Python distributions for scientific data analysis such as

• Anaconda, or

• Canopy.

Along with a current Python interpreter these Python distributions include lots of useful packages for data analysis.For other Python distributions (not only scientific) see here.

pandaSDMX has the following dependencies:

• the data analysis library pandas which itself depends on a number of packages

• the HTTP library requests

• LXML for XML processing.

• JSONPATH-RW for JSON processing.

Optional dependencies

• requests-cache allowing to cache SDMX messages in memory, MongoDB, Redis and more.

• odo for fancy data conversion and database export

• IPython is required to build the Sphinx documentation To do this, check out the pandaSDMX repository ongithub.

• py.test to run the test suite.

Download

From the command line of your OS, issue

• conda install -c alcibiade pandasdmx if you are using Anaconda,

• pip install pandasdmx otherwise.

Of course, you can also download the tarball from the PyPI and issue python setup.py install from thepackage dir.

Running the test suite

From the package directory, issue the folloing command:

>>> py.test

5.3. Getting started 15

pandaSDMX Documentation, Release 0.6.1

Package overview

Modules

api module containing the API to make queries to SDMX web services. See pandasdmx.api.Request in partic-ular its get method. pandasdmx.api.Request.get() return pandasdmx.api.Response instances.

model implements the SDMX information model.

remote contains a wrapper class around requests for http. Called by pandasdmx.api.Request.get() tomake http requests to SDMX services. Also reads sdmxml files instead of querying them over the web.

Subpackages

reader read SDMX files and instantiate the appropriate classes from pandasdmx.model There is only one readerfor XML-based SDMXML v2.1. Future versions may add reader modules for other formats.

writer contains writer classes transforming SDMX artefacts into other formats or writing them to arbitrary desti-nations such as databases. The only available writer for now writes generic datasets to pandas DataFrame orSeries.

utils: utility functions and classes. Contains a wrapper around dict allowing attribute access to dict items.

tests unit tests and sample files

What next?

The remaining chapters explain the key characteristics of SDMX, demonstrate the basic usage of pandaSDMX andprovide additional information on some advanced topics. While users that are new to SDMX are likely to benefit a lotfrom reading the next chapter on SDMX, normal use of pandaSDMX should not strictly require this. The Basic usagechapter should enable you to retrieve datasets and write them to pandas DataFrames. But if you want to exploit thefull richness of the information model, or simply feel more comfortable if you know what happens behind the scenes,the SDMX introduction is for you. It also contains links to authoratative SDMX resources.

A very short introduction to SDMX

Overall purpose

SDMX (short for: Statistical Data and Metadata eXchange) is a set of standards and guidelines aimed at facilitatingthe production, dissemination, retrieval and processing of statistical data and metadata. SDMX is sponsored by a widerange of public institutions including the UN, the IMF, the Worldbank, BIS, ILO, FAO, the OECD, the ECB, Eurostat,and a number of national statistics offices. These and other institutions provide a vast array of current and historicdatasets and metadatasets via free or fee-based REST and SOAP web services. pandaSDMX only supports SDMXv2.1, that is, the latest version of this standard. Some agencies such as the IMF continue to offer SDMX 2.0-compliantservices. These cannot be accessed by pandaSDMX. While this may change in future versions, there is the expectationthat SDMX providers will upgrade to the latest standards at some point.

Information model

At its core, SDMX defines an information model consisting of a set of classes, their logical relations, and semantics.There are classes defining things like datasets, metadatasets, data and metadata structures, processes, organisations

16 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

and their specific roles to name but a few. The information model is agnostic as to its implementation. Luckily, theSDMX standard provides an XML-based implementation (see below). And a more efficient JSON-variant is beingstandardised by the SDMX Technical Standards Working Group.

The following sections briefly introduces some key elements of the information model.

Datasets

a dataset can broadly be described as a container of ordered observations and attributes attached to them. Observations(e.g. the annual unemployment rate) are classified by dimensions such as country, age, sex, and time period. Attributesmay further describe an individual observation or a set of observations. Typical uses for attributes are the level ofconfidentiality, or data quality. Observations may be clustered into series, in particular, time series. The dataset mustexplicitly specify the dimension at observation such as ‘time’, ‘time_period’ or anything else. If a dataset consistsof series whose dimension at observation is neither time nor time period, the dataset is called cross-sectional. Adataset that is not grouped into series, i.e. where all dimension values including time, if available, are stated for eachobservation, are called flat datasets. These are not memory-efficient, but benefit from a very simple representation.

An attribute may be attached to a series to express the fact that it applies to all contained observations. This increasesefficiency and adds meaning. Subsets of series within a dataset may be clustered into groups. A group is defined byspecifying one or more dimension values, but not all: At least the dimension at observation and one other dimensionmust remain free (or wild-carded). Otherwise, the group would in fact be either a single observation or a series. Themain purpose of group is to serve as another attachment point for attributes. Hence, a given attributes may be attachedto all series within the group at once. Attributes may finally be attached to the entire dataset, i.e. to all observationstherein.

Structural metadata: data structure definition, concept scheme and code list

In the above section on datasets, we have carelessly used structural terms such as dimension, dimension value andattachment of attributes. This is because it is almost impossible to talk about datasets without talking about theirstructure. The information model provides a number of classes to describe the structure of datasets without talkingabout data. The container class for this is called DataStructureDefinition (in short: DSD). It contains a list of dimen-sions and for each dimension a reference to exactly one concept describing its meaing. A concept describes the set ofpermissible dimension values. This can be done in various ways depending on the intended data type. Finite value sets(such as country codes, currencies, a data quality classification etc.) are described by reference to code lists. Infinitevalue sets are described by facets which is simply a way to express that a dimension may have int, float or time-stampvalues, to name but a few. A set of concepts referred to in the dimension descriptors of a data structure definition iscalled concept scheme.

The set of allowed observation values such as the unemployment rate measured in per cent is defined by a specialdimension: the MeasureDimension, thus enabling the validation of any observation value against its DSD.

Dataflow definition

A dataflow describes what a particular dataset is about, how often it is updated over time by its maintaining agency,under what conditions it will be provided etc. The terminology is a bit confusing: You cannot actually obtain a dataflowfrom an SDMX web service. Rather, you can request one or more dataflow definitions describing a flow of data overtime. The dataflow definition and the artefacts to which it refers give you all the information you need to exploit thedatasets you can request using the dataflow’s ID.

A DataFlowDefinition is a class that describes a dataflow. A DataFlowDefinition has a unique identifier, a human-readable name and potentially a more detailed description. Both may be multi-lingual. The dataflow’s ID is used toquery the dataset it describes. The dataflow also features a reference to the DSD which structures the datasets availableunder this dataflow ID. For instance, in the frontpage example we used the dataflow ID ‘une_rt_a’.

5.4. A very short introduction to SDMX 17

pandaSDMX Documentation, Release 0.6.1

Constraints

There are two types of constraints:

A content-constraint is a mechanism to express the fact that datasets of a given dataflow only comprise columns fora subset of values from the code-lists representing dimension values. For example, the datastructure definition for adataflow on exchange rates references tha codelist of all country codes in the world, whereas the datasets providedunder this dataflow only covers the ten largest currencies. These can be enumerated by a content-constraint attachedto the dataflow definition. Content-constraints can be used to validate dimension names and values (a.k.a. keys) whenrequesting datasets selecting columns of interest.

An attachment-constraint describes to which parts of a dataset (column/series, group of series, observation, the entiredataset) certain attributes may be attached. Attachment-constraints are not supported by pandaSDMX as this featureis needed only for dataset generation. However, pandaSDMX does support attributes in the information model andwhen exporting datasets to pandas.

Category schemes and categorisations

Categories serve to classify or categorise things like dataflows, e.g., by subject matter. Multiple categories may belongto a container called CategorySchemes.

A Categorisation links the thing to be categorised, e.g., a DataFlowDefinition, to a Category.

Class hierarchy

The SDMX information model defines a number of base classes from which concrete classes such as DataFlowDefi-nition or DataStructureDefinition inherit. E.g., DataFlowDefinition inherits from MaintainableArtefact attributes indi-cating the maintaining agency. MaintainableArtefact inherits from VersionableArtefact, which, in turn, inherits fromIdentifiableArtefact which inherits from AnnotableArtefact and so forth. Hence, DataStructureDefinition may havea unique ID, a version, a natural language name in multiple languages, a description, and annotations. pandaSDMXtakes advantage from this class hierarchy.

Implementations of the information model

Background

There are two implementations of the information model:

• SDMXML is XML-based. It is fully standardised and covers the complete information model. However, it is abit heavy-weight and data providers are gradually shifting to the JSON flavor currently in the works.

• SDMXJSON: This recent JSON-based implementation is more lightweight and efficient. While standardisa-tion is in an advanced stage, structure-messages are not yet covered. Data messages work well hough, andpandaSDMX supports them as from v0.5.

SDMXML

The SDMX standard defines an XML-based implementation of the information model called SDMXML. An SD-MXML document contains exactly one SDMX Message. There are several types of Message such as GenericDataMes-sage to represent a DataSet in generic form, i.e. containing all the information required to interpret it. Hence, datasetsin generic representation may be used without knowing the related DataStructureDefinition. The downside is thatgeneric dataset messages are much larger than their sister format StructureSpecificDataSet. pandaSDMX as of v0.2only supports generic dataset messages.

18 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

Another important SDMXML message type is StructureMessage which may contain artefacts such as DataStructure-Definitions, codelists, conceptschemes, categoryschemes and so forth.

SDMXML provides that each message contains a Header containing some metadata about the message. Finally,SDMXML messages may contain a Footer element. It provides information on any errors that have occurred on theserver side, e.g., if the requested dataset exceeds the size limit, or the server needs some time to make it availableunder a given link.

The test suite comes with a number of small SDMXML demo files. View them in your favorite XML editor to get adeeper understanding of the structure and content of various message types.

SDMX services provide XML schemas to validate a particular SDMXML file. However, pandaSDMX does not yetsupport validation.

SDMXJSON

SDMXJSON represents SDMX datasets and related metadata as JSON files provided by RESTful web services. Earlyadopters of this format are OECD, ECB and IMF. As of v0.5, pandaSDMX supports the OECD’s REST interface forSDMXJSON. However, note that structural metadata is not yet fully standardised. Hence, it is impossible at this stageto download dataflow definitions, codelists etc. from OECD.

SDMX web services

The SDMX standard defines both a REST and a SOAP web service API. As of v0.5, pandaSDMX only supports theREST API.

The URL specifies the type, providing agency, and ID of the requested SDMX resource (dataflow, categoryscheme,data etc.). The query part of the URL (after the ‘?’) may be used to give optional query parameters. For instance, whenrequesting data, the scope of the dataset may be narrowed down by specifying a key to select only matching columns(e.g. on a particular country). The dimension names and values used to select the rows can be validated by checkingif they are contained in the relevant codelists referenced by the datastructure definition (see above), and any content-constraint attached to the dataflow definition for the queried dataset. Moreover, rows may be chosen by specifying astartperiod and endperiod for the time series. In addition, the query part may set a references parameter to instructthe SDMX server to return a number of other artefacts along with the resource actually requested. For example, aDataStructureDefinition contains references to codelists and conceptschemes (see above). If the ‘references’ parameteris set to ‘all’, these will be returned in the same StructureMessage. The next chapter contains some examples todemonstrate this mechanism. Further details can be found in the SDMX User Guide, and the Web Service Guidelines.

Further reading

• The SDMX standards and guidelines are the authoritative resource. This page is a must for anyone eager to divedeeper into SDMX. Start with the User Guide and the Information Model (Part 2 of the standard). The WebServices Guidelines contain instructive examples for typical queries.

• Eurostat SDMX page

• European Central Bank SDMX page It links to a range of study guides and helpful video tutorials.

• SDMXSource: - Java, .NET and ActionScript implementations of SDMX software, in part open source

5.4. A very short introduction to SDMX 19

pandaSDMX Documentation, Release 0.6.1

Basic usage

Overview

This chapter illustrates the main steps of a typical workflow, namely:

1. retrieving relevant dataflows by category or from a complete list of dataflows,

2. exploring the data structure, related code lists, and other metadata by exporting them as pandas DataFrames

3. selecting relevant series (columns) and a time-range (rows) from a dataset provided under the chosen dataflowand requesting data sets via http

4. exploring the received data using the information model

5. writing a dataset or selected series thereof to a pandas DataFrame or Series

6. Reading and writing SDMX files

7. odo support

8. Handling errors

These steps share common tasks which flow from the architecture of pandaSDMX:

1. Use a new or existing pandasdmx.api.Request instance to get an SDMX message from a web service orfile and load it into memory. Since version 0.4 this can be conveniently done by descriptors named after the webresources defined by the SDMX standard (dataflow, categoryscheme, data etc.). In older versions,these operations required to call pandasdmx.api.Request.get()

2. Explore the returned pandasdmx.api.Response instance

• check for errors

• explore the SDMX message’s content .

• write data or metadata to a pandas DataFrame or Series by Calling pandasdmx.api.Response.write().

Importing pandaSDMX

As explained in the preceeding section, we will need pandasdmx.api.Request all the time. Yet, wecan use the following shortcut to import it:

In [1]: from pandasdmx import Request

Connecting to an SDMX web service, caching

We instantiate pandasdmx.api.Request. The constructor accepts an optional agency ID as string. The list ofsupported agencies (as of version 0.4: ECB, ESTAT, INSEE) is shown in the error message if an invalid agency ID ispassed:

In [2]: ecb = Request('ECB')

ecb is now configured so as to make requests to the European Central Bank. If you want to send requests to otheragencies, you can instantiate multiple Request objects.

20 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

Configuring the http connection

To pre-configure the HTTP connections to be established by a Request instance, you can pass all keyword argumentsconsumed by the underlying HTTP library requests (new in version 0.2.2). For a complete description of the optionssee the requests documentation. For example, a proxy server can be specified for subsequent requests like so:

In [3]: ecb_via_proxy = Request('ECB', proxies={'http': 'http://1.2.3.4:5678'})

HTTP request parameters are exposed through a dict. It may be modified between requests.

In [4]: ecb_via_proxy.client.configOut[4]: {'proxies': {'http': 'http://1.2.3.4:5678'}, 'stream': True, 'timeout': 30.1}

The Request.client attribute acts a bit like a requests.Session in that it conveniently stores the configu-ration for subsequent HTTP requests. Modify it to change the configuration. For convenience, pandasdmx.api.Request has a timeout property to set the timeout in seconds for http requests.

Caching received files

Since version 0.3.0, requests-cache is supported. To use it, pass an optional cache keyword argument to Request()constructor. If given, it must be a dict whose items will be passed to requests_cache.install_cache func-tion. Use it if you want to cache SDMX messages in databases such as MongoDB, Redis or SQLite. See the requests-cache‘ docs for further information.

Loading a file instead of requesting it via http

Any Request instance can load SDMX messages from local files. Issuing r = Request() without passing anyagency ID instantiates a Request object not tied to any agency. It may only be used to load SDMX messages fromfiles, unless a pre-fabricated URL is passed to pandasdmx.api.Request.get().

Obtaining and exploring metadata about datasets

This section illustrates by a typical use case how to download and explore metadata. Assume we are looking fortime-series on exchange rates. Our best guess is that the European Central Bank provides a relevant dataflow. Wecould google for the dataflow ID or browse through the ECB’s website. However, we choose to use SDMX metadata,namely category-schemes to get a complete overview of the dataflows the ECB provides.

Note: Some data providers such as the ECB and INSEE, but not Eurostat, support category-schemes to facilitatedataflow retrieval. If you already know, e.g., from the data provider’s website or other publications, what dataflowsyou are looking for, you won’t need this step. Yet this section should still be useful as it demonstrates how metadatacan be explored using pandas DataFrames.

Getting the category scheme

SDMX allows to download a list of dataflow definitions for all dataflows provided by a given data provider. As theselists may be very long, SDMX supports category-schemes to categorize dataflow definitions and other objects. Notethat the terms ‘dataflow’ and ‘dataflow definition’ are used synonymously.

To search the list of dataflows by category, we request the category scheme from the ECB’s SDMX service and explorethe response like so:

5.5. Basic usage 21

pandaSDMX Documentation, Release 0.6.1

In [5]: cat_response = ecb.categoryscheme()

The content of the SDMX message, its header and its payload are exposed as attributes. These are also accessibledirectly from the containing pandasdmx.api.Response instance (new in version 0.4). We will use this shortcutthroughout this documentation. But keep in mind that all payload such as data or metadata is stored as attributes of apandasdmx.model.Message instance which can be explicitly accessed from a Response instance via its msgattribute.

Try dir(cat_response.msg) to see what we have received: There is not only the category scheme, but also thedataflows and categorisations. This is because the get method has conveniently set the references parameter to adefault value. We can see this from the URL:

In [6]: cat_response.urlOut[6]: 'http://sdw-wsrest.ecb.int/service/categoryscheme?→˓references=parentsandsiblings'

The HTTP headers returned by the SDMX server are availble as well (new in version 0.2.2):

In [7]: cat_response.http_headersOut[7]: {'Content-Encoding': 'gzip', 'Connection': 'keep-alive', 'Date': 'Thu, 16 Mar→˓2017 20:43:08 GMT', 'Expires': 'Thu, 16 Mar 2017 20:43:08 GMT', 'Server': 'Apache-→˓Coyote/1.1', 'Content-Length': '19837', 'Vary': 'Accept, Accept-Encoding', 'Pragma→˓': 'no-cache', 'Cache-Control': 'max-age=0, no-cache, no-store', 'Content-Type':→˓'application/xml', 'X-N': 'S'}

Now let’s export our category scheme to a pandas DataFrame and see what’s in there:

In [8]: cat_response.write().categoryschemeOut[8]:

nameMOBILE_NAVI MOBILE_NAVI Economic concepts

01 Monetary operations02 Prices, output, demand and labour market03 Monetary and financial statistics04 Euro area accounts05 Government finance06 External transactions and positions07 Exchange rates08 Payments and securities trading, clearing, set...09 Banknotes and Coins10 Indicators of Financial Integration11 Real Time Database (research database)

MOBILE_BASKETS MOBILE_BASKETS Predefined data baskets01 Key euro area indicators02 Exchange rates03 Monetary aggregates04 Bank interest rates05 Prices06 Government finance

MOBILE_NAVI_PUB MOBILE_NAVI_PUB Published indicators01 Monetary operations02 Prices, output, demand and labour market03 Monetary and financial statistics04 Euro area accounts05 Government finance06 External transactions and positions07 Exchange rates08 Payments and securities trading, clearing, set...

22 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

09 Banknotes and CoinsJDF_NAVI JDF_NAVI Selected euro area statistics and national bre...

01 Monetary statistics02 Securities issues03 Investment funds04 Insurance corporations and pension funds05 Average interest rates on deposits and loans o...06 Competitiveness indicators07 International reserves08 Harmonised indices of consumer prices09 GDP and expenditure components10 Payment and terminal transactions involving no...11 Number of MFIs per country of residence

The pandasdmx.api.Response.write() returns a mapping from the metadata contained in thepandasdmx.model.StructureMessage instance to pandas DataFrames. E.g., there is a key and correspond-ing DataFrame for the resource categoryscheme. The mapping object is a thin wrapper around dict whichessentially enables attribute syntax for read access.

The write-method accepts a number of keyword arguments to choose the resources to be exported, the attributes tobe included in the DataFrame columns, and the desired language. See the doc string for details.

There are three category-schemes. As we are interested in exchange rate data, we will have a closer look at category‘07’ of category-scheme ‘MOBILE_NAVI’.

Extracting the dataflows in a particular category

To display the categorised items, in our case the dataflow definitions contained in the category on exchange rates, weiterate over the Category instance (new in version 0.5):

In [9]: list(cat_response.categoryscheme.MOBILE_NAVI['07'])Out[9]:[DataflowDefinition | EXR | Exchange Rates,DataflowDefinition | WTS | Trade weights]

Retrieving dataflows without using categories

In the previous section we have used categories to find relevant dataflows. However, in many situations there areno categories to narrow down the result set. We can export the dataflow definitions to a pandas DataFrame and usepandas’ text search capabilities to find dataflows of interest:

In [10]: cat_response.write().dataflow.head()Out[10]:

namedataflowAME AMECOBKN Banknotes statisticsBKN_PUB Banknotes statistics - Published seriesBLS Bank Lending Survey StatisticsBOP Euro Area Balance of Payments and Internationa...

Moreover, the old pandasdmx.utils.DictLike.find() is still available.

5.5. Basic usage 23

pandaSDMX Documentation, Release 0.6.1

Extracting the data structure and data from a dataflow

In this section we will focus on a particular dataflow. We will use the ‘EXR’ dataflow from the European CentralBank. In the previous section we already obtained the dataflow definitions by requesting the categoryschemes withthe appropriate references. But this works only if the SDMX services supports category schemes. If not (and manyagencies don’t), we need to download the dataflow definitions explicitly by issuing:

>>> flows = ecb.dataflow()

Dataflow definitions at a glance

A pandasdmx.model.DataFlowDefinition has an id , name , version and many other attributes inher-ited from various base classes. It is worthwhile to look at the method resolution order to see how it works. Many otherclasses from the model have similar base classes.

It is crucial to bear in mind two things:

• the id of a dataflow definition is also used to request data of this dataflow.

• the structure attribute of the dataflow definition. is a reference to the data structure definition describingdatasets of this dataflow.

Getting the data structure definition (DSD)

We can extract the DSD’s ID from the dataflow definition and download the DSD together with all artefacts that itrefers to and that refer to it. We set the params keyword argument explicitly to the default value to show how itworks.

In [11]: dsd_id = cat_response.dataflow.EXR.structure.id

In [12]: dsd_idOut[12]: 'ECB_EXR1'

In [13]: refs = dict(references = 'all')

In [14]: dsd_response = ecb.datastructure(resource_id = dsd_id, params = refs)

In [15]: dsd = dsd_response.datastructure[dsd_id]

A DSD essentially defines three things:

• the dimensions of the datasets of this dataflow, i.e. the order and names of the dimensions and the allowed valuesor the data type for each dimension, and

• the attributes, i.e. their names, allowed values and where each may be attached. There are four possible attach-ment points:

– at the individual observation

– at series level

– at group level (i.e. a subset of series defined by dimension values)

– at dataset level.

• the measures

Let’s look at the dimensions and for the ‘CURRENCY’ dimension also at the allowed values as enumerated in thereferenced code list:

24 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

In [16]: dsd.dimensions.aslist()Out[16]:[Dimension | FREQ,Dimension | CURRENCY,Dimension | CURRENCY_DENOM,Dimension | EXR_TYPE,Dimension | EXR_SUFFIX,TimeDimension | TIME_PERIOD]

In [17]: dsd_response.write().codelist.loc['CURRENCY'].head()\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\Out[17]:→˓

dim_or_attr nameCURRENCY D Currency code listADF D Andorran Franc (1-1 peg to the French franc)ADP D Andorran Peseta (1-1 peg to the Spanish peseta)AED D United Arab Emirates dirhamAFA D Afghanistan afghani (old)

The order of dimensions will determine the order of column index levels of the pandas DataFrame (see below).

The DataFrame representation of the code list for the CURRENCY dimension shows that ‘USD’ and ‘JPY’ are validdimension values. We need this information to construct a filter for our dataset query which should be limited to thecurrencies we are interested in.

Note that pandasdmx.model.Scheme.aslist() sorts the dimension objects by their position attribute. Theorder matters when constructing filters for dataset queries (see below). But pandaSDMX sorts filter values behind thescenes, so we need not care.

Attribute names and allowed values can be obtained in a similar fashion.

Note: Groups are not yet implemented in the DSD. But this is not a problem as they are implemented for genericdatasets. Thus, datasets should be rendered properly including all attributes and their attachment levels.

Working with datasets

Selecting and requesting data from a dataflow

Requesting a dataset is as easy as requesting a dataflow definition or any other SDMX artefact: Just call thepandasdmx.api.Request.get() method and pass it ‘data’ as the resource_type and the dataflow ID as re-source_id. Alternatively, you can use the data descriptor which calls the get method implicitly.

However, we only want to download those parts of the data we are interested in. Not only does this increase perfor-mance. Rather, some dataflows are really huge, and would exceed the server or client limits. The REST API of SDMXoffers two ways to narrow down a data request:

• specifying dimension values which the series to be returned must match (“horizontal filter”) or

• limiting the time range or number of observations per series (“vertical filter”)

From the ECB’s dataflow on exchange rates, we specify the CURRENCY dimension to be either ‘USD’ or ‘JPY’.This can be done by passing a key keyword argument to the get method or the data descriptor. It may either bea string (low-level API) or a dict. The dict form introduced in v0.3.0 is more convenient and pythonic as it allowspandaSDMX to infer the string form from the dict. Its keys (= dimension names) and values (= dimension values) willbe validated against the datastructure definition as well as the content-constraint if available.

5.5. Basic usage 25

pandaSDMX Documentation, Release 0.6.1

Content-constraints are implemented only in their CubeRegion flavor. KeyValueSets are not yet supported. In thiscase, the provided demension values will be validated only against the code-list. It is thus not always guaranteed thatthe dataset actually contains the desired data, e.g., because the country of interest does not deliver the data to theSDMX data provider.

If we choose the string form of the key, it must consist of ‘.’-separated slots representing the dimensions. Values areoptional. As we saw in the previous section, the ECB’s dataflow for exchange rates has five relevant dimensions, the‘CURRENCY’ dimension being at position two. This yields the key ‘.USD+JPY...’. The ‘+’ can be read as an ‘OR’operator. The dict form is shown below.

Further, we will set the start period for the time series to 2014 to exclude any prior data from the request.

In [18]: data_response = ecb.data(resource_id = 'EXR', key={'CURRENCY': 'USD+JPY'},→˓params = {'startPeriod': '2016'})

In [19]: data = data_response.data

In [20]: type(data)Out[20]: pandasdmx.model.DataSet

Datasets

This section explains the key elements and structure of data sets. You can skip it on first read when you just want tobe able to download data and export it to pandas. More advanced operations, e.g., exporting only a subset of series topandas, requires some understanding of the anatomy of a data set including observations and attributes.

As we saw in the previous section, the datastructure definition (DSD) is crucial to understanding the data structure,the meaning of dimension and attribute values, and to select series of interest from the entire data set by specifying avalid key.

The pandasdmx.model.DataSet class has the following features:

dim_at_obs attribute showing which dimension is at observation level. For time series its value is either ‘TIME’or ‘TIME_PERIOD’. If it is ‘AllDimensions’, the dataset is said to be flat. In this case there are no series, just aflat list of observations.

series property returning an iterator over pandasdmx.model.Series instances

obs method returning an iterator over the observations. Only for flat datasets.

attributes namedtuple of attributes, if any, that are attached at dataset level

The pandasdmx.model.Series class has the following features:

key nnamedtuple mapping dimension names to dimension values

obs method returning an iterator over observations within the series

attributes: namedtuple mapping any attribute names to values

groups list of pandasdmx.model.Group instances to which this series belongs. Note that groups are merelyattachment points for attributes.

In [21]: data.dim_at_obsOut[21]: 'TIME_PERIOD'

In [22]: series_l = list(data.series)

In [23]: len(series_l)Out[23]: 16

26 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

In [24]: series_l[5].key\\\\\\\\\\\\Out[24]: SeriesKey(FREQ='D', CURRENCY='USD', CURRENCY_DENOM='EUR', EXR_→˓TYPE='SP00', EXR_SUFFIX='A')

In [25]: set(s.key.FREQ for s in data.series)\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\Out[25]:→˓{'A', 'D', 'H', 'M', 'Q'}

This dataset thus comprises 16 time series of several different period lengths. We could have chosen to request onlydaily data in the first place by providing the value D for the FREQ dimension. In the next section we will show howcolumns from a dataset can be selected through the information model when writing to a pandas DataFrame.

Writing to pandas

Selecting columns using the model API

As we want to write data to a pandas DataFrame rather than an iterator of pandas Series, we must not mix up the timespans. Therefore, we single out the daily data first. The pandasdmx.api.Response.write() method acceptsan optional iterable to select a subset of the series contained in the dataset. Thus we can now generate our pandasDataFrame from daily exchange rate data only:

In [26]: daily = (s for s in data.series if s.key.FREQ == 'D')

In [27]: cur_df = data_response.write(daily)

In [28]: cur_df.shapeOut[28]: (311, 2)

In [29]: cur_df.tail()\\\\\\\\\\\\\\\\\\Out[29]:FREQ DCURRENCY JPY USDCURRENCY_DENOM EUR EUREXR_TYPE SP00 SP00EXR_SUFFIX A ATIME_PERIOD2017-03-10 122.42 1.06062017-03-13 122.35 1.06632017-03-14 122.13 1.06312017-03-15 121.77 1.06222017-03-16 121.55 1.0726

Controlling the output

The docstring of the pandasdmx.writer.data2pandas.Writer.write() method explains a number ofoptional arguments to control whether or not another dataframe should be generated for the attributes, which attributesit should contain, and, most importantly, if the resulting pandas Series should be concatenated to a single DataFrameat all (asframe = True is the default).

5.5. Basic usage 27

pandaSDMX Documentation, Release 0.6.1

Controlling index generation

The write method provides the following parameters to control index generation. This is useful to increase per-formance for large datasets with regular indexes (e.g. monthly data, and to avoid crashes caused by exotic datetimeformats not parsed by pandas:

• fromfreq: if True, the index will be extrapolated from the first date or period and the frequency. This is onlyrobust if the dataset has a uniform index, e.g. has no gaps like for daily trading data.

• reverse_obs:: if True, return observations in a series in reverse document order. This may be useful toestablish chronological order, in particular incombination with fromfreq. Default is False.

• If pandas raises parsing errors due to exotic date-time formats, set parse_time to False to obtain a stringindex rather than datetime index. Default is True.

Working with files

The pandasdmx.api.Request.get() method accepts two optional keyword arguments tofile andfromfile. If a file path or, in case of fromfile, a file-like object is given, any SDMX message received from theserver will be written to a file, or a file will be read instead of making a request to a remote server.

The file to be read may be a zip file (new in version 0.2.1). In this case, the SDMX message must be the first file in thearchive. The same works for zip files returned from an SDMX server. This happens, e.g., when Eurostat finds that therequested dataset has been too large. In this case the first request will yield a message with a footer containing a linkto a zip file to be made available after some time. The link may be extracted by issuing something like:

>>> resp.footer.text[1]

and passed as url argument when calling get a second time to get the zipped data message.

Since version 0.2.1, this second request can be performed automatically through the get_footer_url parameter.It defaults to (30, 3) which means that three attempts will be made in 30 seconds intervals. This behavior is usefulwhen requesting large datasets from Eurostat. Deactivate it by setting get_footer_url to None.

In addition, since version 0.4 you can use pandasdmx.api.Response.write_source() to save the serializedXML tree to a file.

Caching Response instances in memory

The ‘’get” API provides a rudimentary cache for Response instances. It is a simple dict mapping user-provided namesto the Response instances. If we want to cache a Response, we can provide a suitable name by passing the keywordargument memcache to the get method. Pre-existing items under the same key will be overwritten.

Note: Caching of http responses can also be achieved through ‘’requests-cache’. Activate the cache by instantiatingpandasdmx.api.Request passing a keyword argument cache. It must be a dict mapping config and othervalues.

Using odo to export data sets to other data formats and database backends

Since version 0.4, pandaSDMX supports odo, a great tool to convert data sets to a variety of data formats and databasebackends. To use this feature, you have to call pandasdmx.odo_register() to register .sdmx files with odo.Then you can convert an .sdmx file containing a data set to, say, a CSV file or an SQLite or PostgreSQL database in afew lines:

28 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

>>> import pandasdmx>>> from odo import odo___ pandasdmx.odo_register()>>> odo('mydata.sdmx', 'sqlite:///mydata.sqlite')

Behind the scenes, odo uses pandaSDMX to convert the .sdmx file to a pandas DataFrame and performs any furtherconversions from there based on odo’s conversion graph. Any keyword arguments passed to odo will be passed on topandasdmx.api.Response.write().

There is a limitation though: In the exchange rate example from the previous chapter, we needed to select same-frequency series from the data set before converting the data set to pandas. This will likely cause crashes as odo’sdiscover method is unaware of this selection. Hence, .sdmx files can only be exported using odo if they can beexported to pandas without passing any arguments to pandasdmx.api.Response.write().

Handling errors

The pandasdmx.api.Response instance generated upon receipt of the response from the server has astatus_code attribute. The SDMX web services guidelines explain the meaning of these codes. In addition, ifthe SDMX server has encountered an error, it may return a message which includes a footer containing explanatorynotes. pandaSDMX exposes the content of a footer via a text attribute which is a list of strings.

Note: pandaSDMX raises only http errors with status code between 400 and 499. Codes >= 500 do not raise an erroras the SDMX web services guidelines define special meanings to those codes. The caller must therefore raise an errorif needed.

Logging

Since version 0.4, pandaSDMX can log certain events such as when a connection to a web service is made or a file hasbeen successfully downloaded. It uses the logging package from the Python stdlib. . To activate logging, you must setthe parent logger’s level to the desired value as described in the logging docs. Example:

>>> pandasdmx.logger.setLevel(10)

Dealign with agencies (data providers)

Overview

tbc

Advanced topics

The information model in detail

The easiest way to understanding the class hierarchy of the information model is to download a DSD from a dataprovider and inspect the resulting objects’ base classes and MRO.

5.6. Dealign with agencies (data providers) 29

pandaSDMX Documentation, Release 0.6.1

In most situations, structure metadata is represented by subclasses of dict mapping the SDMX artifacts’ ID’s to theartefacts themselves. The most intuitive examples are the container of code lists and the codes within a code list.

Likewise, categorisations, categoryschemes, and many other artefacts from the SDMX information model are repre-sented by subclasses of dict.

If dict keys are valid attribute names, you can use attribute syntax. This is thanks to pandasdmx.utils.DictLike, a thin wrapper around dict that internally uses a patched third-party tool.

In particular, the categoryscheme attribute of a pandasdmx.model.StructureMessage`instance isan instance of ``DictLike`. The DictLike `` container for the received categoryschemes uses the ``ID attribute of pandasdmx.model.CategoryScheme as keys. This level of gener-ality is required to cater for situations in which more than one category scheme is returned.

Note that pandasdmx.model.DictLike has a ‘‘ aslist‘‘ method. It returns its values as a new list sorted byid. The sorting criterion may be overridden in subclasses. We can see this when dealing with dimensions in apandasdmx.model.DataStructureDefinition where the dimensions are ordered by position.

Accessing the underlying XML document

The information model does not (yet) expose all attributes of SDMX messages. However, the underlying XML ele-ments are accessible from almost everywhere. This is thanks to the base class pandasdmx.model.SDMXObject.It injects two attributes: _elem and _reader which grant access to the XML element represented by the modelclass instance as well as the reader instance.

Extending pandaSDMX

pandaSDMX is now extensible by readers and writers. While the API needs a few refinements, it should be straightfor-ward to depart from pandasdmx.writer.data2pandas to develop writers for alternative output formats suchas spreadsheet, database, or web applications.

Similarly, a reader for the upcoming JSON-based SDMX format would be useful.

Interested developers should contact the author at [email protected].

pandasdmx

pandasdmx package

Subpackages

pandasdmx.reader package

Submodules

pandasdmx.reader.sdmxjson module

This module contains a reader for SDMXML v2.1.

class pandasdmx.reader.sdmxjson.Reader(request, **kwargs)Bases: pandasdmx.reader.BaseReader

Read SDMXJSON 2.1 and expose it as instances from pandasdmx.model

30 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

dataset_attrib(sdmxobj)

dim_at_obs(sdmxobj)

generic_groups(sdmxobj)

generic_series(sdmxobj)

getitem0 = operator.itemgetter(0)

getitem_key = operator.itemgetter(‘_key’)

group_key(sdmxobj)

header_error(sdmxobj)

initialize(source)

international_str(name, sdmxobj)return DictLike of xml:lang attributes. If node has no attributes, assume that language is ‘en’.

iter_generic_obs(sdmxobj, with_value, with_attributes)

iter_generic_series_obs(sdmxobj, with_value, with_attributes, reverse_obs=False)

read_as_str(name, sdmxobj, first_only=True)

series_attrib(sdmxobj)

series_key(sdmxobj)

structured_by(sdmxobj)

write_source(filename)Save source to file by calling write on the root element.

class pandasdmx.reader.sdmxjson.XPath(path)Bases: object

pandasdmx.reader.sdmxml module

This module contains a reader for SDMXML v2.1.

class pandasdmx.reader.sdmxml.Reader(request, **kwargs)Bases: pandasdmx.reader.BaseReader

Read SDMX-ML 2.1 and expose it as instances from pandasdmx.model

dataset_attrib(sdmxobj)

dim_at_obs(sdmxobj)

generic_groups(sdmxobj)

generic_series(sdmxobj)

group_key(sdmxobj)

header_error(sdmxobj)

initialize(source)

international_str(name, sdmxobj)return DictLike of xml:lang attributes. If node has no attributes, assume that language is ‘en’.

iter_generic_obs(sdmxobj, with_value, with_attributes)

5.8. pandasdmx 31

pandaSDMX Documentation, Release 0.6.1

iter_generic_series_obs(sdmxobj, with_value, with_attributes, reverse_obs=False)

series_attrib(sdmxobj)

series_key(sdmxobj)

structured_by(sdmxobj)

write_source(filename)Save XML source to file by calling write on the root element.

Module contents

This module contains the base class for readers.

class pandasdmx.reader.BaseReader(request, **kwargs)Bases: object

initialize(source)

read_as_str(name, sdmxobj, first_only=True)

read_identifiables(cls, sdmxobj, offset=None)If sdmxobj inherits from dict: update it with modelized elements. These must be instances ofmodel.IdentifiableArtefact, i.e. have an ‘id’ attribute. This will be used as dict keys. If sdmxobj doesnot inherit from dict: return a new DictLike.

read_instance(cls, sdmxobj, offset=None, first_only=True)If cls in _paths and matches, return an instance of cls with the first XML element, or, if first_only is False,a list of cls instances for all elements found, If no matches were found, return None.

pandasdmx.utils package

Submodules

pandasdmx.utils.aadict module

class pandasdmx.utils.aadict.aadictBases: dict

A dict subclass that allows attribute access to be synonymous with item access, e.g. mydict.attribute== mydict['attribute']. It also provides several other useful helper methods, such as pick() andomit().

static d2a(subject)

static d2ar(subject)

omit(*args)

pick(*args)

update(*args, **kw)

32 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

pandasdmx.utils.anynamedtuple module

pandasdmx.utils.anynamedtuple.namedtuple(typename, field_names, verbose=False, re-name=False)

Returns a new subclass of tuple with named fields. This is a patched version of collections.namedtuple from thestdlib. Unlike the latter, it accepts non-identifier strings as field names. All values are accessible through dictsyntax. Fields whose names are identifiers are also accessible via attribute syntax as in ordinary namedtuples,alongside traditional indexing. This feature is needed as SDMX allows field names to contain ‘-‘.

>>> Point = namedtuple('Point', ['x', 'y'])>>> Point.__doc__ # docstring for the new class'Point(x, y)'>>> p = Point(11, y=22) # instantiate with positional args or keywords>>> p[0] + p[1] # indexable like a plain tuple33>>> x, y = p # unpack like a regular tuple>>> x, y(11, 22)>>> p.x + p.y # fields also accessable by name33>>> d = p._asdict() # convert to a dictionary>>> d['x']11>>> Point(**d) # convert from a dictionaryPoint(x=11, y=22)>>> p._replace(x=100) # _replace() is like str.replace() but→˓targets named fieldsPoint(x=100, y=22)

Module contents

module pandasdmx.utils - helper classes and functions

class pandasdmx.utils.DictLikeBases: pandasdmx.utils.aadict.aadict

Thin wrapper around dict type

It allows attribute-like item access, has a find() method and inherits other useful features from aadict.

any()return an arbitrary or the only value. If dict is empty, raise KeyError.

aslist()return values() as unordered list

find(search_str, by=’name’, language=’en’)Select values by attribute

Parameters

• searchstr (str) – the string to search for

• by (str) – the name of the attribute to search by, defaults to ‘name’ The specified at-tribute must be either a string or a dict mapping language codes to strings. Such attributesoccur, e.g. in pandasdmx.model.NameableArtefact which is a base class forpandasdmx.model.DataFlowDefinition and many others.

5.8. pandasdmx 33

pandaSDMX Documentation, Release 0.6.1

• language (str) – language code specifying the language of the text to be searched,defaults to ‘en’

Returns

items where value.<by> contains the search_str. International strings stored as dictwith language codes as keys are searched. Capitalization is ignored.

Return type DictLike

class pandasdmx.utils.NamedTupleFactoryBases: object

Wrap namedtuple function from the collections stdlib module to return a singleton if a nametuple with the samefield names has already been created.

cache = {(‘dim’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.SeriesObservation’>, (‘key’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.GenericObservation’>}

pandasdmx.utils.concat_namedtuples(*tup, **kwargs)Concatenate 2 or more namedtuples. The new namedtuple type is provided by NamedTupleFactory returnnew namedtuple instance

pandasdmx.writer package

Submodules

pandasdmx.writer.data2pandas module

This module contains a writer class that writes a generic data message to pandas dataframes or series.

class pandasdmx.writer.data2pandas.Writer(msg, **kwargs)Bases: pandasdmx.writer.BaseWriter

iter_pd_series(iter_series, dim_at_obs, dtype, attributes, reverse_obs, fromfreq, parse_time)

write(source=None, asframe=True, dtype=<class ‘numpy.float64’>, attributes=’‘, reverse_obs=False,fromfreq=False, parse_time=True)

Transfform a pandasdmx.model.DataMessage instance to a pandas DataFrame or iterator overpandas Series.

Parameters

• source (pandasdmx.model.DataMessage) – a pandasdmx.model.DataSet or it-erator of pandasdmx.model.Series

• asframe (bool) – if True, merge the series of values and/or attributes into one or twomulti-indexed pandas.DataFrame(s), otherwise return an iterator of pandas.Series. (de-fault: True)

• dtype (str, NP.dtype, None) – datatype for values. Defaults to NP.float64 ifNone, do not return the values of a series. In this case, attributes must not be an emptystring so that some attribute is returned.

• attributes (str, None) – string determining which attributes, if any, should bereturned in separate series or a separate DataFrame. Allowed values: ‘’, ‘o’, ‘s’, ‘g’, ‘d’or any combination thereof such as ‘os’, ‘go’. Defaults to ‘osgd’. Where ‘o’, ‘s’, ‘g’, and‘d’ mean that attributes at observation, series, group and dataset level will be returned asmembers of per-observation namedtuples.

• reverse_obs (bool) – if True, return observations in reverse order. Default: False

34 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

• fromfreq (bool) – if True, extrapolate time periods from the first item and FREQdimension. Default: False

• parse_time (bool) – if True (default), try to generate datetime index, provided thatdim_at_obs is ‘TIME’ or ‘TIME_PERIOD’. Otherwise, parse_time is ignored. IfFalse, always generate index of strings. Set it to False to increase performance and avoidparsing errors for exotic date-time formats unsupported by pandas.

pandasdmx.writer.structure2pd module

This module contains a writer class that writes artefacts from a StructureMessage to pandas dataFrames. This is useful,e.g., to visualize codes from a codelist or concepts from a concept scheme. The writer is more general though: It canoutput any collection of nameable SDMX objects.

class pandasdmx.writer.structure2pd.Writer(msg, **kwargs)Bases: pandasdmx.writer.BaseWriter

write(source=None, rows=None, **kwargs)Transfform structural metadata, i.e. codelists, concept-schemes, lists of dataflow definitions or category-schemes from a pandasdmx.model.StructureMessage instance into a pandas DataFrame. Thismethod is called by pandasdmx.api.Response.write() . It is not part of the public-facing API.Yet, certain kwargs are propagated from there.

Parameters

• source (pandasdmx.model.StructureMessage) – a pandasdmx.model.StructureMessage instance.

• rows (str) – sets the desired content to be extracted from the StructureMessage. Mustbe a name of an attribute of the StructureMessage. The attribute must be an instance of dictwhose keys are strings. These will be interpreted as ID’s and used for the MultiIndex ofthe DataFrame to be returned. Values can be either instances of dict such as for codelistsand categoryscheme, or simple nameable objects such as for dataflows. In the latter case,the DataFrame will have a flat index. (default: depends on content found in Message.Common is ‘codelist’)

• columns (str, list) – if str, it denotes the attribute of attributes of the values (name-able SDMX objects such as Code or ConceptScheme) that will be stored in the DataFrame.If a list, it must contain strings that are valid attibute values. Defaults to: [’name’, ‘de-scription’]

• lang (str) – locale identifier. Specifies the preferred language for international stringssuch as names. Default is ‘en’.

Module contents

This module contains the base class for writers.

class pandasdmx.writer.BaseWriter(msg, **kwargs)Bases: object

Submodules

5.8. pandasdmx 35

pandaSDMX Documentation, Release 0.6.1

pandasdmx.api module

This module defines two classes: pandasdmx.api.Request and pandasdmx.api.Response. Together,these form the high-level API of pandasdmx. Requesting data and metadata from an SDMX server requires a goodunderstanding of this API and a basic understanding of the SDMX web service guidelines only the chapters on RESTservices are relevant as pandasdmx does not support the SOAP interface.

class pandasdmx.api.Request(agency=’‘, cache=None, log_level=None, **http_cfg)Bases: object

Get SDMX data and metadata from remote servers or local files.

agency

clear_cache()

get(resource_type=’‘, resource_id=’‘, agency=’‘, key=’‘, params=None, headers={}, fromfile=None,tofile=None, url=None, get_footer_url=(30, 3), memcache=None, writer=None)get SDMX data or metadata and return it as a pandasdmx.api.Response instance.

While ‘get’ can load any SDMX file (also as zip-file) specified by ‘fromfile’, it can only construct URLs forthe SDMX service set for this instance. Hence, you have to instantiate a pandasdmx.api.Requestinstance for each data provider you want to access, or pass a pre-fabricated URL through the url param-eter.

Parameters

• resource_type (str) – the type of resource to be requested. Values must be oneof the items in Request._resources such as ‘data’, ‘dataflow’, ‘categoryscheme’ etc. It isused for URL construction, not to read the received SDMX file. Hence, if fromfile is given,resource_type may be ‘’. Defaults to ‘’.

• resource_id (str) – the id of the resource to be requested. It is used for URL con-struction. Defaults to ‘’.

• agency (str) – ID of the agency providing the data or metadata. Used for URL con-struction only. It tells the SDMX web service which agency the requested informationoriginates from. Note that an SDMX service may provide information from multiple dataproviders. may be ‘’ if fromfile is given. Not to be confused with the agency ID passed to__init__() which specifies the SDMX web service to be accessed.

• key (str, dict) – select columns from a dataset by specifying dimension values. Iftype is str, it must conform to the SDMX REST API, i.e. dot-separated dimension values.If ‘key’ is of type ‘dict’, it must map dimension names to allowed dimension values. Twoor more values can be separated by ‘+’ as in the str form. The DSD will be downloadedand the items are validated against it before downloading the dataset.

• params (dict) – defines the query part of the URL. The SDMX web service guide-lines (www.sdmx.org) explain the meaning of permissible parameters. It can be used torestrict the time range of the data to be delivered (startperiod, endperiod), whether parents,siblings or descendants of the specified resource should be returned as well (e.g. ref-erences=’parentsandsiblings’). Sensible defaults are set automatically depending on thevalues of other args such as resource_type. Defaults to {}.

• headers (dict) – http headers. Given headers will overwrite instance-wide headerspassed to the constructor. Defaults to None, i.e. use defaults from agency configuration

• fromfile (str) – path to the file to be loaded instead of accessing an SDMX webservice. Defaults to None. If fromfile is given, args relating to URL construction will beignored.

36 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

• tofile (str) – file path to write the received SDMX file on the fly. This is useful if youwant to load data offline using fromfile or if you want to open an SDMX file in an XMLeditor.

• url (str) – URL of the resource to download. If given, any other arguments such asresource_type or resource_id are ignored. Default is None.

• get_footer_url ((int, int)) – tuple of the form (seconds, number_of_attempts).Determines the behavior in case the received SDMX message has a footer where one ofits lines is a valid URL. get_footer_url defines how many attempts should be madeto request the resource at that URL after waiting so many seconds before each attempt.This behavior is useful when requesting large datasets from Eurostat. Other agencies donot seem to send such footers. Once an attempt to get the resource has been successful,the original message containing the footer is dismissed and the dataset is returned. Thetofile argument is propagated. Note that the written file may be a zip archive. pandaS-DMX handles zip archives since version 0.2.1. Defaults to (30, 3).

• memcache (str) – If given, return Response instance if already in self.cache(dict),

• download resource and cache Response instance. (otherwise) –

writer(str): optional custom writer class. Should inherit from pandasdmx.writer.BaseWriter. Defaultsto None, i.e. one of the included writers is selected as appropriate.

Returns

instance containing the requested SDMX Message.

Return type pandasdmx.api.Response

classmethod list_agencies()Return a sorted list of valid agency IDs. These can be used to create Request instances.

classmethod load_agency_profile(source)Classmethod loading metadata on a data provider. source must be a json-formated string or file-likeobject describing one or more data providers (URL of the SDMX web API, resource types etc. The dictRequest._agencies is updated with the metadata from the source.

Returns None

timeout

class pandasdmx.api.ResourceGetter(resource_type)Bases: object

Descriptor to wrap Request.get vor convenient calls without specifying the resource as arg.

class pandasdmx.api.Response(msg, url, headers, status_code, writer=None)Bases: object

Container class for SDMX messages.

It is instantiated by .

msgpandasdmx.model.Message – a pythonic representation of the SDMX message

status_codeint – the status code from the http response, if any

urlstr – the URL, if any, that was sent to the SDMX server

5.8. pandasdmx 37

pandaSDMX Documentation, Release 0.6.1

headersdict – http response headers returned by ‘’requests’‘

write()wrapper around the writer’s write method. Arguments are propagated to the writer.

write(source=None, **kwargs)Wrappe r to call the writer’s write method if present.

Parameters source (pandasdmx.model.Message, iterable) – stuff to be written.If a pandasdmx.model.Message is given, the writer itself must determine what to writeunless specified in the keyword arguments. If an iterable is given, the writer should write eachitem. Keyword arguments may specify what to do with the output depending on the writer’sAPI. Defaults to self.msg.

Returns anything the writer returns.

Return type type

write_source(filename)write xml file by calling the ‘write’ method of lxml root element. Useful to save the xml source file foroffline use. Similar to passing tofile arg to Request.get()

Parameters filename (str) – name/path of target file

Returns whatever the LXML deserializer returns.

pandasdmx.model module

This module is part of the pandaSDMX package

SDMX 2.1 information model

3. 2014 Dr. Leo ([email protected])

class pandasdmx.model.AnnotableArtefact(reader, elem, **kwargs)Bases: pandasdmx.model.SDMXObject

annotations

class pandasdmx.model.Annotation(reader, elem, **kwargs)Bases: pandasdmx.model.SDMXObject

annotationtype

id

text

title

url

class pandasdmx.model.AttributeDescriptor(*args, **kwargs)Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Categorisation(*args, **kwargs)Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.Categorisations(*args, **kwargs)Bases: pandasdmx.model.SDMXObject, pandasdmx.utils.DictLike

class pandasdmx.model.Category(*args, **kwargs)Bases: pandasdmx.model.Item

38 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

class pandasdmx.model.CategoryScheme(*args, **kwargs)Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Code(*args, **kwargs)Bases: pandasdmx.model.Item

class pandasdmx.model.Codelist(*args, **kwargs)Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Component(*args, **kwargs)Bases: pandasdmx.model.IdentifiableArtefact

concept

concept_identity

local_repr

class pandasdmx.model.ComponentList(*args, **kwargs)Bases: pandasdmx.model.IdentifiableArtefact, pandasdmx.model.Scheme

class pandasdmx.model.Concept(*args, **kwargs)Bases: pandasdmx.model.Item

class pandasdmx.model.ConceptScheme(*args, **kwargs)Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.ConstrainableBases: object

class pandasdmx.model.Constraint(*args, **kwargs)Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.ContentConstraint(*args, **kwargs)Bases: pandasdmx.model.Constraint

class pandasdmx.model.CubeRegion(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.DataAttribute(*args, **kwargs)Bases: pandasdmx.model.Component

related_to

usage_status

class pandasdmx.model.DataMessage(*args, **kwargs)Bases: pandasdmx.model.Message

class pandasdmx.model.DataSet(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

dim_at_obs

iter_groups

obs(with_values=True, with_attributes=True)return an iterator over observations in a flat dataset. An observation is represented as a namedtuple with 3fields (‘key’, ‘value’, ‘attrib’).

obs.key is a namedtuple of dimensions. Its field names represent dimension names, its values the dimensionvalues.

obs.value is a string that can in in most cases be interpreted as float64 obs.attrib is a namedtuple of attributenames and values.

5.8. pandasdmx 39

pandaSDMX Documentation, Release 0.6.1

with_values and with_attributes: If one or both of these flags is False, the respective value will be None.Use these flags to increase performance. The flags default to True.

seriesreturn an iterator over Series instances in this DataSet. Note that DataSets in flat format, i.e.header.dim_at_obs = “AllDimensions”, have no series. Use DataSet.obs() instead.

class pandasdmx.model.DataStructureDefinition(*args, **kwargs)Bases: pandasdmx.model.Structure

class pandasdmx.model.DataflowDefinition(*args, **kwargs)Bases: pandasdmx.model.StructureUsage, pandasdmx.model.Constrainable

class pandasdmx.model.Dimension(*args, **kwargs)Bases: pandasdmx.model.Component

class pandasdmx.model.DimensionDescriptor(*args, **kwargs)Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Facet(facet_type=None, facet_value_type=’‘, itemscheme_facet=’‘, *args,**kwargs)

Bases: object

facet_type = {}

facet_value_type = (‘String’, ‘Big Integer’, ‘Integer’, ‘Long’, ‘Short’, ‘Double’, ‘Boolean’, ‘URI’, ‘DateTime’, ‘Time’, ‘GregorianYear’, ‘GregorianMonth’, ‘GregorianDate’, ‘Day’, ‘MonthDay’, ‘Duration’)

itemscheme_facet = ‘’

class pandasdmx.model.Footer(reader, elem, **kwargs)Bases: pandasdmx.model.SDMXObject

code

severity

text

class pandasdmx.model.Group(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.Header(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

error

id

prepared

receiver

sender

class pandasdmx.model.IdentifiableArtefact(*args, **kwargs)Bases: pandasdmx.model.AnnotableArtefact

uri

class pandasdmx.model.Item(*args, **kwargs)Bases: pandasdmx.model.NameableArtefact

children

class pandasdmx.model.ItemScheme(*args, **kwargs)Bases: pandasdmx.model.MaintainableArtefact, pandasdmx.model.Scheme

40 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

is_partial

class pandasdmx.model.KeyValue(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.MaintainableArtefact(*args, **kwargs)Bases: pandasdmx.model.VersionableArtefact

is_external_ref

is_final

maintainer

service_url

structure_url

class pandasdmx.model.MeasureDescriptor(*args, **kwargs)Bases: pandasdmx.model.ComponentList

class pandasdmx.model.MeasureDimension(*args, **kwargs)Bases: pandasdmx.model.Dimension

class pandasdmx.model.Message(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.NameableArtefact(*args, **kwargs)Bases: pandasdmx.model.IdentifiableArtefact

description

name

class pandasdmx.model.PrimaryMeasure(*args, **kwargs)Bases: pandasdmx.model.Component

class pandasdmx.model.Ref(reader, elem, **kwargs)Bases: pandasdmx.model.SDMXObject

agency_id

id

maintainable_parent_id

package

ref_class

resolve()

version

class pandasdmx.model.ReportingYearStartDay(*args, **kwargs)Bases: pandasdmx.model.DataAttribute

class pandasdmx.model.Representation(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.SDMXObject(reader, elem, **kwargs)Bases: object

class pandasdmx.model.Scheme(*args, **kwargs)Bases: pandasdmx.utils.DictLike

aslist()

5.8. pandasdmx 41

pandaSDMX Documentation, Release 0.6.1

class pandasdmx.model.Series(*args, **kwargs)Bases: pandasdmx.model.SDMXObject

group_attribreturn a namedtuple containing all attributes attached to groups of which the given series is a member foreach group of which the series is a member

obs(with_values=True, with_attributes=True, reverse_obs=False)return an iterator over observations in a series. An observation is represented as a namedtuple with 3 fields(‘key’, ‘value’, ‘attrib’). obs.key is a namedtuple of dimensions, obs.value is a string value and obs.attribis a namedtuple of attributes. If with_values or with_attributes is False, the respective value is None. Usethese flags to increase performance. The flags default to True.

class pandasdmx.model.Structure(*args, **kwargs)Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.StructureMessage(*args, **kwargs)Bases: pandasdmx.model.Message

class pandasdmx.model.StructureUsage(*args, **kwargs)Bases: pandasdmx.model.MaintainableArtefact

structure

class pandasdmx.model.TimeDimension(*args, **kwargs)Bases: pandasdmx.model.Dimension

class pandasdmx.model.VersionableArtefact(*args, **kwargs)Bases: pandasdmx.model.NameableArtefact

valid_from

valid_to

version

pandasdmx.remote module

This module is part of pandaSDMX. It contains a classes for http access.

class pandasdmx.remote.REST(cache, http_cfg)Bases: object

Query SDMX resources via REST or from a file

The constructor accepts arbitrary keyword arguments that will be passed to the requests.get function on eachcall. This makes the REST class somewhat similar to a requests.Session. E.g., proxies or authorisation dataneeds only be provided once. The keyword arguments are stored in self.config. Modify this dict to issue thenext ‘get’ request with changed arguments.

get(url, fromfile=None, params={}, headers={})Get SDMX message from REST service or local file

Parameters

• url (str) – URL of the REST service without the query part If None, fromfile must beset. Default is None

• params (dict) – will be appended as query part to the URL after a ‘?’

• fromfile (str) – path to SDMX file containing an SDMX message. It will be passedon to the reader for parsing.

42 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

• headers (dict) – http headers. Overwrite instance-wide headers. Default is {}

Returns

three objects:

0. file-like object containing the SDMX message

1. the complete URL, if any, including the query part constructed from params

2. the status code

Return type tuple

Raises HTTPError if SDMX service responded with – status code 401. Otherwise, the statuscode is returned

max_size = 16777216

request(url, params={}, headers={})Retrieve SDMX messages. If needed, override in subclasses to support other data providers.

Parameters url (str) – The URL of the message.

Returns the xml data as file-like object

pandasdmx.remote.is_url(s)return True if s (str) is a valid URL, False otherwise.

Module contents

pandaSDMX - a Python package for SDMX - Statistical Data and Metadata eXchange

class pandasdmx.Request(agency=’‘, cache=None, log_level=None, **http_cfg)Bases: object

Get SDMX data and metadata from remote servers or local files.

agency

clear_cache()

get(resource_type=’‘, resource_id=’‘, agency=’‘, key=’‘, params=None, headers={}, fromfile=None,tofile=None, url=None, get_footer_url=(30, 3), memcache=None, writer=None)get SDMX data or metadata and return it as a pandasdmx.api.Response instance.

While ‘get’ can load any SDMX file (also as zip-file) specified by ‘fromfile’, it can only construct URLs forthe SDMX service set for this instance. Hence, you have to instantiate a pandasdmx.api.Requestinstance for each data provider you want to access, or pass a pre-fabricated URL through the url param-eter.

Parameters

• resource_type (str) – the type of resource to be requested. Values must be oneof the items in Request._resources such as ‘data’, ‘dataflow’, ‘categoryscheme’ etc. It isused for URL construction, not to read the received SDMX file. Hence, if fromfile is given,resource_type may be ‘’. Defaults to ‘’.

• resource_id (str) – the id of the resource to be requested. It is used for URL con-struction. Defaults to ‘’.

• agency (str) – ID of the agency providing the data or metadata. Used for URL con-struction only. It tells the SDMX web service which agency the requested informationoriginates from. Note that an SDMX service may provide information from multiple data

5.8. pandasdmx 43

pandaSDMX Documentation, Release 0.6.1

providers. may be ‘’ if fromfile is given. Not to be confused with the agency ID passed to__init__() which specifies the SDMX web service to be accessed.

• key (str, dict) – select columns from a dataset by specifying dimension values. Iftype is str, it must conform to the SDMX REST API, i.e. dot-separated dimension values.If ‘key’ is of type ‘dict’, it must map dimension names to allowed dimension values. Twoor more values can be separated by ‘+’ as in the str form. The DSD will be downloadedand the items are validated against it before downloading the dataset.

• params (dict) – defines the query part of the URL. The SDMX web service guide-lines (www.sdmx.org) explain the meaning of permissible parameters. It can be used torestrict the time range of the data to be delivered (startperiod, endperiod), whether parents,siblings or descendants of the specified resource should be returned as well (e.g. ref-erences=’parentsandsiblings’). Sensible defaults are set automatically depending on thevalues of other args such as resource_type. Defaults to {}.

• headers (dict) – http headers. Given headers will overwrite instance-wide headerspassed to the constructor. Defaults to None, i.e. use defaults from agency configuration

• fromfile (str) – path to the file to be loaded instead of accessing an SDMX webservice. Defaults to None. If fromfile is given, args relating to URL construction will beignored.

• tofile (str) – file path to write the received SDMX file on the fly. This is useful if youwant to load data offline using fromfile or if you want to open an SDMX file in an XMLeditor.

• url (str) – URL of the resource to download. If given, any other arguments such asresource_type or resource_id are ignored. Default is None.

• get_footer_url ((int, int)) – tuple of the form (seconds, number_of_attempts).Determines the behavior in case the received SDMX message has a footer where one ofits lines is a valid URL. get_footer_url defines how many attempts should be madeto request the resource at that URL after waiting so many seconds before each attempt.This behavior is useful when requesting large datasets from Eurostat. Other agencies donot seem to send such footers. Once an attempt to get the resource has been successful,the original message containing the footer is dismissed and the dataset is returned. Thetofile argument is propagated. Note that the written file may be a zip archive. pandaS-DMX handles zip archives since version 0.2.1. Defaults to (30, 3).

• memcache (str) – If given, return Response instance if already in self.cache(dict),

• download resource and cache Response instance. (otherwise) –

writer(str): optional custom writer class. Should inherit from pandasdmx.writer.BaseWriter. Defaultsto None, i.e. one of the included writers is selected as appropriate.

Returns

instance containing the requested SDMX Message.

Return type pandasdmx.api.Response

classmethod list_agencies()Return a sorted list of valid agency IDs. These can be used to create Request instances.

classmethod load_agency_profile(source)Classmethod loading metadata on a data provider. source must be a json-formated string or file-likeobject describing one or more data providers (URL of the SDMX web API, resource types etc. The dictRequest._agencies is updated with the metadata from the source.

44 Chapter 5. Table of contents

pandaSDMX Documentation, Release 0.6.1

Returns None

timeout

Contributing

Contributions such as bug reports or pull requests and any other user feedback are much appreciated. Developmenttakes place on github. There is also a low traffic Mailing list.

License

Notwithstanding other licenses applicable to any third-party software included in this package, pandaSDMX is li-censed under the Apache 2.0 license, a copy of which is included in the source distribution.

Copyright 2014, 2015 Dr. Leo <fhaxbox66qgmail.com>, All Rights Reserved.

5.9. Contributing 45

pandaSDMX Documentation, Release 0.6.1

46 Chapter 5. Table of contents

CHAPTER 6

Indices and tables

• genindex

• modindex

• search

47

pandaSDMX Documentation, Release 0.6.1

48 Chapter 6. Indices and tables

Python Module Index

ppandasdmx, 43pandasdmx.api, 36pandasdmx.model, 38pandasdmx.reader, 32pandasdmx.reader.sdmxjson, 30pandasdmx.reader.sdmxml, 31pandasdmx.remote, 42pandasdmx.utils, 33pandasdmx.utils.aadict, 32pandasdmx.utils.anynamedtuple, 33pandasdmx.writer, 35pandasdmx.writer.data2pandas, 34pandasdmx.writer.structure2pd, 35

49

pandaSDMX Documentation, Release 0.6.1

50 Python Module Index

Index

Aaadict (class in pandasdmx.utils.aadict), 32agency (pandasdmx.api.Request attribute), 36agency (pandasdmx.Request attribute), 43agency_id (pandasdmx.model.Ref attribute), 41AnnotableArtefact, 18AnnotableArtefact (class in pandasdmx.model), 38Annotation (class in pandasdmx.model), 38annotations (pandasdmx.model.AnnotableArtefact

attribute), 38annotationtype (pandasdmx.model.Annotation attribute),

38any() (pandasdmx.utils.DictLike method), 33aslist() (pandasdmx.model.Scheme method), 41aslist() (pandasdmx.utils.DictLike method), 33attachment-constraint, 18AttributeDescriptor (class in pandasdmx.model), 38attributes, 17

BBaseReader (class in pandasdmx.reader), 32BaseWriter (class in pandasdmx.writer), 35

Ccache (pandasdmx.utils.NamedTupleFactory attribute),

34Categorisation, 18Categorisation (class in pandasdmx.model), 38Categorisations (class in pandasdmx.model), 38Category, 18Category (class in pandasdmx.model), 38CategoryScheme (class in pandasdmx.model), 38CategorySchemes, 18children (pandasdmx.model.Item attribute), 40classes, 16clear_cache() (pandasdmx.api.Request method), 36clear_cache() (pandasdmx.Request method), 43Code (class in pandasdmx.model), 39code (pandasdmx.model.Footer attribute), 40

code lists, 17Codelist (class in pandasdmx.model), 39Component (class in pandasdmx.model), 39ComponentList (class in pandasdmx.model), 39concat_namedtuples() (in module pandasdmx.utils), 34concept, 17Concept (class in pandasdmx.model), 39concept (pandasdmx.model.Component attribute), 39concept scheme, 17concept_identity (pandasdmx.model.Component at-

tribute), 39ConceptScheme (class in pandasdmx.model), 39Constrainable (class in pandasdmx.model), 39Constraint (class in pandasdmx.model), 39content-constraint, 18ContentConstraint (class in pandasdmx.model), 39cross-sectional, 17CubeRegion (class in pandasdmx.model), 39

Dd2a() (pandasdmx.utils.aadict.aadict static method), 32d2ar() (pandasdmx.utils.aadict.aadict static method), 32DataAttribute (class in pandasdmx.model), 39dataflow, 17DataFlowDefinition, 17, 18DataflowDefinition (class in pandasdmx.model), 40DataMessage (class in pandasdmx.model), 39DataSet, 18dataset, 17DataSet (class in pandasdmx.model), 39dataset_attrib() (pandasdmx.reader.sdmxjson.Reader

method), 30dataset_attrib() (pandasdmx.reader.sdmxml.Reader

method), 31DataStructureDefinition, 17, 18DataStructureDefinition (class in pandasdmx.model), 40description (pandasdmx.model.NameableArtefact at-

tribute), 41DictLike (class in pandasdmx.utils), 33dim_at_obs (pandasdmx.model.DataSet attribute), 39

51

pandaSDMX Documentation, Release 0.6.1

dim_at_obs() (pandasdmx.reader.sdmxjson.Readermethod), 31

dim_at_obs() (pandasdmx.reader.sdmxml.Readermethod), 31

Dimension (class in pandasdmx.model), 40dimension at observation, 17DimensionDescriptor (class in pandasdmx.model), 40dimensions, 17

Eerror (pandasdmx.model.Header attribute), 40

FFacet (class in pandasdmx.model), 40facet_type (pandasdmx.model.Facet attribute), 40facet_value_type (pandasdmx.model.Facet attribute), 40facets, 17find() (pandasdmx.utils.DictLike method), 33flat datasets, 17Footer, 19Footer (class in pandasdmx.model), 40

Ggeneric_groups() (pandasdmx.reader.sdmxjson.Reader

method), 31generic_groups() (pandasdmx.reader.sdmxml.Reader

method), 31generic_series() (pandasdmx.reader.sdmxjson.Reader

method), 31generic_series() (pandasdmx.reader.sdmxml.Reader

method), 31GenericDataMessage, 18get() (pandasdmx.api.Request method), 36get() (pandasdmx.remote.REST method), 42get() (pandasdmx.Request method), 43getitem0 (pandasdmx.reader.sdmxjson.Reader attribute),

31getitem_key (pandasdmx.reader.sdmxjson.Reader at-

tribute), 31group, 17Group (class in pandasdmx.model), 40group_attrib (pandasdmx.model.Series attribute), 42group_key() (pandasdmx.reader.sdmxjson.Reader

method), 31group_key() (pandasdmx.reader.sdmxml.Reader

method), 31groups, 17

HHeader, 19Header (class in pandasdmx.model), 40header_error() (pandasdmx.reader.sdmxjson.Reader

method), 31

header_error() (pandasdmx.reader.sdmxml.Readermethod), 31

headers (pandasdmx.api.Response attribute), 37

Iid (pandasdmx.model.Annotation attribute), 38id (pandasdmx.model.Header attribute), 40id (pandasdmx.model.Ref attribute), 41IdentifiableArtefact, 18IdentifiableArtefact (class in pandasdmx.model), 40information model, 16initialize() (pandasdmx.reader.BaseReader method), 32initialize() (pandasdmx.reader.sdmxjson.Reader method),

31initialize() (pandasdmx.reader.sdmxml.Reader method),

31international_str() (pandasdmx.reader.sdmxjson.Reader

method), 31international_str() (pandasdmx.reader.sdmxml.Reader

method), 31is_external_ref (pandasdmx.model.MaintainableArtefact

attribute), 41is_final (pandasdmx.model.MaintainableArtefact at-

tribute), 41is_partial (pandasdmx.model.ItemScheme attribute), 40is_url() (in module pandasdmx.remote), 43Item (class in pandasdmx.model), 40ItemScheme (class in pandasdmx.model), 40itemscheme_facet (pandasdmx.model.Facet attribute), 40iter_generic_obs() (pandasdmx.reader.sdmxjson.Reader

method), 31iter_generic_obs() (pandasdmx.reader.sdmxml.Reader

method), 31iter_generic_series_obs() (pandas-

dmx.reader.sdmxjson.Reader method), 31iter_generic_series_obs() (pandas-

dmx.reader.sdmxml.Reader method), 31iter_groups (pandasdmx.model.DataSet attribute), 39iter_pd_series() (pandasdmx.writer.data2pandas.Writer

method), 34

KKeyValue (class in pandasdmx.model), 41

Llist_agencies() (pandasdmx.api.Request class method), 37list_agencies() (pandasdmx.Request class method), 44load_agency_profile() (pandasdmx.api.Request class

method), 37load_agency_profile() (pandasdmx.Request class

method), 44local_repr (pandasdmx.model.Component attribute), 39

52 Index

pandaSDMX Documentation, Release 0.6.1

Mmaintainable_parent_id (pandasdmx.model.Ref at-

tribute), 41MaintainableArtefact, 18MaintainableArtefact (class in pandasdmx.model), 41maintainer (pandasdmx.model.MaintainableArtefact at-

tribute), 41max_size (pandasdmx.remote.REST attribute), 43MeasureDescriptor (class in pandasdmx.model), 41MeasureDimension, 17MeasureDimension (class in pandasdmx.model), 41Message, 18Message (class in pandasdmx.model), 41msg (pandasdmx.api.Response attribute), 37

Nname (pandasdmx.model.NameableArtefact attribute), 41NameableArtefact (class in pandasdmx.model), 41namedtuple() (in module pandas-

dmx.utils.anynamedtuple), 33NamedTupleFactory (class in pandasdmx.utils), 34

Oobs() (pandasdmx.model.DataSet method), 39obs() (pandasdmx.model.Series method), 42observations, 17omit() (pandasdmx.utils.aadict.aadict method), 32

Ppackage (pandasdmx.model.Ref attribute), 41pandasdmx (module), 43pandasdmx.api (module), 36pandasdmx.model (module), 38pandasdmx.reader (module), 32pandasdmx.reader.sdmxjson (module), 30pandasdmx.reader.sdmxml (module), 31pandasdmx.remote (module), 42pandasdmx.utils (module), 33pandasdmx.utils.aadict (module), 32pandasdmx.utils.anynamedtuple (module), 33pandasdmx.writer (module), 35pandasdmx.writer.data2pandas (module), 34pandasdmx.writer.structure2pd (module), 35pick() (pandasdmx.utils.aadict.aadict method), 32prepared (pandasdmx.model.Header attribute), 40PrimaryMeasure (class in pandasdmx.model), 41

Rread_as_str() (pandasdmx.reader.BaseReader method),

32read_as_str() (pandasdmx.reader.sdmxjson.Reader

method), 31

read_identifiables() (pandasdmx.reader.BaseReadermethod), 32

read_instance() (pandasdmx.reader.BaseReader method),32

Reader (class in pandasdmx.reader.sdmxjson), 30Reader (class in pandasdmx.reader.sdmxml), 31receiver (pandasdmx.model.Header attribute), 40Ref (class in pandasdmx.model), 41ref_class (pandasdmx.model.Ref attribute), 41references, 19related_to (pandasdmx.model.DataAttribute attribute), 39ReportingYearStartDay (class in pandasdmx.model), 41Representation (class in pandasdmx.model), 41Request (class in pandasdmx), 43Request (class in pandasdmx.api), 36request() (pandasdmx.remote.REST method), 43resolve() (pandasdmx.model.Ref method), 41ResourceGetter (class in pandasdmx.api), 37Response (class in pandasdmx.api), 37REST (class in pandasdmx.remote), 42

SScheme (class in pandasdmx.model), 41SDMXML, 18SDMXObject (class in pandasdmx.model), 41sender (pandasdmx.model.Header attribute), 40series, 17Series (class in pandasdmx.model), 41series (pandasdmx.model.DataSet attribute), 40series_attrib() (pandasdmx.reader.sdmxjson.Reader

method), 31series_attrib() (pandasdmx.reader.sdmxml.Reader

method), 32series_key() (pandasdmx.reader.sdmxjson.Reader

method), 31series_key() (pandasdmx.reader.sdmxml.Reader method),

32service_url (pandasdmx.model.MaintainableArtefact at-

tribute), 41severity (pandasdmx.model.Footer attribute), 40status_code (pandasdmx.api.Response attribute), 37Structure (class in pandasdmx.model), 42structure (pandasdmx.model.StructureUsage attribute),

42structure_url (pandasdmx.model.MaintainableArtefact at-

tribute), 41structured_by() (pandasdmx.reader.sdmxjson.Reader

method), 31structured_by() (pandasdmx.reader.sdmxml.Reader

method), 32StructureMessage, 19StructureMessage (class in pandasdmx.model), 42StructureSpecificDataSet, 18StructureUsage (class in pandasdmx.model), 42

Index 53

pandaSDMX Documentation, Release 0.6.1

Ttext (pandasdmx.model.Annotation attribute), 38text (pandasdmx.model.Footer attribute), 40TimeDimension (class in pandasdmx.model), 42timeout (pandasdmx.api.Request attribute), 37timeout (pandasdmx.Request attribute), 45title (pandasdmx.model.Annotation attribute), 38

Uupdate() (pandasdmx.utils.aadict.aadict method), 32uri (pandasdmx.model.IdentifiableArtefact attribute), 40url (pandasdmx.api.Response attribute), 37url (pandasdmx.model.Annotation attribute), 38usage_status (pandasdmx.model.DataAttribute attribute),

39

Vvalid_from (pandasdmx.model.VersionableArtefact at-

tribute), 42valid_to (pandasdmx.model.VersionableArtefact at-

tribute), 42version (pandasdmx.model.Ref attribute), 41version (pandasdmx.model.VersionableArtefact at-

tribute), 42VersionableArtefact, 18VersionableArtefact (class in pandasdmx.model), 42

Wwrite() (pandasdmx.api.Response method), 38write() (pandasdmx.writer.data2pandas.Writer method),

34write() (pandasdmx.writer.structure2pd.Writer method),

35write_source() (pandasdmx.api.Response method), 38write_source() (pandasdmx.reader.sdmxjson.Reader

method), 31write_source() (pandasdmx.reader.sdmxml.Reader

method), 32Writer (class in pandasdmx.writer.data2pandas), 34Writer (class in pandasdmx.writer.structure2pd), 35

XXPath (class in pandasdmx.reader.sdmxjson), 31

54 Index