Open Data Licensing: More than meets the eye
Mashael Khayyat
Trinity College Dublin & King Abdulaziz University, Jeddah
Frank Bannister
Trinity College Dublin
Abstract
In discussions of open government data (hereafter simply open data or OGD) the question of how
such data should be licensed or whether they need to be licensed at all has to date received only
limited attention, at least in the academic literature.
A common assumption, at least in the public sphere, is that a large proportion of the data collected
and held by governments can and should be released free of any constraints or restrictions for all
citizens, communities and organizations to access and use as they wish. However, even for data that
does not fall within the ambit of personal privacy, the security of the state or is otherwise sensitive,
it is far from obvious that this should be so; different forms of formal licensing may be appropriate in
some cases and necessary in others. A libertarian, free-for-all approach to open government data is
just one of a number of licensing options from which governments can choose.
This paper will explore the various dimensions of open data licensing. Starting from a definition of
what a licence is, it will first look at the debates that have surrounded licensing in the worlds of
open systems, freeware, shareware and open source. It will then examine and critique a number of
existing or proposed open data licences including various international and national licensing
frameworks. The Creative Commons and Open Database (ODbL) Licences will be critically examined
and possible problems with the concepts underlying various licences will be explored. The question
of what may be suitable for standard public licences and what may require bespoke or customised
licensing will be analysed. Other questions to be investigated will be policing and conformance
as well as the implications of modern analytics and the mashing up of large data sets from different
sources.
1. Introduction
The modest, but growing, body of research into the barriers to the release of data collected and held
by governments consistently includes a discussion of legal issues. There are several obstacles which
fall under this general heading including existing legislative requirements such as data protection
acts, intellectual property rights, risks of consequential harm, commercial sensitivities, concerns
about modern data analysis technology and individual privacy rights (Barry & Bannister 2014;
Janssen 2012; Bertot et al 2010). Some scholars and theorists argue that, in such a legally complex
situation, a well-designed licensing regime is not only necessary, it is critical to the success of open
data initiatives. Creating a legal environment in which citizens, communities and corporates can use
such data with clarity and confidence about their rights and obligations is essential if societies are to
make the most of these data. Its absence is likely to hinder creativity and the economic, social and
political benefits that are widely expected to ensue (Korn and Oppenheim 2011). According to Korn
and Oppenheim an understanding of open data licensing is important for establishing “which and
how” data can be re-used. It is not only important to understand the legal issues that may arise in
the context of licensing open data, but also the different types of licences that are available and the
implications that they carry with them.
This paper examines open government data licensing and explores a number of its dimensions. Its
objectives are to highlight the complexities surrounding this topic, to examine some of these and to
critique the current approach to open government data (OGD) licensing. This paper is organised as
follows. Section two looks at the background to open licensing starting with a brief review of the
Open Source movement and the approach to open licensing of software. Section three examines
current OGD/open data (OD) licences and different approaches to licensing. Section four is a
critique of the concept of OGD and includes some reflections on how the question of OGD licensing
might evolve. Section five is a brief conclusion and contains some recommendations for future
research.
2. Background
2.1 The Open Software Movement and Copyleft
In the world of information and communications technology (ICT) the term ‘licence’ is traditionally
associated with software though licensing is also used for other types of intellectual property such as
methodologies. Within the Information Systems (IS) literature and community there has been and
continues to be much discussion about the relative merits and demerits of open software and open
source. Over several decades, the Open Source movement has proposed or developed a number of
business models which are designed to offer users various forms of freedom to modify software and
pass it on, partially encumbered or unencumbered, to others. The foundational principle of the
movement is that software should be free in the sense of free of restrictions on use and modification
rather than free of charge (which is a separate issue). Unsurprisingly, some complicated legal
problems can arise once one starts exploring the question of software licences in any depth.
Non-proprietary software comes in a number of flavours. One key distinction is whether or not the
source code is available. Freeware is the term generally applied to applications which anybody can
use for free and without a licence, but which the user cannot modify or sell on to a third party. Many
PC games and utilities, for example, fall into this category. Some smartphone Apps broadly fall into
this category. Another variation is shareware where the user needs a licence or permit and there
may or may not be a charge for use. More complicated problems arise when source code is made
available. This means that the user can modify the code, but while the original source code may be
free, a user may feel entitled to charge for his modifications. Thus a developer may take some open
source code, modify it and charge for the enhanced product. The latter may not matter provided he
supplies the source code of his modification to other users to do what they want without further
conditions or cost even though this will limit his ability to make money from his enhancement.
Developers have sometimes tried to circumvent this problem by embedding proprietary code or by
attaching proprietary add-ons to open source code. Others have tried to ‘claim jump’ and hijack the
free source code. This problem has led to what is called the Open Source Definition which sets out a
number of criteria for open source software namely:
There must be free redistribution. No royalties or fees;
Distribution must include the source code;
Derived works are allowed. Modifications must be permitted;
The integrity of the author's source code must be maintained;
There can be no discrimination against persons or groups;
There can be no discrimination against fields of endeavour;
Distribution of licence. One licence covers all;
A licence must not be specific (tied) to a product;
A licence must not restrict other software;
A license must be technology-neutral.
(Open Source Initiative 2014). Not all of these apply, or can be adapted to apply, to data, but some
can. An attempt to create a similar set of principles for open data is discussed in section four.
An obvious question about open source is this: in such a world how does a software developer make
a living? Various attempts have been made to address this problem and as a result there are
currently over 100 free and/or open source licences available, including one from the EU, namely
the European Union Public Licence (Joinup 2014; Schm…). The most important and influential of these
licences is probably the General Public Licence (GPL) which incorporates the concept of copyleft.
Under a GPL licence, a developer who modifies open source code cannot impose any conditions on a
user’s use or further modification of the modified product although he can charge for the modifications
that he has made (Free Software Foundation 2014).
A full discussion of this is beyond the scope of this paper. This summary is presented because
several of the problems and issues in open data have parallels in open source software though there
are other issues that arise with data, but which are not a problem with software and vice versa.
Nonetheless, given that the open source movement has been around for several decades there are
likely to be useful lessons which can be drawn from the accumulated knowledge in this field. As will
be seen, the principle of copyleft has been adapted and applied to data in the Creative Commons
Licence.
2.2 The Legislative Context
There are a number of critical laws surrounding the data that governments use and the way that
governments are allowed to use such data. Even without considering other factors (of which there
are many; see below) existing legislation has multiple implications when it comes to licensing both
software and data. Central to any discussion of data licensing are two types of act: data protection
acts and freedom of information (FoI) acts, though other legislation and quasi legislation (such as
privacy rights and official secrets acts) also bear on licence design. Of these, the most important are
data protection acts (DPAs).
Most developed countries now have a DPA. The first DPA was enacted in the German state of
Hesse in 1970 (Privacy International 2014). Since that time almost every country in the developed
world has enacted some form of this legislation (DLA Piper 2013). The primary purpose of such acts
has been to protect the personal data of individual citizens (Korn and Oppenheim 2011). But
governments hold far more than citizens’ personal details or matters related to state security. They
hold large volumes of data which are commercially valuable and in particular which have embedded
Intellectual Property Rights (IPR). Korn and Oppenheim note that IPR can encompass several
subsidiary rights including copyright, database rights, moral rights, and other rights. For the
purposes of this paper the most important of these are copyright and database rights. The latter will
be considered first.
A database structure might enjoy copyright which protects the author’s or designer’s rights in their
creation. As a consequence, individual data items such as records and metadata may require
copyright protection to protect a creator’s IPR. Such rights are also necessary to protect creators or
designers from what Korn and Oppenheim describe as “derogatory treatment”, i.e. amending data
or quoting it out of context in a way that could mislead others and potentially damage the
reputation of the data creator or provider. Korn and Oppenheim state that there are three levels of
rights regarding database rights:
Database rights,
Dataset rights and
Data rights
These rights can arise where an individual or corporate has put substantial investment, be that in the
form of financial and/or human and/or technical resources, into the construction of a database or
the assembly of data in such a database. A complex database design might be considered to be a
form of intellectual property and legal protection of this may be justifiable. Korn and Oppenheim
stress that when it comes to databases, datasets and data the question of IPR is usually complex and
advise that taking legal advice about a database licence before using others’ data is essential to
avoid legal risks. Miller, Styles and Heath (2008) claim that without “a legally recognised database
right”, communities will lack the authority to publish data or products, services or even findings
derived from such data. Their solution to this problem is to use a ‘Share-Alike’ agreement for any
open data. This concept is discussed further below.
The question of IPR in databases and data is a particularly difficult one for government. There are
several reasons for this. As Nicol, Caruso and Archambault (2013, p.6) note:
“Governments produce and own large datasets [and] most national open data policies primarily
target these datasets”.
Government datasets include personal data, corporate data, military intelligence, patent
applications, criminal records, employee performance records and so on. It is not just the volume of
data that governments hold that matters; it is the sheer variety. Governments also have much
duplicated data typically scattered across multiple agencies and departments. Such data may be
inconsistent and lack integrity. Government agencies range from those concerned with the security
of the state to those whose job it is to foster enterprise or deal with serious social problems. And so
on. It all makes for a demanding legal challenge when it comes to determining the rules, terms and
conditions for data release.
2.3 Licensing and Open Data Policy
The need for comprehensive open data access policies was formally recognised in 2004 by the
Ministers of Science and Technology of the thirty member countries of the Organisation for Economic
Co-operation and Development (OECD) as well as those of China, Israel, Russia, and South
Africa. Discussions of open data policy generally contain references to licensing as one of the steps
involved in policy formulation. Schutzberg (2014) states that one of the key objectives of open data
policy statements is to set out under what license conditions data are to be made available. Open
data stakeholders (local, state, regional, federal and private entities) may choose to develop and
implement open data policies. Schutzberg lists some components that are commonly found in open data
policy statements:
1. Which data are to be made open and which are not;
2. When past and future data are to be available;
3. Where the data can be obtained/accessed;
4. In what format(s) the data are to be available;
5. Under what license(s) the data are to be available; and
6. The cost (if any) for reproduction or use of the data or any software associated with it.
Kaufman and Wagner (2012) also consider licensing as an integral aspect of open data processing.
They consider a licence as one of seven steps necessary to open and maintain data, namely:
1. Find […] data;
2. Convert data;
3. Test […] output;
4. Write up a license agreement;
5. Publish and publicize;
6. Update and modify as needed; and
7. Create and maintain a dialogue
(Kaufman and Wagner 2012 as cited in Wimmer et al., 2013, p.77).
Another example of the presence of licensing in an open data framework can be found in the Ten
Open Data Building Blocks proposed by Davies (2012) (see table 1). Davies, despite comments he
has made elsewhere (see below), would appear to acknowledge that it is important to have an
‘explicit licence’ although, in line with his generally libertarian approach, he emphasises the
importance of using licences that have the fewest constraints though with acknowledgment of the
source of the data.
| Open data building block | Brief explanation |
|---|---|
| 1. Leadership and bureaucratic support | “A top level mandate” from senior politicians and an “engaged and well resourced ‘middle layer’ of skilled government bureaucrats” are essential to secure the release of open data (Hogge 2010). |
| 2. Datasets | Datasets are at the core of open data. Open datasets need to be accessible (usually online), technically open (in a non-proprietary format), and legally open (Eaves 2009). |
| 3. Licences | A range of copyright and intellectual property laws can cover datasets. It is important to have an explicit licence because without it re-users will not know their legal permissions and rights in dealing with data, such as sharing data, combining data with other data, or building a commercial service off the back of a dataset. At the same time, open data advocates stress the importance of licences that have the fewest constraints, with acknowledgment of the source of the data. |
| 4. Data standards | Describe what a dataset can contain, such as its fields, how they are commonly represented, and what conventions should be used for sharing dates, locations, categories and other common elements. |
| 5. Data portals | A data portal provides access to open datasets, hosting metadata that describes them and allowing visitors to search for relevant datasets. |
| 6. Interpretations, interfaces and applications | Third parties can provide their own interpretations or analysis in static reports and publications; they can build interfaces and visualisations of data to show trends and patterns in it; and they can create interactive applications that provide useful functionality. |
| 7. Outreach and engagement | Just putting data online is not enough to get it used. Outreach, community building and engagement are required. The five stars of open data engagement explain that an open data initiative should: be demand driven; put data in context; support conversations around data; build capacity, skills and networks; and lead to collaboration on data as a common resource. |
| 8. Capacity building | Capacity building often needs to take place on both the supply and use sides of an open data initiative. |
| 9. Feedback loops | Establish channels through which data holders can accept and work with feedback, either enhancing the data they hold or taking action on the basis of feedback. |
| 10. Policy and legislative lock-in | Develop a statutory footing by creating ‘right to data’ legislation, or writing open data clearly into contracts and policies. |

Table 1: Ten Building Blocks of an Open Data Initiative (Davies 2012)
All three of these discussions include licensing as a key component in the development of open data
policies though views on the nature of those licences may not coincide. This leads to the question of
whether there should be (or even if it is possible to devise) a single open data licence in the same
way that the Open Source movement has tried to do for software, or whether a number of different types
of licence will be required.
2.4 Defining “Licence”
According to the UK Licensing Framework:
“A licence is a legal document giving permission to use information” and it is considered as “a
mechanism that gives people and organisations permission to re-use information and other material
that is protected by copyright or database right. A licence should also provide clarity as to what
users and re-users are permitted to do and whether there are any restrictions on the extent of that
permission” (UK Government Licensing Framework, 2013, p.20, p.10)1.
Under the UK Licensing Framework, licences must set out clear conditions of use. For example users
must not use the information/data to mislead others, misrepresent the data or suggest that any use
that they make of the data is endorsed by a public sector body (and in particular the data source).
Davies (2012, p2)2 defines a license as setting out:
“… explicitly what someone who accesses a dataset can do with it […]and without an explicit license,
a user does not know if they have the legal permissions to share data further, to combine it with
other data, or to build a commercial service off the back of a dataset”
In a subsequent paper Davies et al (2013, p15) add that:
“Open Knowledge Definition (OKD) presents a stringent definition of an open license as one that
requires, at most, attribution of the dataset source” [emphasis added].
Davies (2010) argues that “open” in this context means that the user is free to use, re-use and
distribute the data, but with two important caveats: first, that the data source is always attributed and
second, that any new information created from the original data is shared with others. This has
parallels with the concept of copyleft. These precepts, like those of copyleft, may not appeal to
some users. Unsurprisingly, Davies et al note that in practice many datasets do not meet the strong
conditions of the Open Knowledge Definition.
A further nuance, noted by Davies et al, is that simple, permissive licenses are preferred because
incompatible licenses may make it difficult to combine datasets, a key requirement for mashups or
data analytics. Having presented this as a problem, Davies et al do not elaborate on what they mean
by incompatible or permissive licenses or give any examples of where this has happened; they simply
use it to underline what they consider to be the importance of permissive licenses.
1 http://www.nationalarchives.gov.uk/documents/information-management/uk-government-licensing-framework.pdf
2 http://www.opendataimpacts.net/2012/08/ten-building-blocks-of-an-open-data-initiative/
The concept of permissive licensing is also discussed by Miller, Styles and Heath (2008, p4) who
present the case for this type of licence:
“…permissive licensing of data for the web means that we can all begin to move forward in lowering
the walls of our silos, releasing data to play its part in the Data Web”.
However, they do not provide any technical explanation of what permissiveness means, confining
their discussion to metaphor.
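Since neither set of authors spells out what "permissive" or "incompatible" means, a minimal sketch may help make the terms concrete. It assumes that a licence can be reduced to four yes/no conditions and that "permissive" means requiring at most attribution, as in the Open Knowledge Definition quoted above. The class, the function names and the combination rules are illustrative assumptions, not a statement of what any actual licence text says.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified model of a data licence."""
    name: str
    attribution: bool = False     # the source must be credited
    share_alike: bool = False     # derived data must carry the same licence
    non_commercial: bool = False  # commercial use is excluded
    no_derivatives: bool = False  # the data may not be modified or added to


def is_permissive(lic: Licence) -> bool:
    """Permissive here means: at most attribution is required."""
    return not (lic.share_alike or lic.non_commercial or lic.no_derivatives)


def can_combine(a: Licence, b: Licence) -> bool:
    """Assumed rule of thumb for publishing a mashup of two datasets."""
    # A mashup is a derived work, so a no-derivatives condition blocks it.
    if a.no_derivatives or b.no_derivatives:
        return False
    # Two share-alike licences each demand their own terms for the result.
    if a.share_alike and b.share_alike:
        return a.name == b.name
    # A share-alike licence that allows commercial use cannot absorb
    # data whose licence forbids commercial use.
    for sa, other in ((a, b), (b, a)):
        if sa.share_alike and other.non_commercial and not sa.non_commercial:
            return False
    return True


by = Licence("attribution only", attribution=True)
by_sa = Licence("attribution + share-alike", attribution=True, share_alike=True)
by_nc_sa = Licence("attribution + non-commercial + share-alike",
                   attribution=True, share_alike=True, non_commercial=True)

print(is_permissive(by), is_permissive(by_sa))
print(can_combine(by, by_sa), can_combine(by_sa, by_nc_sa))
```

Under these assumptions an attribution-only dataset can be merged into a share-alike one, whereas two datasets under different share-alike terms cannot be merged at all, which is one plausible reading of the incompatibility problem described above.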
In the above discussions, an open data licence is considered to be a legal document (however brief)
containing explicit rules as to what can and cannot be done with datasets. It may be noted that,
besides Kaufman and Wagner’s steps, other important aspects may need to be considered in
opening and maintaining data, for example publicity or public notification so that citizens are aware
of open data, what data are available and how such data can be used. In addition, they argue, it is
important to foster public innovation by encouraging creative use of these data with prizes and/or
awards.
3. An Overview of Data Licensing Approaches
3.1 Some Open Data Licences
According to Hatcher and Waelde (2007) (cited in Davies 2012) open data providers may “create a
customized” licensing framework or “use one of the standard” open database licenses. According to
Miller, Styles and Heath (2008), the Talis Community License, released in 2006, was the first public
open data license. Korn and Oppenheim (2011) list the following ‘standard’ licences:
1. Creative Commons Attribution;
2. Creative Commons Zero (CC0);
3. Public Domain and Dedication Licence (PDDL);
4. Open Data Commons Attribution Licence (ODC);
5. Creative Commons Attribution Share Alike (but limited interoperability);
6. Open Government Licence (OGL).
Schutzberg (2014) adds to the list above a number of open data licenses from public authorities in the
UK. In practice the emergence of many such localised licences seems probable.
Like Hatcher and Waelde, Korn and Oppenheim note that there are both standard licences and
bespoke licences. They suggest that while bespoke licences facilitate the use and potential reuse of
data, standard licences lead to better interoperability and increased user awareness of the licence
terms which in turn leads to better compliance. They also claim that bespoke licences are not
common; most licences are standard.
3.2 Two Important Open Data Licences
Two important general purpose licenses are the Open Database Licence (ODbL) and the Creative
Commons licence(s). The ODbL is published by the Open Knowledge Foundation. The ODbL is
particularly well suited for countries in the European Union because these countries have specific
rights that cover databases and the ODbL specifically addresses these rights. While databases can
contain different content such as text, images and videos, the ODbL does not cover contents of each
component of the database; instead it governs the rights over the database. It is therefore only a
half-solution as users have to licence the content of the database separately. ODbL specifies that:
“…any subsequent use of the database must provide attribution, an unrestricted version of the new
product must always be accessible, and any new products made using ODbL material must be
distributed using the same terms. It is the most restrictive of all ODC licenses.”3
Arguably the most important standard open data licence offered to date is the Creative Commons
(CC) licence. The CC licence is the brainchild of the American academic Lawrence Lessig and uses
the same principles as copyleft. According to Chignard (2013, p2):
“Lawrence Lessig, […] is the founder of Creative Commons licenses, based on the idea of copyleft and
free dissemination of knowledge.”
Chignard claims that the CC Licence is the most widely used licence for open data and one which:
“…provides authors with a way of formalizing their legal right to offer, in effect, open access to their
work”.
According to creativecommons.org4, the Creative Commons copyright licenses and toolset provide
individuals, companies and institutions with a standardized way of granting copyright permissions to use
their creative work. Data made available under the Creative Commons licenses can be distributed,
copied, edited, remixed, and built upon, all within the boundaries of copyright law.
The licence has a three layer structure illustrated in figure 1.
Figure 1: The three-layer design of the CC licence
3 http://guides.uflib.ufl.edu/content.php?pid=32772&sid=3760010
4 https://creativecommons.org
The top layer is the Legal Code within which the licence is valid. The second layer is the Commons
Deed, which is sometimes referred to as the “human readable” version of the license. A feature of a
CC license is that it presents information in a format that ordinary citizens can read and understand.
Most publishers and re-users (creators, educators, scientists, etc.) have no legal training and hiring
professional legal expertise is expensive – especially for individuals and voluntary or community
groups. Lastly, in order to make the Web recognize when a work is available under a Creative
Commons license, a Machine Readable version of the licence is in a third layer.
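To illustrate what the machine-readable layer can look like in practice, the following minimal sketch builds an HTML fragment in which the licence is marked with a `rel="license"` link, the convention that search engines and other tools can read. The function name and the dataset title are hypothetical; the licence URL follows the pattern used on creativecommons.org.

```python
from html import escape


def licence_mark(work_title: str, licence_name: str, licence_url: str) -> str:
    """Return an HTML fragment that marks a work with its licence.

    The rel="license" attribute is what lets software recognise the link
    as a licence statement rather than an ordinary hyperlink.
    """
    return (
        f"<span>{escape(work_title)}</span> is licensed under "
        f'<a rel="license" href="{escape(licence_url, quote=True)}">'
        f"{escape(licence_name)}</a>"
    )


# Hypothetical dataset title, used for illustration only.
print(licence_mark(
    "Example air quality dataset",
    "CC BY 4.0",
    "https://creativecommons.org/licenses/by/4.0/",
))
```

A human reader sees an ordinary sentence with a link, while a crawler can pick out the licence from the attribute alone, which is the purpose of the third layer.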
Implementing a CC Licence requires the two steps shown in figure 2. The licence has a number of
options for data use (see figure 3). One of these options is, in effect, a data equivalent of copyleft
(and even uses the backward C symbol of copyleft).
Figure 2: Steps in publishing data under a Creative Commons (CC) Licence.
The meanings of the six options for licensing under the Creative Commons5 shown in figure 3 are:
Attribution (CC BY). This is the most open of the licence variants. Provided the user acknowledges
the source of the data (s)he can do anything that (s)he wishes with it including adding to the data,
manipulating it and offering any derivations of it for sale. Note that this, like other CC licences,
does not stipulate any requirement for this to be free of charge.
Attribution-NoDerivs (CC BY-ND). This permits the user to redistribute the data, but not to add to,
modify or manipulate it. They can charge or use it for commercial purposes provided the source
provider is credited.
Attribution-NonCommercial-ShareAlike (CC BY-NC-SA). Under this licence users can do what they
want with the data for non-commercial purposes provided they acknowledge the data provider and pass
their work on to others under the same (CC BY-NC-SA) licence.
Attribution-ShareAlike (CC BY-SA). This is the equivalent of copyleft, i.e. users may do what they
want with the data provided that they credit the provider as the source of the original data and pass
on any of their own work or added data on the same terms as those under which they were given the data.
Attribution-NonCommercial (CC BY-NC). This allows the user to manipulate and add to the original
data and pass this on, on a non-commercial basis. Whilst the new version must acknowledge the
original, it does not have to be licensed on the same terms.
Attribution-NonCommercial-NoDerivs (CC BY-NC-ND). This is the most restrictive of the six CC
licence forms. It allows third parties to access the original data and use it with acknowledgement as
well as to share it, but they are not allowed to change it or use it commercially.
A user using one of these licences is required to show the type of licence with any product or service
they base on it.
Finally there is the so-called Creative Commons Zero (CC0) licence. This licence has no restrictions or
requirements whatsoever, not even to attribute the source. Effectively the data suppliers waive all
of their rights (or as many as they can). Note that this is not one of the six categories offered under
the CC licence regime above.
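The six options and CC0 differ only in which of four conditions they attach, so they can be summarised in a small lookup table. The sketch below is one way of doing this; the dictionary layout and the function name are illustrative assumptions, and the conditions are taken from the descriptions above rather than from the legal code of the licences.

```python
# Conditions attached to each option, as described in the text above.
# BY = attribution, SA = share-alike, NC = non-commercial, ND = no derivatives.
CC_OPTIONS = {
    "CC BY": {"BY"},
    "CC BY-ND": {"BY", "ND"},
    "CC BY-NC-SA": {"BY", "NC", "SA"},
    "CC BY-SA": {"BY", "SA"},
    "CC BY-NC": {"BY", "NC"},
    "CC BY-NC-ND": {"BY", "NC", "ND"},
    "CC0": set(),  # all rights waived, not one of the six options
}


def permissions(code: str) -> dict:
    """Translate a licence's conditions into what a re-user may or must do."""
    conditions = CC_OPTIONS[code]
    return {
        "must_attribute": "BY" in conditions,
        "may_modify": "ND" not in conditions,
        "may_use_commercially": "NC" not in conditions,
        "must_share_alike": "SA" in conditions,
    }


for code in CC_OPTIONS:
    print(code, permissions(code))
```

Laid out this way, it is easy to see that CC BY-NC-ND removes both modification and commercial use, while CC0 imposes no condition at all.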
3.3 Some European Open Data Licensing Models
Different countries have adopted different approaches to licensing. Bunakov and Jeffery (2013)
show the national Public Sector Information (PSI) portals of eight countries. This is summarised in
table 3. Each country has its own regulations for data re-use.
5 https://creativecommons.org/licenses/
Table 3: European governmental data portals Licences (Bunakov and Jeffery 2013)
This table hides what are more complicated and nuanced licensing regimes in a number of countries.
For example the French Licence Ouverte includes a number of features that can accompany
Creative Commons categories (figure 4). Each white box in the figure represents a granular regulation
component within the open data licence.
Figure 4: Regulation components of the French governmental portal for open licence (Bunakov and
Jeffery 2013)
As noted above the UK (like the Netherlands) uses a framework (see figure 5).
Figure 5: UK Government Licensing framework
Source: http://www.nationalarchives.gov.uk/documents/information-management/uk-government-licensing-framework.pdf (page 24)
Germany offers multiple licences for different modes of data reuse. German governmental agencies
have the option to choose the most appropriate licence for each case of data publishing.
Ireland is not included in Bunakov and Jeffery’s analysis. In Ireland a National Cross Industry
Working Group (2012) recommended open data providers have a licence model that clarifies the
financial side of the licence. They also recommended that an ad hoc open data licence be created
specifically for Ireland.
3.4 Conformant and Non Conformant Licences.
The Open Definition Organization6 categorises licenses into Conformant and Non-Conformant.
Conformant means that they conform to the principles set forth in the Open Definition. Conformant
Licenses are classified as Recommended, Non-reusable, Little Used, Discontinued or Deprecated as
shown in Tables 4, 5 and 6 (see footnote 7). Table 7 shows a list of non-conformant licences and table 8
discontinued licences.
6 opendefinition.org/licenses/
7 http://opendefinition.org/licenses/
| Licence | Domain | By | SA | Comments |
|---|---|---|---|---|
| Creative Commons CCZero (CC0) | Content, Data | N | N | Dedicate to the Public Domain (all rights waived) |
| Open Data Commons Public Domain Dedication and Licence (PDDL) | Data | N | N | Dedicate to the Public Domain (all rights waived) |
| Creative Commons Attribution 4.0 (CC-BY-4.0) | Content, Data | Y | N | |
| Creative Commons Attribution (CC-BY) | Content | Y | N | All versions 1.0-3.0, including jurisdiction “ports” |
| Open Data Commons Attribution License (ODC-BY) | Data | Y | N | Attribution for data(bases) |
| Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) | Content, Data | Y | Y | |
| Creative Commons Attribution Share-Alike (CC-BY-SA) | Content | Y | Y | All versions 2.0-3.0, including jurisdiction “ports”; version 1.0 is little used and not recommended because it is incompatible with future versions. |
| Open Data Commons Open Database License (ODbL) | Data | Y | Y | Attribution-ShareAlike for data(bases) |
| Free Art License (FAL) | Content | Y | Y | |

Table 4: Conformant Recommended Licenses
| Licence | Domain | By | SA | Comments |
|---|---|---|---|---|
| UK Open Government Licence 2.0 (OGL-UK-2.0) | Content, Data | Y | N | For use by UK government licensors; re-uses of OGL-UK-2.0 material may be released under CC-BY or ODC-BY. Note version 1.0 is not approved as conformant |
| Open Government Licence – Canada 2.0 (OGL-Canada-2.0) | Content, Data | Y | N | For use by Canada government licensors. Note version 1.0 is not approved as conformant |

Table 5: Conformant Non-reusable or Little Used Licenses
Licence | Domain | By | SA | Comments
GNU Free Documentation License (GNU FDL) | | Y | Y | Only conformant subject to certain provisos
MirOS License | Code, Data | Y | N | Little used
Talis Community License | Data | ? | ? | Deprecated in favour of ODC licenses
Against DRM | Content | Y | Y | Little used
Design Science License | Data | Y | Y | Little used
EFF Open Audio License | Content | Y | Y | Deprecated in favour of CC-BY-SA
Table 6: Conformant but Deprecated Licenses
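The By and SA columns in these tables are in effect machine-readable flags, and a data publisher could use them to shortlist licences. The following is a minimal sketch of that idea in Python, using only the rows of Table 4. The `Licence` record, the `candidates` function and its parameters are illustrative names introduced here; they are not part of the Open Definition or of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    code: str            # short identifier as given in Table 4
    domains: frozenset   # "Content" and/or "Data"
    by: bool             # True if attribution is required (By = Y)
    sa: bool             # True if share-alike is required (SA = Y)


# Rows transcribed from Table 4 (conformant recommended licences).
RECOMMENDED = [
    Licence("CC0", frozenset({"Content", "Data"}), False, False),
    Licence("PDDL", frozenset({"Data"}), False, False),
    Licence("CC-BY-4.0", frozenset({"Content", "Data"}), True, False),
    Licence("CC-BY", frozenset({"Content"}), True, False),
    Licence("ODC-BY", frozenset({"Data"}), True, False),
    Licence("CC-BY-SA-4.0", frozenset({"Content", "Data"}), True, True),
    Licence("CC-BY-SA", frozenset({"Content"}), True, True),
    Licence("ODbL", frozenset({"Data"}), True, True),
    Licence("FAL", frozenset({"Content"}), True, True),
]


def candidates(domain, allow_by=True, allow_sa=False):
    """Return codes of recommended licences covering `domain` whose
    attribution and share-alike requirements the publisher accepts."""
    return [
        lic.code
        for lic in RECOMMENDED
        if domain in lic.domains
        and (allow_by or not lic.by)
        and (allow_sa or not lic.sa)
    ]


print(candidates("Data"))                  # attribution acceptable, share-alike not
print(candidates("Data", allow_by=False))  # public domain dedications only
```

A shortlist of this kind only narrows the field. It says nothing about the practical and legal questions discussed in section 4.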
License name | Comments
Creative Commons No-Derivatives Licenses | Creative Commons No-Derivatives licenses (by-nd-*) violate principle 3, “Reuse”, as they do not allow works, in part or in whole, to be re-used in derivative works.
Creative Commons Non-Commercial | Creative Commons Non-commercial licenses (by-nc-*) do not support the Open Knowledge Definition principle 8, “No Discrimination Against Fields of Endeavor”, as they exclude usage in commercial activities.
Project Gutenberg License | Used on Gutenberg’s ebooks of public domain texts. It is non-open because it restricts commercial use. Note that the license only applies if the user continues to use the Gutenberg name; if you remove the licensing information and any reference to Project Gutenberg then the resulting text is open.
Table 7: Non-conformant Licenses
License name | Comments
Creative Commons Developing Nations License | The license has been discontinued. Creative Commons developing nations license does not support principle “7. No Discrimination Against Persons or Groups”.
Open Publication License | Discontinued in favour of Creative Commons. In late 2004 the site was overhauled and turned into a portal to open academic content. In August 2007, David Wiley, the author of open content, launched the draft Open Education License. License is not conformant if either options A or B are added to the main body of the license. Option A prohibits ‘substantive modification’ and option B prohibits commercial use of printed copies.
UK PSI (Public Sector Information) Click-Use Licence | Formerly used for a variety of material produced by UK central and local government. This license is not open.
Table 8: Discontinued Licenses
Table 9 summarises the position in relation to a number of important licences drawn from Korn and
Oppenheim (2011, p6) and Halonen (2013, p.61).
Licence Type | Who can use the resource and under what terms? | Can the licensed data be modified? | Suitability for data, datasets and databases
Creative Commons Attribution (CC-BY) | Anyone | YES, but you must attribute. You must also ensure that you do not impose any restrictions on the whole of the work licensed beyond the terms of this licence. | Not specifically geared towards data, datasets and databases, but can be used with minimal amounts of data (to avoid attribution stacking) and as long as only an “insubstantial” amount of any databases or datasets are reused.
Creative Commons Attribution Share Alike (BY-SA) | Anyone | YES, but you must attribute and if you use or reuse the data etc., you must use the CC BY SA end user licence for onward licensing. | As above. Share Alike requirement can impact negatively on interoperability of data and prevent linked open data.
Creative Commons Attribution Non-Commercial (BY-NC) | Anyone, for non-commercial purposes only | YES, but you must attribute. | As above. Although the NC restriction does not pose immediate problems, ambiguity about what constitutes non-commercial use may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms.
Creative Commons Attribution No Derivatives (BY-ND) | Anyone | NO and you must attribute. | As above. Reuse and repurposing of data, datasets and databases not permitted.
Creative Commons Attribution Non-Commercial Share Alike (BY-NC-SA) | Anyone, for non-commercial purposes only | YES, but you must attribute and if you use or reuse the data etc., you must use the CC BY SA end user licence for onward licensing. | As above. Share Alike requirement can impact negatively on interoperability of data and prevent linked open data. Although the NC restriction does not pose immediate problems, ambiguity about what constitutes non-commercial use may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms.
Creative Commons Attribution Non-Commercial No Derivatives (BY-NC-ND) | Anyone, for non-commercial purposes only | NO and you must attribute. | As above. Reuse and repurposing of data, datasets and databases not permitted. Although the NC restriction does not pose immediate problems, ambiguity about what constitutes non-commercial use may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms.
Creative Commons Zero | Anyone | YES, with no restrictions whatsoever. | Ideal.
Open Data Commons Open Database Licence | Anyone | YES, but you must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database. Share-Alike: If you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL. | Ideal, although there may be some attribution requirements, leading to possible attribution stacking, and also interoperability issues associated with the Share Alike requirement.
Open Data Commons Attribution Licence | Anyone (applies to data, datasets and databases) | YES, but you must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL. For any use or redistribution of the database, or works produced from it, you must draw third parties’ attention to the original licence of the database and keep intact any notices on the original database. | Ideal, although there may be some attribution requirements, leading to possible attribution stacking.
Public Domain and Dedication Licence | Anyone (applies to databases) | YES, with no restrictions whatsoever. | Ideal.
Open Government Licence | Anyone (applies to content, data, databases and source code) | YES, but you must attribute. | Can be used with minimal amounts of data (to avoid attribution stacking).
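The practical effect of attribution stacking and of Share Alike and Non-Commercial terms becomes visible when several sources are merged into one derived dataset. The sketch below is purely illustrative. The `Source` record, the `combine` function, the dataset names and the rule that two different share-alike licences conflict are simplifying assumptions made for this example, not statements of law or features of any licence text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str      # hypothetical dataset name
    licence: str   # licence label as used in Table 9
    by: bool       # attribution required
    sa: bool       # share-alike required
    nc: bool       # non-commercial use only
    nd: bool       # no derivatives permitted


def combine(sources):
    """Summarise the obligations a derived dataset would inherit.

    Simplifying assumption: every term of every source carries over to
    the combined dataset, and two different share-alike licences cannot
    both be satisfied.
    """
    credits = [s.name for s in sources if s.by]           # attribution stacking
    sa_licences = sorted({s.licence for s in sources if s.sa})
    issues = [
        f"{s.name}: no-derivatives terms do not permit a derived dataset"
        for s in sources
        if s.nd
    ]
    if len(sa_licences) > 1:
        issues.append("conflicting share-alike licences: " + ", ".join(sa_licences))
    return {
        "credits": credits,
        "commercial_use": not any(s.nc for s in sources),
        "onward_licence": sa_licences[0] if len(sa_licences) == 1 else None,
        "issues": issues,
    }


# Three hypothetical sources merged into one mash-up.
result = combine([
    Source("bus timetables", "OGL", by=True, sa=False, nc=False, nd=False),
    Source("street map", "ODbL", by=True, sa=True, nc=False, nd=False),
    Source("photo archive", "BY-NC-SA", by=True, sa=True, nc=True, nd=False),
])
print(result)
```

Even in this toy case the derived dataset carries three credits, loses commercial use and has no single onward licence, which is the kind of interoperability problem the table describes.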
4. Reflections and Critique
4.1 Principles versus practicalities
In December 2007 a group of open government advocates held a meeting in Sebastopol, California
to discuss OGD. From this emerged a set of eight principles (as well as some sub principles which are
not reproduced here). They argued that open government data should be:
1. Complete: All public data that is not subject to valid privacy, security or privilege limitations
should be made available.
2. Primary: Data should be made available at the finest level of detail at which it is collected, not
just in aggregated or summary form.
3. Timely: Data should be released as quickly as possible.
4. Accessible: Data should be available to as wide a range of users as possible and for as wide a
range of purposes as possible.
5. Machine processable: Data should be reasonably structured to allow automated processing.
6. Non-discriminatory: There should be no requirement to register or provide personal
information in order to obtain data.
7. Non-proprietary: It should not be possible for anybody to acquire any kind of proprietary
right or technical control over public data.
8. License-free: Data should not be subject to any copyright, patent, trademark or trade secret
regulation.
The group proposed that reasonable privacy, security and privilege restrictions “can be allowed”, but
they did not pursue the implications of this in any detail. This is quite a libertarian manifesto and
while it raises some difficult questions, it provides a useful baseline against which to discuss issues in
OGD licensing.
4.2 Ownership rights in data?
Copyright in data is a complicated matter. Schutzberg (2014) declares that, unlike open source
software, for which the Open Source Initiative keeps a list of licences that follow open source
principles, there is no equivalent process for open data licences. According to Miller, Styles and
Heath (2008, p2):
“Copyright protection applies to acts of creativity and categorically does not extend either to
databases [or] to those non-creative parts of their content.”
(emphasis added). They go on to argue that data should be open by default and that there is no
need for copyright because copyright and related forms of protection are only for creative work.
Unfortunately this assertion is misleading and incorrect. First, the law recognises a sui generis right
for databases which is specifically designed to recognise the cost of compiling such a database. In
the EU, this right falls under Directive 96/9/EC of the European Parliament and of the Council.
Secondly, certain types of data may be subject to copyright. Under Article 3 of the directive,
databases which:
"…by reason of the selection or arrangement of their contents, constitute the author's own
intellectual creation"
are protected by copyright. So while it may not be possible to copyright a number, it is quite
possible to copyright a photograph.
This raises potential difficulties for OGD licences. Consider a situation where a professional
photographer takes a picture of a building for use in (say) a state publication on historic public
buildings. Does the photographer or the state retain copyright in that picture? Arguably the answer
could be yes. What then is the position if this photograph is embedded in some other document
containing data which is clearly not copyright (such as viewing hours for the building in question)?
How does one deal with copyright in this situation? This problem is not insoluble, but solutions may
be messy or expensive to implement.
While, therefore, it can be argued that where data is simply collected, either directly (as in a
census or survey) or as a by-product of another process (such as making a passport application),
there is little or no creativity involved and therefore no copyright, this may not be true where other
forms of data, such as sound, pictures or video, are concerned. As Davies, Perini and Alonso
(2013, p.15) note, combining different data sets can create much value out of open data, but
this in turn may create:
“…significant challenges in determining the legal status of derivative datasets”.
In short, this is a complex area and it is far from clear that current licence regimes deal with it
adequately.
4.3 Should all OGD be free?
This leads directly to the question of whether OGD should be free of charge. The various open data
licences discussed in section three are primarily concerned with ownership rights and about rights of
usage and acknowledgement rather than payment. Ownership of data can have a number of
meanings. One meaning is that the data is, in some sense, owned by the public. If somebody goes
to trouble and expense to compile data that is in the public sphere (to take a trivial example, the
number of non-pay car parking spaces available in different locations in a city) does that person
‘own’ these data? Current law suggests that they have some property rights in this situation. If a
private individual or organization collects such data, it is under no obligation to give it away for free;
many companies collect and sell such publicly available data on a routine basis. The company
cannot claim copyright in the base data, since anybody else is free to collect the same data, but it
can claim property rights in the compiled data. If, on the other hand, the state collects the data then,
the argument runs, since the costs of collecting these data have been borne by the taxpayer the
taxpayer owns it and is entitled to free access to it.
This is a common argument, but can be questioned on the grounds that there are plenty of examples
of taxpayer funded resources, ranging from tolled roads to national parks, for which those who
actually use or directly benefit from those resources pay a charge, even if the charge is a small
fraction of the full economic cost of provision. Where a resource or service that benefits a specific
subgroup of society is funded by general taxation, then it may well be appropriate that the
beneficiaries make an additional contribution to that cost. In the case of OGD, suppose a
commercial organisation can use government collected parking space data to generate a profit (say
by creating an app identifying where such spaces are available); this will not be of much benefit
to taxpayers who do not own cars. Why then should the government not try to recompense all
taxpayers by making those who may benefit from using the data pay to use it?
4.4 Privacy risks?
At first glance, the question of privacy would appear to have no implications for licensing of OGD as
such. Private data should not be released in the first place so issues of personal privacy should not
arise or need subsequent legal protection. Of course there is the question of determining what data
should be released, but that is upstream of licensing. Once again, practice is not this clear cut.
Public ownership of data was discussed above. A second type of data ownership is personal. How
much of a citizen’s data is legitimately in the public domain even (where possible) in an anonymised
form? Name and address may be public information, but what about social security or personal
identity number? What about personal tax returns, health records, details about minor infractions
of the law, social welfare receipts, driving licence information, passport number? As noted, in
general neither principles nor licences deal with the question of privacy, on the presumption
that private data will not, or should not, be released in the first place and therefore need not be
protected in licences. But, as recent events have demonstrated, seemingly ‘non private’ data can be
exploited to target individuals for commercial purposes by companies such as Google and
Facebook (and they are the visible face of this industry; there are far less visible and more worrying
entities at work in the data mining business), and such targeting has impacted adversely on what
people perceive as their privacy.
This capability comes from data analytics and the power of modern technology driven inference. As
a result, from a privacy perspective, it is no longer valid to assume that once data is anonymised, or
even aggregated, individuals are safe from threats to their privacy. As increasing
amounts of data are combined from multiple public and private sector sources, deanonymisation
becomes a greater risk. The risks from this are not just that citizens are targeted with unsolicited
‘personalised’ advertisements, but of more ominous impacts like higher insurance premiums,
difficulties in obtaining credit and other forms of discrimination.
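The mechanism behind this risk can be shown with a toy example. The sketch below joins a hypothetical "anonymised" release to a hypothetical named register on three shared attributes (often called quasi-identifiers). All records, names, field names and the `link` function are invented for illustration and do not describe any real dataset or tool.

```python
from collections import defaultdict

# Hypothetical "anonymised" release: names removed, other attributes kept.
released = [
    {"postcode": "D02", "birth_year": 1975, "sex": "F", "claim": "disability benefit"},
    {"postcode": "D02", "birth_year": 1980, "sex": "M", "claim": "jobseeker"},
    {"postcode": "D04", "birth_year": 1980, "sex": "M", "claim": "jobseeker"},
    {"postcode": "D04", "birth_year": 1980, "sex": "M", "claim": "housing support"},
]

# Hypothetical second source that carries names, e.g. a public register.
register = [
    {"name": "A. Byrne", "postcode": "D02", "birth_year": 1975, "sex": "F"},
    {"name": "C. Doyle", "postcode": "D02", "birth_year": 1980, "sex": "M"},
    {"name": "E. Flynn", "postcode": "D04", "birth_year": 1980, "sex": "M"},
    {"name": "G. Hayes", "postcode": "D04", "birth_year": 1980, "sex": "M"},
]

KEYS = ("postcode", "birth_year", "sex")


def link(released, register, keys=KEYS):
    """Return (name, claim) pairs where the shared attributes single out
    exactly one released record and exactly one named person."""
    people_by_key = defaultdict(list)
    for person in register:
        people_by_key[tuple(person[k] for k in keys)].append(person)

    records_by_key = defaultdict(list)
    for record in released:
        records_by_key[tuple(record[k] for k in keys)].append(record)

    matches = []
    for key, records in records_by_key.items():
        people = people_by_key.get(key, [])
        if len(records) == 1 and len(people) == 1:
            matches.append((people[0]["name"], records[0]["claim"]))
    return matches


reidentified = link(released, register)
print(reidentified)
```

In this toy data two of the four released records are re-identified, because their combination of postcode, birth year and sex is unique in both sources. The two D04 records remain ambiguous only because they share the same values, and each additional linked dataset tends to reduce that ambiguity.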
4.5 Legal risks to the taxpayer?
One of the characteristics of FoI as a form of OGD has been its use to embarrass politicians and
public servants (Worthy 2010; Grimmelikhuijsen 2010). While this might be considered healthy in a
democracy, it suggests that there may also be risks to the taxpayer arising from OGD. One source
might be incorrect data. Consider an example where a government agency dealing with social
welfare holds data on a computer saying that a particular citizen is believed to be a fraud risk, but
these data are incorrect. As long as such data is confidential and restricted to (say) case officers,
there is limited legal risk to the state, if only because it is unlikely that the citizen concerned will ever
find this out. However, were such data to get into the public domain it could constitute libel and a
citizen might justifiably sue the state. A further problem is downstream impacts. According to
Davies, Perini and Alonso (2013, p.15), combining different data sets can create much value from
open data, yet it can also create:
“significant challenges in determining the legal status of derivative datasets”.
Unless licences are well designed and bulletproof, it is not difficult to envisage the state becoming
embroiled in legal disputes about rights and ownership.
4.6 Problems with Existing Licences?
Science Commons and Creative Commons [8] make a number of criticisms of existing licences. They
state that there are many objectives of sharing data including:
Reducing unnecessary transaction costs,
Simplifying legal tools and
Providing clarity and certainty of (provider and re-user) rights
They go on to suggest that the Open Database License (ODbL) fails to attain these objectives for
sharing data publicly for several reasons including:
“ODbL fails to promote legal predictability and certainty over the use of databases.
ODbL is complex and difficult for non-lawyer to understand and apply.
ODbL can result in high transaction costs on the data sharing community.
ODbL imposes contractual obligations even in the absence of Copyright.”
They propose using other licences, including CC0, which they consider to be simpler and more
consistent than ODbL. In addition, they suggest using other public domain dedications or copyright
waivers. An obvious question is whether governments would (or even could) grant a copyright
waiver for such data.
Furthermore, although the Open Government Licence covers many aspects of re-using data, it does not
cover several other critical aspects including [9]:
“Personal data in the Information;
Information that has neither been published nor disclosed under information access legislation (including the Freedom of Information Acts for the UK and Scotland) by or with the consent of the Information Provider;
Departmental or public sector organisation logos, crests and the Royal Arms except where they form an integral part of a document or dataset;
Military insignia;
Third party rights the Information Provider is not authorised to license;
Other intellectual property rights, including patents, trademarks, and design rights; and
Identity documents such as the British Passport”
[8] http://sciencecommons.org/resources/readingroom/comments-on-odbl/ accessed on 03/03/2014
[9] ibid
A more extreme view is expressed by Miller, Styles and Heath (2008), who doubt whether open data
licensing has any potential benefits at all. They suspect that licensing could hinder the process
of opening up data and may even discourage the re-use of data.
4.7 Problems of Licence Design?
Ubaldi (2013, p37) comments that:
“FOI and PSI legislation as well as clear licensing guidelines are a cornerstone of OGD”
Ubaldi emphasises three important prerequisites for publishing open data, namely:
The presence of a Freedom of Information concept;
Good Public Sector Information legislation and
Clear guidelines on open data licences
In practice the first two of the above may not always be there and the third is easier to state than
to deliver. For example licences may need to be tailored to different types of user. A licence for
business or commercial use might be different from that for research use or use by non-profits or
state agencies. Even the geographical nature of data usage might be dealt with differently, in terms
of global, international, national and local usage. Sharing data in a small community is different
from sharing data between countries. Consequently, different terms and conditions may apply to
both types of user and geographical location of usage. There are already political debates about
storage of data outside of the jurisdiction in which users reside (Kandukuri et al 2009; Kertesz and
Varadi 2014). Can a licence be enforced once data has been moved to a different polity?
Developing a unified legal framework for open datasets is seen as an important issue to resolve as
data increasingly travels across, and is stored beyond, national boundaries in jurisdictions where
different rules and rights may apply. Halonen (2012) argues that there is a need for
internationalisation of open data, in which licensing is a major determinant. She argues that data
must be released under a licence that recognises users’ rights to take advantage of data in a range
of ways, including for commercial purposes, though her arguments for this are rhetorical and
normative.
4.8 Policing Compliance
Finally, having established a licensing regime, the regime needs, like any other regulatory system, to
be policed. Problems that might arise include complaints about misuse or misleading use of data,
privacy breaches, disputes over the right to charge for added value, disputes over copyright and
possibly even more arcane matters such as the right to be forgotten (which, as is currently becoming
evident, in itself opens a whole host of legal issues the consequences of which are still being worked
out).
It is possible, though it seems improbable, that governments will simply be able to release data
subject to a licence and then walk away. In addition to offices for Data Protection and FoI,
governments are likely to find that they need an Open Government Data Commissioner and
accompanying bureaucracy. This remains to be seen. The importance of licences may also vary
with the way data is re-used, and this may need regulatory control. For example, if a company plans
to redistribute the data, then arbitrators may need to be involved to make sure that
authorisation has been given to re-use the data. In contrast, there will be less concern about the
licence if a citizen plans to re-use published data, because citizens may assume that the provision of
the data online gives them the right to re-use it. However, in both cases, a clear
and simple licence is important even where the data are already publicly available.
5 Summary and Some Concluding Thoughts
The purpose of this paper has been to unpack some of the issues surrounding OGD licensing. The
libertarian view that public data, bar some exceptions for security, privacy, etc., should be open and
free, in both senses of the latter, faces, as has been shown, many potential legal pitfalls
and may not be politically realistic. The problems can be summarised under eight broad headings:
First: what data is to be released? As noted, this would not appear initially to be a licensing
problem. However, what happens when data that was thought to be sanitised
from a security, privacy or commercial sensitivity perspective turns out not to be so? Can
licences anticipate and contain such risks?
Second: there is the question of return for the taxpayer on money invested in data acquisition.
Should the beneficiary pay?
Third: directly related to the preceding point there is the question of acquired property rights. If
private companies can acquire this right, states can.
Fourth: contrary to popular perception, copyright can exist in OGD particularly in the form of
visual material, but also potentially in other forms of data and meta data.
Fifth: there is the question of consequential legal problems for the state arising from data
release and whether licences can be designed which can forestall this (and it is far from evident
that this is the case).
Sixth: there is the problem of control of the use of data once it moves beyond the jurisdiction of
the issuing government.
Seventh: there are questions of differential licensing for different types of user and the problems
of containing restricted licences.
Finally there is the matter of establishing a regulatory infrastructure to manage and police this
process.
The rising demand for OGD is not primarily about transparency – it is about releasing and creating
value. Already there is a rapidly growing number of products and services available which are based
on such data. Human ingenuity and creativity being what they are, there are undoubtedly numerous
good things that will emerge from the release of such data in years to come. In such circumstances,
it is easy to understand why so many people believe that licences only get in the way. This
examination suggests that festina lente might be a better motto. There are several benefits of
licensing, not the least of which is providing users with a certainty about where they stand. There is
also the lurking worry about Donald Rumsfeld’s famous ‘unknown unknowns’: the possible
consequences of analytics, data mining, mash ups, machine learning and other emergent
technologies.
One of the most closely guarded secrets of the modern era is how to construct a nuclear weapon. In
1976 a 21 year old named John Phillips, while still an undergraduate at Princeton
and working from published materials including textbooks, produced an outline design for a
Second World War type nuclear bomb (Rein 1976). There is some controversy as to whether Phillips’
design would have worked, though some nuclear engineers thought that it could have been used to
construct an operational weapon. This was done long before the invention of the Web and in an
area where massive state security and protection of critical data was involved. The moral is that one
can never be sure what data will yield to the intelligent mind. Consequently, when it comes to OGD
licensing it might be well to remember the advice of former US president Theodore Roosevelt to
“speak softly and carry a big stick”.
References
Barry, E. and F. Bannister (2014) Barriers to Open Data Release; A view from the top, Information
Polity, 19(1/2), 129-152.
Bertot, J.C., P.T. Jaeger, J. M. Grimes (2010) Using ICTs to create a culture of transparency: E-
government and social media as openness and anti-corruption tools for societies, Government
Information Quarterly, 27(3), 264-271.
Bunakov, V. & Jeffery, K. (2013) Licence management for Public Sector Information, in Parycek, P. and
N. Edelmann (Eds.) CeDEM, Proceedings of the Conference for E-Democracy and Open Government,
277-288.
Chignard, S. (2013). A Brief History of Open Data. ParisTech Review. Available at:
http://www.paristechreview.com/2013/03/29/brief-history-open-data/
Davies, T. (2012) Ten Building Blocks of an Open Data Initiative. Available at:
http://www.opendataimpacts.net/wp-content/uploads/2012/08/Ten-Building-Blocks-of-an-Open-Data-Initiative.pdf
Davies, T., Perini, F. and Alonso, J. (2013) Researching the emerging impacts of open
data: ODDC conceptual framework. Available at:
http://www.opendataresearch.org/sites/default/files/posts/Researching%20the%20emerging%20impacts%20of%20open%20data.pdf
DLA Piper (2013) Data Protection Laws of the World, DLA Piper. Available at:
http://files.dlapiper.com/files/Uploads/Documents/Data_Protection_Laws_of_the_World_2013.pdf.
Free Software Foundation (2014) A Quick Guide to GPLv3. Available at:
http://www.gnu.org/licenses/quick-guide-gplv3.html
Grimmelikhuijsen, S. (2010) Do transparent government agencies strengthen trust?, Information
Polity, 14(3), 173-176.
Halonen, A. (2012). Being open about data. Analysis of the UK open data policies and applicability of
data. Available at:
http://finnish-institute.org.uk/images/stories/pdf2012/being%20open%20about%20data.pdf.
Janssen, M., Y. Charalabidis and A. Zuiderwijk (2012) Benefits, Adoption Barriers and Myths of Open
Data and Open Government, Information Systems Management, 29(4), 258-268
Joinup (2014) European Union Public Licence. Available at:
https://joinup.ec.europa.eu/software/page/eupl
Kertesz, A. and S. Varadi (2014) Legal Aspects of Data Protection in Cloud Federations, in Nepal, S.
and M. Pathan (Eds.) Security, Privacy and Trust in Cloud Systems, Berlin, Springer-Verlag.
Korn, N. & Oppenheim, C. (2011) Licensing open data: a practical guide (version 2.0). Hefce: JISC,
June.
Miller, P., Styles, R. & Heath, T. (2008) Open data commons, a license for open data. Proceedings of
the 1st Workshop about Linked Data on the Web (LDOW2008).
Nicol, A., Caruso, J. & Archambault, É. (2013) Open Data Access Policies and Strategies in the
European Research Area and Beyond, Science-Metrix.
Open Source Initiative (2014) The Open Source Definition (Annotated). Available at:
http://opensource.org/osd-annotated.
Rein, R. K. (1976) A Princeton Tiger Designs An Atomic Bomb in a Physics Class, People, 6(17),
October 25th 1976. Available on line at:
www.people.com/people/archive/article/0,,20067027,00.html
Privacy International (2014) Data Protection and Privacy. Available at:
https://www.privacyinternational.org/issues/data-protection-and-privacy-laws
Schmitz, P-E. (2013) The European Union Public Licence (EUPL), International Free and Open Source
Software Law Review, 5(2), 121-136.
Schutzberg, A. (2014) Nine Things You Need to Know about Open Data [Online]. Available:
http://www.directionsmag.com/articles/nine-things-you-need-to-know-about-open-data/385680
[Accessed 25/02/2014].
Ubaldi, B. (2013) “Open Government Data: Towards Empirical Analysis of Open Government Data
Initiatives”, OECD Working Papers on Public Governance, No. 22, OECD Publishing.
http://dx.doi.org/10.1787/5k46bj4f03s7-en
Wimmer, M., Scholl, J., Janssen, M. & Traunmüller, R. (2013) Electronic Government. Proceedings
of Ongoing Research, General Development Issues and Projects of EGOV, Berlin, Springer-Verlag.
Worthy, B. (2010) More Open but Not More Trusted? The Effect of the Freedom of Information Act
2000 on the United Kingdom Central Government. Governance, 23 (4), 561–582.