Permanent access to digital knowledge – the challenges for digital preservation Pat Manson Head of Unit European Commission DG Information Society and.

  • Permanent access to digital knowledge – the challenges for digital preservation

    Pat Manson
    Head of Unit
    European Commission
    DG Information Society and Media
    Cultural Heritage and Technology Enhanced Learning

  • Outline
    - The changing digital landscape
    - How this is re-shaping our view of preserving knowledge for the future
    - What are the main challenges?
    - Why is the Commission concerned? And what are we doing?

  • Context: the changing digital landscape
    - Increasingly diverse and varied picture
    - Expanding context has implications for organisational responsibilities & infrastructures and for technical solutions
    - Main characteristics
      - Exponential growth
      - Complexity of content: joined-up approaches to published text and underlying data
      - Collaborative and dispersed environments
      - Instability, dynamic, short-lived
      - Thrust of ICT industry to better tools & means, bringing rapid change
      - New methods of communication and dissemination

  • Digital preservation landscape
    - These drivers have implications for how we view the preservation problem:
      - Shift from tackling preservation at end of creation life-cycle to embedding preservation features increasingly early in the creation process
      - Involvement of new stakeholders, in public and private sector, with implications for organisational and infrastructural approaches
      - Need to monitor change in documents and content, giving rise to issues such as version control and authenticity
      - Context of information: retrace information paths to reconstitute an accurate record, as well as enabling use by others than the originator

  • Challenges
    - Legal
      - Copyright
      - Deposit strategies for highly distributed content, linked cross-borders
    - Financial
      - Costs of maintaining digital archives and repositories
    - Organisational
      - Risk of widely divergent approaches and duplication of effort
      - Ensuring complementarities and support best practices
      - Trust, quality of repositories and how to translate that into durable infrastructures
    - Technical
      - Improve cost efficiency and affordability
      - Understanding how to preserve high-volume rapidly changing content
      - Anticipating future contexts of use
      - Standards and support for interoperability

  • Why is digital preservation a problem to be tackled at European level?
    - Scale and dimension of the problem is intrinsically trans-national, as is scope of the content we are dealing with
    - Relates to our existing responsibilities
      - Regarding existing legislation (especially copyright and IPR)
      - Research under FP7
    - Consolidate and optimise approaches for cost-effectiveness
      - Remove fragmentation
      - Avoid redundancy, where it is not efficient

  • What the Commission is doing
    - Direct actions by the Commission and leveraging actions for other stakeholders: the Member States and organisations
    - Working on policy, strategic, and technical levels
    - Marriage of mandated top-down responsibilities (through politically-driven actions) and bottom-up stakeholder-driven initiatives

  • What the Commission is doing
    - On policy level
      - i2010 digital libraries: digitisation and online accessibility of cultural material and digital preservation
        - Communication followed by Recommendation of the Commission of 24 August 2006, endorsed by Conclusions of Council November 2006
      - Communication on Scientific information in the digital age: access, dissemination and preservation
    - On strategic level
      - High Level Group with sub groups on copyright and scientific information, and work on public private partnerships
        - Taking forward recommendations to address framework conditions
      - Member States Expert Group on Digitisation and Digital Preservation
        - Monitor progress, forum for cooperation with MS and Commission, support exchange of information and good practices

  • Council Conclusions following up the Commission Recommendation
    - Council Conclusions leveraging key actions for digital preservation in the Member States
      - Establishing national strategies for long-term preservation and deposit by mid-2008
      - Starting in 2007, developing quantitative and qualitative targets, including the associated financial planning on a multiannual basis, for deposit, digitisation and online access of cultural material and long-term preservation
      - By 2009, establishing a legislative or other effective framework in support of digital preservation (with provision for legally mandated deposit institutions, web-harvesting, multiple copying and migration)
    - Commission is supporting this, e.g. by a call for tender for a study on socio-economic drivers and impact of longer-term digital preservation

  • What the Commission is doing: technical and operational level
    - Research
      - FP7 has digital preservation as a clear research topic; continues research lines started in 2005/6
    - Evolutionary scenario
      - Complete the current portfolio of projects addressing the topic of digital curation and preservation
      - Extend stakeholder communities involved
      - Structure research: extend cross-disciplinarity
      - Centres of competence: services, knowledge, outreach and support for emerging infrastructures
    - New approaches
      - Explore possibilities offered by new ICT for new approaches to digital preservation
      - Re-think approaches and concepts

  • Research and impacts expected
    - Impact in the short / medium term:
      - Systems and tools supporting key digital preservation functions, based in first instance on OAIS
      - Infrastructure issues: registries, certification, authentication, accreditation
      - Awareness of the broader scope of the problem

  • Research and impacts expected
    - Impact in the medium term, through research aimed at addressing key technological problems:
      - More cost-effective approaches for ingestion and characterisation
      - Scalability of computing and storage resources (distributed architectures)
      - Heterogeneity (formats, platforms, objects, data semantics) across space and time
      - Increased capacity of support infrastructures dealing with registries, certification, authentication and accreditation

  • Research and impacts expected
    - Impact in the longer term: developments in digital technologies are likely to enable new approaches to digital preservation:
      - How to know what needs to be preserved
      - How to deal with high volumes of dynamic, volatile content
      - Models for digital objects capable of supporting self-preservation features
      - Anticipate the context of future access and re-use of preserved information
      - Ability to preserve not only data but context of meaning and use
      - Assure integrity, authenticity and accessibility

  • To conclude
    - Progress to date
      - Recognition of the problem by larger and more diverse group of stakeholders
      - Improved political awareness and will
      - Need for pro-active strategies
      - Organisational infrastructures will mix top-down national responsibilities with bottom-up stakeholder-driven approaches
    - But
      - Still need to answer the questions of how to handle the new forms of digital content, or risk losing more scientific knowledge because we have not developed the technical capabilities to preserve it

    INTRODUCTION

Over the years we have seen numerous examples of how the loss of knowledge has impacted on society, on quality of life and on the ability to make scientific and economic progress. One of the most dramatic is the transition from the Roman Empire to the Dark Middle Ages. Since the mid-18th century, society has developed mechanisms to ensure that scientific knowledge (and I use scientific in the wide sense of all intellectual knowledge) is collected, archived and collectively maintained for the future. This has involved learned societies and legal deposit, has seen the emergence of national libraries, and in the twentieth century extended the infrastructures to grey literature, such as theses. These predominantly national infrastructures usually had built-in fail-safe mechanisms, mainly based on redundancy through the collection of multiple copies. So until now we have been reasonably sure about the survival of knowledge, and about our ability to retrieve it, even if this retrieval was not immediate.

However, we are here today because we recognise that in the digital environment we are confronted with a set of problems and challenges that will test our ability to provide for users in the future the knowledge we are creating today.

Permanent access to the records of science: why is this such an important and yet, in today's digital age, such a difficult topic? In this short presentation, I am going to be focusing on the word permanent. This gives rise to the real problem: namely that of digital preservation, and providing continuity of access over time to the information that is deposited in repositories, libraries and archives. It is a problem which has to embrace long-term thinking, but where there is also a need for immediate and short-term solutions.

There is increasing, even exponential, growth in the volumes of information being produced in digital format and online.
At the same time, past practice of having parallel paper products is changing, leading to estimates that by 2020 90% of information will only be produced in digital form. Increase in volume is one factor, but it is accompanied by an increase in the complexity of digital documents: text, including links to images and data-sets, requiring access to different software for representing and displaying them. So not only is "simple" published text now integrating images, algorithms and formulae, we are moving to a point where users will require this text to be joined up with the underlying data sets so that researchers can re-analyse and use these data, in contexts that may be different from those envisaged by the originator of the data.

A further characteristic of this digital world is that we are increasingly working in collaborative and highly dispersed environments. The context is also one where data and the resources are highly distributed, and where files and data are constantly updated. These are all introducing new levels of complexity in the content we are being asked to manage. This content is volatile and short-lived: the average life of a web page is less than that of a house fly. Recent studies have shown that 3 years after publication, Internet references quoted in three prestigious journals were no longer retrievable. Then there is the rapidity of change: the thrust of the ICT industry to provide better tools and means is driving this, with the "time to obsolescence" for software and technology becoming ever shorter.
And of course, this is accompanied by new methods of communication and dissemination, with the trends towards decentralisation (democratisation, even, in some contexts) of the dissemination processes, either through informal online communities (specific to particular disciplines and groups of researchers) or through more formal or institutionalised ones, such as those we see in the institutional repositories.

This is giving rise to new organisational issues, which challenge both the sustainability and the current economic and financial underpinnings of our existing infrastructures and business models.

I must add here that the problem is not one confined to the research and scientific community. Digital content, sharing many of the characteristics of that produced by the scientific community, is created by businesses and forms part of their knowledge assets, and is itself the source of new content industries. Economic and also legal factors are likely to play an increasing role in raising the profile of digital preservation, as we build up experience of the risks associated with our inability to keep our digital information accessible.

These drivers have implications for shaping how we view the wider digital preservation problem, and these echo many of the issues set out in the background document prepared by the Alliance.

First, there is a shift from tackling preservation at the end of the creation life-cycle, as we mostly do today, to providing preservation information and features increasingly early in the creation process.
This implies the need for new tools, but also puts responsibility on to the content creator as well as on the archiving organisation.

Second, the involvement of new stakeholders, in the public and private sector, with implications for organisational and infrastructural approaches.

Third, the need to monitor change in documents and content, giving rise to issues such as version control and authenticity; this applies also to the repositories as a whole, and not just to the individual objects.

Fourth, the context of information: our need to re-trace information paths in order to reconstitute the accurate record, often with legal implications. But the context is also an essential ingredient to enable users, other than the originator, to understand and re-use data in the future.

There are also potentially significant economic considerations, based on the current and future value of the knowledge and information. This gives rise to challenges that are of a legal, financial and technical/organisational nature.

On the legal front, as digital preservation depends on copying and migration, it has to be considered in the light of IPR legislation. The diverging speed and scope of the legal measures adopted by the Member States, for example for legal deposit, could lead to a patchwork of different rules affecting content producers with cross-border activities. The intersection between legal deposit and IPRs, the introduction of technological protection measures to prevent copying, or of digital rights management systems restricting access to digital material, raises a whole set of new issues.

Financial challenges: The real costs of long-term digital preservation are unclear or, frankly, unknown. However, we know that to some extent they will depend on factors such as the number of migrations needed over time.
It is, however, likely that due to the limited resources available, choices will have to be made as to which material should be preserved.

Organisational challenges: Choices are necessary, but who decides and who is responsible for preserving what? In an area where some basic questions are far from being answered, there is a risk of widely divergent approaches and duplication of efforts. European added value can be found in ensuring complementarities and an exchange of good practices. I would like to mention here certification, which can be both a technical and an organisational challenge for eventual infrastructures. Recent work has seen progress on tools for self-certification (promoting best practices), but the results so far have shown that only a limited number of repositories are actually compliant with the toolkit. This remains an area of critical importance to foster the trust and confidence of users in the repositories, both those depositing content and those accessing it.

Technical challenges: So far, limited research has been done on digital preservation. A major challenge is to improve its cost-efficiency and affordability. Advancing understanding of how to preserve high volumes of rapidly changing distributed information is another essential area to be addressed. Among other technical issues, how to anticipate future contexts of use, and then to provide the contextual information (metadata, semantics) needed to support that as yet undefined use, is a real research problem. Standards in support of...


