Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
PRESERVATION OF ELECTRONIC LEGAL
MATERIALS
UELMAPRESERVATIONGROUP
LeahPrescott,chairStevenAnderson
ErikR.BeckDianeBoyer‐VineDanielCordova
SusanDavidDeMaineAmyEmersonEmilyFeltrenDavidGreisenDavidHansen
JasonJudtJaneLarringtonMargaretMaesMichellePearseMendoraServinAnthonySmithLeslieStreetDavidWalls
Rev.4/2018
1
Contents IntroductionandBriefBackground............................................................................................................................................................2
Formationofadhocpreservationgroup.............................................................................................................................................2
Survey.................................................................................................................................................................................................................2
UELMAandElectronicLegalMaterial.......................................................................................................................................................3
DigitalPreservation...........................................................................................................................................................................................4
OAIS(OpenArchivalInformationSystem)FunctionalModel‐ISO14721............................................................................4
TrustedDigitalRepositories.....................................................................................................................................................................4
LevelsofPreservation.................................................................................................................................................................................5
DocumentStrategies.........................................................................................................................................................................................8
PDFStrategies.................................................................................................................................................................................................8
XMLStrategies................................................................................................................................................................................................8
MetadataforDigitalPreservation.............................................................................................................................................................10
PREMIS.............................................................................................................................................................................................................10
METS.................................................................................................................................................................................................................11
DigitalStorage....................................................................................................................................................................................................12
CloudStorage................................................................................................................................................................................................12
Amazon.......................................................................................................................................................................................................12
GoogleCloud.............................................................................................................................................................................................12
Localdigitalstorage....................................................................................................................................................................................12
CaseStudies........................................................................................................................................................................................................14
California.......................................................................................................................................................................................................14
Minnesota......................................................................................................................................................................................................19
Washington,D.C.........................................................................................................................................................................................22
AppendixI:SurveyResults...........................................................................................................................................................................27
AppendixII:Opensourceandcommercialpreservationsystems..............................................................................................32
Archive‐It(www.archive‐it.org)............................................................................................................................................................32
Archivematica(https://wiki.archivematica.org)...........................................................................................................................33
Arkivum(www.arkivum.com)...............................................................................................................................................................33
DuraspaceSystems(www.duraspace.org).......................................................................................................................................33
Islandora(islandora.ca)............................................................................................................................................................................34
Perma.cc(https://perma.cc)..................................................................................................................................................................35
Preservica(www.preservica.com).......................................................................................................................................................35
Rosetta(www.exlibrisgroup.com/category/RosettaOverview).............................................................................................35
Samvera(samvera.org).............................................................................................................................................................................35
AppendixIII:Stand‐alonePreservationTools.....................................................................................................................................37
2
Introduction and Brief Background
Formation of ad hoc preservation group
In2014,theWashingtonD.C.Counsel’sofficewaslookingforawaytopreservethesourcematerialsfortheCodeoftheDistrictofColumbia.TheyreachedouttotheGeorgetownUniversityLawLibrarytoinquireaboutbecomingamemberoftheChesapeakeDigitalPreservationGroup‐madeupofGeorgetownLawLibrary,thestatelibrariesofMarylandandVirginia,andHarvardLawLibrary‐aprogramcreatedtopreserveborndigitallegalmaterials.TheDCcodehadbeenconvertedtoXMLdocuments,andbecauseofthedynamicnatureofthatsystem,itwasdeterminedthattheChesapeakerepositorywasprobablynotthebestsolution.
WiththeparticipationoftheAALLGovernmentRelationsOffice,thisconversationtransformedintoasmallgroupofpeoplefromotherstateswhoweregrapplingwiththesameissueofhowbesttopreserveofficialelectroniclegalmaterials‐oneoftheprimaryrequirementsoftheUniformElectronicLegalMaterialAct(UELMA).
TheUELMAPreservationGroupagreedthatthemainpurposeofthegroupshouldbetoinvestigatepreservationstrategies,andprovidetoolsandassistancetothosestatesthathaveadoptedorplantoadoptUELMA.Itwasdeterminedthatdeliverablesforthegroupmightbe:
Guidanceonwhatdigitalpreservationentails,andestablishingguidelinesandbestpractices Tryingtoeducatepeopleoncostsrelatedtolevelsofpreservation Examplesoftechnicalpapersanddocumentsthathavebeendeveloped,documentationandsample
languagetohelpwithadvocacy Atoolkitandcasestudies
Survey Thegroupdeterminedthattherewasnotenoughinformationavailableaboutthestatusofstateelectroniclegalmaterials,anddecidedthatasurveycouldprovideusefuldata.Afterlocatingcontactsineachstate,thesurveywaslaunchedin2015,withthegoalofdeterminingthestateofpreservationactivitiesforelectroniclegalmaterialsingeneral,andtodeterminewhat(ifany)opentoolsmightbeusefulforthecommunityasawhole.
ThecompiledresultsofthesurveycanbeviewedinAppendixIofthispaper,andingeneralrevealedthatwhilelegalmaterialsarebeingcreateddigitally(borndigital)aswellasbeingdigitizedfrompaper‐basedmaterials,manyarenotyetconsideredofficial.Inaddition,thesurveysuggestedthatthereisnotastrongdesireforeitheraconsortialsolution,oranopensourcetool.Consequently,thegroupdecidedthatthebestdeliverableatthistimewouldbeaguidancedocumentintheformofawhitepaper‐thiswhitepaper.
3
UELMA and Electronic Legal Material ForthosewhoarenewtoUELMA,thefollowingsectionwillgivebasicinformationabouttheact.
TheAmericanAssociationofLawLibraries(AALL)maintainsawebpageofUELMAresources(https://www.aallnet.org/advocacy/government‐relations/state‐issues/uelma‐resources/),anditdescribesUELMAinthisway:
“TheUniformElectronicLegalMaterialAct(UELMA)isauniformlawthataddressesmanyoftheconcernsposedbythepublicationofstateprimarylegalmaterialonline.UELMAprovidesatechnology‐neutral,outcomes‐basedapproachtoensuringthatonlinestatelegalmaterialdeemedofficialwillbepreservedandwillbepermanentlyavailabletothepublicinunalteredform.”
TextoftheActRelatingtoPreservation
SECTION7.PRESERVATIONANDSECURITYOFLEGALMATERIALINOFFICIALELECTRONICRECORD.
(a)AnofficialpublisheroflegalmaterialinanelectronicrecordthatisorwasdesignatedasofficialunderSection4shallprovideforthepreservationandsecurityoftherecordinanelectronicformoraformthatisnotelectronic.
(b)Iflegalmaterialispreservedundersubsection(a)inanelectronicrecord,theofficialpublishershall:
(1)ensuretheintegrityoftherecord;
(2)provideforbackupanddisasterrecoveryoftherecord;and
(3)ensurethecontinuingusabilityofthematerial.
Intermsofwhat“electronic”and“legalmaterial”meansinthecontextofUELMA,theActprovidesthefollowingguidance:
(1)“Electronic”meansrelatingtotechnologyhavingelectrical,digital,magnetic,wireless,optical,electromagnetic,orsimilarcapabilities.
(2)“Legalmaterial”means,whetherornotineffect:
(A)stateconstitution
(B)sessionlaws
(C)statecode
(D)astateagencyrulethathasorhadtheeffectoflaw
(E)categoriesofstateadministrativeagencydecisions
(F)reporteddecisionsofstatecourts
(G)statecourtrules
(H)anyothercategoryoflegalmaterialtobeincludedbyindividualstates
Itisastrengthoftheactthatitdoesnotprescribeatechnologicalstrategyforelectronicdocuments,therebyallowingforafullrangeofsolutionstodealwithafullrangeofdigitalformattypes.Thatflexibilityisalsoachallengewhentryingtodevelopastandardsetofsolutionsforthewidestpossiblesetofusers.ThestrategiesusedbythestatesthathavethusfarenactedUELMAhowever,fallintoonlyafewcategories,andthispaperprovidescasestudiesfromsomeofthosestatesthatwillillustratetheirstrategies.
4
Digital Preservation Therearethosewhointerpretdigitalpreservationtobetheactofprotectingpaper‐basedmaterialsthroughdigitization,butthereisincreasingrecognitionthatadigitalobject‐regardlessofwhetheritwasborndigitalorcreatedthroughadigitizationprocess‐isanobjectseparatefromanyanalogform,andassuchhasaseparatesetofcapabilities,aseparatesetofpreservationchallenges,andanequalneedtobepreserved.
TheNationalDigitalStewardshipAlliance(NDSA)definesdigitalpreservationas“Theseriesofmanagedactivities,policies,strategiesandactionstoensuretheaccuraterenderingofdigitalcontentforaslongasnecessary,regardlessofthechallengesofmediafailureandtechnologicalchange.”
OAIS (Open Archival Information System) Functional Model ‐ ISO 14721 ManypreservationsystemsrefertotheOAISFunctionalModeltoframetherangeofcapabilitiesthattheyoffer,soitisusefultogiveabriefexplanationaboutwhatitis.OAISisatheoreticalmodelratherthananactualsystemstructureandassuch,itdescribesthebasicfunctionsthatacompliantsystemmustperform.GraphicssuchasthisonefromWikipediaareoftenusedtorepresentthemodel:
(https://en.wikipedia.org/wiki/Open_Archival_Information_System)
PreservationsystemdescriptionswilloftenrefertoSIPs,AIPs,andDIPsandtheseacronymscomefromtheOAISmodel.Theyreferto:
SIP‐SubmissionInformationPackage;theinformationcomingintothesystem AIP‐ArchivalInformationPackage;thearchivalobjectsanddatacreatedandpackagedfromtheSIP DIP‐DisseminationInformationPackages);theobjectanddatathatiscreatedfromtheAIPandmade
availablethroughanaccesssystem.
Trusted Digital Repositories Ina2002report,OCLC(agloballibrarycooperative)definesaTrustedDigitalRepository(TDR)as“onewhosemissionistoprovidereliable,long‐termaccesstomanageddigitalresourcestoitsdesignatedcommunity,nowandinthefuture.”ArelatedstandardthatappliestodigitalpreservationistheTrustworthyRepositoriesAudit&Certification(TRAC)checklist(ISO16363).ThepurposeofthischecklististodefinethecriteriaforcertificationofarepositorysystemasaTDR.
5
Themetricsofthechecklistaresplitintothreetopicalareas:
OrganizationalInfrastructure‐therepository'sadministrative,staffing,financial,andlegalfunctions DigitalObjectManagement‐thehandlingofdigitalobjectsfromingesttoaccess Technology,TechnicalInfrastructure,andSecurity‐thetechnologyusedtohandleingestedobjects
TheOCLCreportfurthersummarizestheresponsibilitiesofaTDR.ItstatesthataTDRmust:
Acceptresponsibilityforthelong‐termmaintenanceofdigitalresourcesonbehalfofitsdepositorsandforthebenefitofcurrentandfutureusers
Haveanorganizationalsystemthatsupportsnotonlylong‐termviabilityoftherepository,butalsothedigitalinformationforwhichithasresponsibility
Demonstratefiscalresponsibilityandsustainability Designitssystem(s)inaccordancewithcommonlyacceptedconventionsandstandardstoensurethe
ongoingmanagement,access,andsecurityofmaterialsdepositedwithinit Establishmethodologiesforsystemevaluationthatmeetcommunityexpectationsoftrustworthiness Bedependedupontocarryoutitslong‐termresponsibilitiestodepositorsandusersopenlyandexplicitly Havepolicies,practices,andperformancethatcanbeauditedandmeasured
WhiletheprocessofbecomingaTDRisquiterigorous,thecertificationchecklistisausefultooltohelpaninstitutionmoveintherightdirection‐evenifneveractuallybecomingcertified.
Levels of Preservation Digitalpreservationisdistinctlydifferentfrompreservationofanalogobjects.Whilepreservationofpaper‐basedmaterialsgenerallyrequiresestablishingastableenvironmentandthenminimizinginteractionwiththematerials,digitalpreservationrequirescyclicalandrelativelyfrequentinteractionwithobjectsbeingpreserved‐toperformfunctionslikefixitychecking,refreshing,andformatmigrationwhenneeded.ThisrealityalongwithstandardssuchastheTDRcertificationchecklistcanoftencreatetheimpressionthatdigitalpreservationisanunreachablegoalformanyinstitutions.
TheInfrastructureWorkingGroupoftheNDSAcreatedaset of tiered benchmarksfordigitalpreservationactivities(seebelow),andwhiletheselevelsprovideguidanceforinitiatingapreservationstrategy,theimplicitgoalistocontinuetomoveupthetiersasfarasispracticable.ItisthejudgmentofthepreservationgroupthatLevel3istheminimallevelforcompliancewiththeact.
6
Matrix Categories
StorageandGeographicLocation
Thiscategoryaddressestheissueofhowmanycopiesofafileshouldbekept,withtheexpectationthatifoneshouldbethreatened,anothercouldbecopiedtoreplacetheloss.Includedinthisconsiderationisthephysicallocationofthedigitalfile,becausedifferentgeographicallocationswillhavedifferentthreats.Thefurtherthatcopiesarefromeachother,thelessriskthatasingleeventwilldestroyallcopies.TheLOCKSS Program(LotsOfCopiesKeepStuffSafe)basedatStanfordUniversityLibraries,isanexampleofapreservationstrategythatisbasedondistributedgeographiclocations.
FixityandDataIntegrity
ThePREMISDataDictionarydefinesfixitythisway‐“"informationusedtoverifywhetheranobjecthasbeenalteredinanundocumentedorunauthorizedway."Thisisprimarilydonewiththeuseofchecksums,whichactasa“fingerprint”ofadocument,andwhentheyarecalculatedperiodicallyandthencomparedwithearlierchecksums,itisapparentwhensomethinginthedocumenthaschanged,eitherthroughhumanintervention,orbecauseofspontaneousbitchanges(“bitrot”).Muchmoreinformationaboutfixitycanbefoundathttp://www.digitalpreservation.gov/documents/NDSA‐Fixity‐
7
Guidance‐Report‐final100214.pdf.Thiscategoryalsorelatestoprocessesusedtoextractdatafromobsoletemedia,aswellasensuringavirus‐freeenvironment.
InformationSecurity
Thiscategoryisforpoliciesandpracticesrelatingtowhohasaccesstofilesandwhattheirfile‐levelpermissionsare,aswellasdocumentationoffileaccess‐suchaswithatransactionlogthatrecordsaction,user,time,etc.TheDigitalRepositoryAuditMethodBasedonRiskAssessment(DRAMBORA),developedbytheDigitalCurationCentre,isamethodologytosupportthistypeofassessment,aswellasmanyothertypes.
Metadata
Thetypeofmetadatathatthiscategoryisconcernedwithisnotonlydescriptivemetadata,whichmostarefamiliarwith,butalsotechnicalmetadatasuchascaptureequipment,filemeasurements,suchasfilesize,resolution,pixeldimensions,forvisualfiles,ortimeforanaudiovisualfile.Ausefulpaperondifferenttypesofmetadatafordigitalpreservationpurposescanbefoundathttps://www.loc.gov/standards/premis/FE_Dappert_Enders_MetadataStds_isqv22no2.pdf
FileFormats
Somefileformatsareappropriateforlong‐termpreservationbecausetheyhavequalitiessuchasbeinguncompressedorlosslessintheircompression,theyhaveopenandavailablespecifications,theyarewidelyadopted,etc.ToviewwhatfileformatstheLibraryofCongressacceptsaspreservation‐worthy,andwhy,seehttps://www.loc.gov/preservation/digital/formats/intro/intro.shtml
8
Document Strategies Thetwomostcommonformatsforpreservation‐worthyelectroniclegalmaterialsarePDFandXML
PDF Strategies ThePortableDocumentFormat(PDF)isaverycommonspecificationusedforpresentingdocumentsonline,andisincreasinglyusedasanarchivalmasterformat(particularlyPDF/A).PDFwasdevelopedbyAdobeSystemsintheearly1990’sandthespecificationwasmadeavailableatnocostin1993,eventhoughitwasstillaproprietaryformat.In2008,thespecificationwasofficiallyreleasedasafullyopenandnon‐proprietarystandard,leadingthewayforitsuseasanarchivalmasterformat(althoughtherearealliedtechnologiessuchasPDFFormsthatarestillproprietary).SeveralstatesthathaveapprovedUELMAareusingPDFsastheirprimaryformatforofficialelectronicpublications.
PDF/Aisamulti‐partISOstandardforlong‐termarchivingformatforelectronicdocuments,basedonthePDFspecification.BecauseaPDF/Adocumentcontainseverythingitneedstopresentthedocument,includingembeddedfonts,itisintendedtobestableovertime,andcanbeusedonanyplatform.Inadditiontotheoriginalrelease(Part1),twoadditionalpartshavebeenmadeavailable,onein2011andanotherin2012.InPDF/AinaNutshell2.0,thePDFAssociationstates:
“Putinthesimplestpossibleterms,PDF/AisaPDFwhichforbidscertainfunctionswhichcouldhinderlong‐termarchiving.PDF/Aalsodemandsthatthefilemeetcertainrequirementswhichguaranteereliablereproduction.
Forexample,filesmustnotbeencryptedwithapassword,asallcontentmustalwaysbefullyavailable.Embeddedvideoandaudiodataarealsoprohibited:PDF/Aconsciouslyavoidsanythingthatrequiresexternalsoftwarefordisplayorplayback.JavaScriptandcertainactionsarealsoforbidden,asexecutingthemcouldpotentiallyalterthePDF.
PDF/Aalsoplaceshigherdemandsontheinformationitcontains.Allrequiredfonts(oratleastallglyphsforthespecificcharactersused)mustbeembeddedwithinthePDF.Toensureauniformcolourappearanceonavarietyofplatformsanddevices,colourinformationmustbegiveninaplatform‐independentformatusingICCcolourprofiles.ThesoftwaremustalsousetheXMPformatformetadata(whichisusedtostorethedataidentifyingthefileasaPDF/A,forexample).
PDF/Aalsosetstechnicallimits:forexample,thepagesizeislimitedtoanedgelengthofeither5.08metres(PDF/A‐1)orupto381kilometres(PDF/A‐2andPDF/A‐3).”
TherearemanytoolsavailabletocreatePDFandPDF/Adocuments,perhapsthemostobviousbeingthosebyAdobe,includingAdobeAcrobat.
XML Strategies ExtensibleMarkupLanguage(XML)isalanguageformarkingtextsimilarinsomewaystoHypertextMarkupLanguage(HTML),thelanguageofwebpages.ThedifferenceisthatHTMLprimarilydefineshowtextwillappearonawebpage,whereasXMLisdesignedtohelpdefinewhatdataactuallyis.AnotherdifferenceisthatHTMLfieldsarepredefinedandeveryoneusesthesametags,whereasXMLisdefineduniquelybythecommunitythatisutilizingitforadistinctpurpose.ThismakesXMLusefulforsharingdata.
AnexampleofaninstanceofXMLwithinthelegalrealmistheGlobalJusticeXMLDataModel(GJXDM).ItisdefinedbytheDepartmentofJusticeas“anXMLstandarddesignedspecificallyforcriminaljusticeinformationexchanges,providinglawenforcement,publicsafetyagencies,prosecutors,publicdefenders,andthejudicialbranchwithatooltoeffectivelysharedataandinformationinatimelymanner.”
9
AnotherexampleisLegalDocumentML,which“providesacommonlegaldocumentstandardforthespecificationofparliamentary,legislativeandjudicialdocuments,fortheirinterchangebetweeninstitutionsanywhereintheworldandforthecreationofacommondataandmetadatamodelthatallowsexperience,expertise,andtoolstobesharedandextendedbyallparticipatingpeers,courts,Parliaments,Assemblies,Congressesandadministrativebranchesofgovernments.”Thestandardaimstoprovideaformatforlong‐termstorageofandaccesstoparliamentary,legislativeandjudicialdocumentsthatallowssearch,interpretationandvisualizationofdocuments.”LegalDocumentMLispartoftheAkomaNtosospecification‐ExampleofXMLmarkupofalegislativedocument.
10
Metadata for Digital Preservation Bestpracticefordigitalpreservationincludesthecreationofdifferenttypesofmetadatathatwillhelptoensurelong‐termstewardshipofdigitalmaterials.Metadataforpreservationisinformationthatsupportsanddocumentstheprocessofdigitalpreservation,andinherentintheOAISmodel(seepg.4)istheconceptthatalloftheinformationpackages(SIP,AIP,andDIP)consistofthedigitalcontentthatis“packaged”alongwithdescriptiveinformation.
Atamoregranularlevel,thecontent(or“ContentInformation”inOAISlingo)mayincludethedigitalobjectalongwith“RepresentationInformation”(asalsodefinedintheOAISmodel).RepresentationInformationisdatathatisnecessarytointerpretorusethedigitalobject.Ifthedigitalobjectwereadataset,forinstance,theRepresentationInformationcouldpossiblybeinformationabouthowthedatawasgenerated,andwhatthestructureofthedatasetis,forexample.BoththedigitalobjectandtheRepresentationInformationmustbeequallypreservedasContentInformation.
The“PreservationDescriptionInformation”describeswhatisrequiredtopreservetheContentInformation,andmightincludeelementsofadministrative,structural,technical,orrightsmetadata.ThedefactointernationalstandardforpreservationmetadataisthePREservationMetadata:ImplementationStrategies(PREMIS)standard.
PREMIS FromthePREMISmaintenancepage,“ThePREMISDataDictionaryforPreservationMetadataistheinternationalstandardformetadatatosupportthepreservationofdigitalobjectsandensuretheirlong‐termusability.Developedbyaninternationalteamofexperts,PREMISisimplementedindigitalpreservationprojectsaroundtheworld,andsupportforPREMISisincorporatedintoanumberofcommercialandopen‐sourcedigitalpreservationtoolsandsystems.ThePREMISEditorialCommitteecoordinatesrevisionsandimplementationofthestandard,whichconsistsoftheDataDictionary,anXMLschema,andsupportingdocumentation.”
ThefullPREMISspecification,orDataDictionary,isquitelongandinvolved,butthereisamuchmoreaccessibledocumentcalledUnderstandingPREMIS,andhereispartoftheintroduction:
“…metadataiscategorizedaccordingtowhatitisintendedtoaccomplish:descriptivemetadatahelpsindiscoveryandidentificationofresources,administrativemetadatahelpsinmanagingandtrackingthem,andstructuralmetadataindicateshowcomplexdigitalobjectsareputtogethersothattheycanbeproperlyrendered.Similarly,preservationmetadatasupportsactivitiesintendedtoensurethelong‐termusabilityofadigitalresource…
Herearesomeexamplesofpreservationactivitiesandhowmetadatacansupportthem:
Aresourcemustbestoredsecurelysothatnobodycanmodifyitinadvertently(ormaliciously).Checksuminformationstoredasmetadatacanbeusedtotellifastoredfilehaschangedbetweentwopointsintime.
Filesmustbestoredonmediathatcanbereadbycurrentcomputers.Ifthemediaaredamagedorobsolete(likethe8"floppydisksusedinthe1970s)itcanbedifficultorimpossibletorecoverthedata.Metadatacansupportmediamanagementbyrecordingthetypeandageofstoragemediaandthedatesthatfileswerelastrefreshed.
Overlongperiodsoftimeevenpopularfileformatscanbecomeobsolete,meaningnocurrentapplicationscanrenderthem.Preservationmanagersmustemploypreservationstrategiestoensuretheresourcesremainusable.Thismightmeanmigratingoldformatstonewerequivalents,oremulatingtheoldrenderingenvironmentonnewerhardwareandsoftware.Bothmigrationandemulationstrategiesrequiremetadataabouttheoriginalfileformatsandthehardwareandsoftwareenvironmentssupportingthem.
11
Preservationstrategiesmayentailchangingoriginalresources(migration)orchanginghowtheyarerendered(emulation).Thiscanputtheauthenticityoftheresourceindoubt.Metadatacanhelpsupportauthenticitybydocumentingthedigitalprovenanceoftheresource‐‐itschainofcustodyandauthorizedchangehistory.”
ItfurtherdefinescategoriesthatareexcludedfromtheDataDictionary:
“TheDataDictionaryisnotintendedtodefineallpossiblepreservationmetadataelements,onlythosethatmostrepositorieswillneedtoknowmostofthetime.Severalcategoriesofmetadataareexcludedasoutofscope,including:
Format‐specificmetadata,i.e.,metadatathatpertainstoonlyonefileformatorclassofformatssuchasaudio,videoorvectorgraphics.
Implementation‐specificmetadataandbusinessrules,i.e.,metadatathatdescribesthepoliciesorpracticesofanindividualrepository,suchashowitprovidesaccesstomaterials.
Descriptivemetadata.Althoughresourcedescriptionisobviouslyrelevanttopreservation,manyindependentstandardscanbeusedforthispurpose,includingMARC,MODS,andDublinCore.
Detailedinformationaboutmediaorhardware.Again,althoughclearlyrelevanttopreservation,thismetadataislefttoothercommunitiestodefine.
Detailedinformationaboutagents(people,organizationsorsoftware)otherthanwhatisneededforidentification.
Extensiveinformationaboutrightsandpermissions;thefocusisonthosethataffectpreservationfunctions.”
METS OneofthestrategiesusedinPREMISistheconceptofextensionschemas.TheseareotherrelatedXMLschemasthatdefineelementsthatareusefulinPREMISbutdonotneedtoberedefinedaspartofPREMIS.OneoftheseextensionschemasistheMetadataEncodingandTransmissionStandard(METS).ThisisthemetadatastandardoftenusedforpackagingtheSIP,AIPorDIPforimportingorexporting.
ThemostcommonlyusedsectionsofaMETSrecordare:
Header–containsinformationabouttheMETSdocumentitselfsuchascreator. DescriptiveMetadata(dmdSec)‐usesextensionsalsotoutilizedescriptivemetadataschemessuchas
MARC,MODSandDublinCore.CanbeembeddedintheMETSrecordorpointtoexternalrecords. AdministrativeMetadata(amdSec)–informationabouthowfileswerecreated,rightsdata,
Masterfile/derivativeinformation,andmigrationdatacanberecordinginthissection.ThisalsocanberecordedwithintheMETSrecordorhavepointerstoexternalrecords.
Files(fileSec)–thisisalistingofallofthefilesthatcomprisethedigitalobject StructuralMap(structMap)–amandatoryMETSsectionthatoutlinesthehierarchicalstructureofthe
digitalobject.ItalsolinksthevariouselementswithinthestructMaptotheircorrespondingelementsinthefileSecordmdSec.Thisiscriticalfordigitalobjectsthataremadeupofmanyfilesbutrepresentasinglewhole,suchasapublicationthatmighthavehundredsoffilesthatneedtobearrangedinahierarchy(sections,chapters,etc.)withvariousdescriptivemetadatarecordsthatneedtoconnecttospecificplaceswithinthathierarchy.
12
Digital Storage
Cloud Storage Storageofdigitalmaterialshaschangeddramaticallysincethelate1990’swhentheprimaryformoflong‐termstoragewasonlocalmediasuchasCDsandDVDsandmagnetictape.Todayitiscommonpracticetouse“spinningdisk”forstorage,eitheronlocaldrives,institutionalnetworkeddrives,orincreasinglyoncloudservices.AccordingtotheDigitalPreservationHandbookbytheDigitalPreservationCoalition(DPC),“cloudcomputing”is“atermthatencompassesawiderangeofusecasesandimplementationmodels.Inessence,acomputing‘cloud’isalargesharedpoolofcomputingresourcesincludingdatastorage.Whensomeoneneedsadditionalcomputingpower,theyaresimplyabletocheckthisoutofthepoolwithoutmuch(oftenany)manualeffortonthepartoftheITteam,whichreducescostsandsignificantlyshortensthetimeneededtostartusingnewcomputingresources.Mostofthese‘clouds’arerunonthepublicInternetbywell‐knowncompanieslikeAmazonandGoogle.”
Hereissomebasicinformationaboutthesecloudsolutions:
Amazon Standard Simple Storage (S3) ‐ http://www.aws.amazon.com/s3
Amazonprovidestwomethodsofcostfortheircloudstorage:ondemandorreservepricing.Ifthesizeofannualstorageneededisknown,prepaymentisanoption,whichcansaveupto75%onthecost.
OnDemandCost:
Upto50TBStorage 51‐100TBStorage 500TB+Storage
0.023GB/month 0.022GB/month 0.021GB/month
Amazon Glacier - www.aws.amazon.com/glacier o StandardInfrequentAccess(I/A)o Fromthewebsite‐ “Customerscanstoredataforaslittleas$0.004pergigabytepermonth,a
significantsavingscomparedtoon‐premisessolutions.Tokeepcostslowyetsuitableforvaryingretrievalneeds,AmazonGlacierprovidesthreeoptionsforaccesstoarchives,fromafewminutestoseveralhours.”
o DeveloperResources‐AmazonprovidesanAPIthatallowdeveloperstowriteinterfacestocloudstoragesystemsorusethirdpartysolutionsthatprovideuserinterfaces.
Google Cloud ● https://cloud.google.com/storage/archival/● April2018cost‐Capacitypricingis1centperGB/monthfordataatrestforNearlineand0.7centsper
GB/monthfordataatrestforColdline.
Local digital storage Thesizeoflocalharddrivescontinuestogrow,withat60terabyte(TB)solidstatedrive(SSD)announcedin2016.TapestoragealsocontinuestobeimprovedwithIBMandSonyworkingontechnologythatcouldpotentiallystore330TBsinasinglecartridgethattakelessspacethanaharddrive.
Localstoragehasgreatermanagementoverheado Mustbebackedupo Mostharddriveshaveanaveragelifespanofabout5years
13
o Easytooverwritesoprotectionmustbeputinplace,suchasWriteoncereadmany(WORM) FromWikipedia:“Writeoncereadmany(WORM)describesadatastoragedeviceinwhich
information,oncewritten,cannotbemodified.Thiswriteprotectionaffordstheassurancethatthedatacannotbetamperedwithonceitiswrittentothedevice.Onordinary(non‐WORM)datastoragedevices,thenumberoftimesdatacanbemodifiedislimitedonlybythelifespanofthedevice,asmodificationinvolvesphysicalchangesthatmaycauseweartothedevice.The"readmany"aspectisunremarkable,asmodernstoragedevicespermitunlimitedreadingofdataoncewritten.”
ViewtheMinnesotaCaseStudytoseehowWORMstorageisbeingusedforUELMApreservation.
14
Case Studies California Description In2012,throughSenateBill1075,CaliforniaenactedtheUniformElectronicLegalMaterialAct.InaddingArticle4(commencingwithSection10290)toChapter1ofPart2toDivision2ofTitle2oftheGovernmentCode,theLegislatureidentifiedtheLegislativeCounselBureauastheofficialpublisherforelectroniclegalmaterial.“Electronic”and“legalmaterial”isspecificallydefinedinSection10291oftheGovernmentCode.LegalmaterialisdefinedastheCaliforniaConstitution,thestatutesoftheStateofCalifornia,andtheCaliforniaCodes(hereafterreferredtoas“SB1075LegalMaterial”).
Undertheact,anofficialpublisherthatpublisheslegalmaterialinanelectronicrecordandalsopublishesinarecordotherthananelectronicrecordmaydesignatetheelectronicrecordasofficialifthepublisherauthenticatestheelectronicrecord,preservestherecord,andensuresthattherecordisreasonablyavailableforusebythepubliconapermanentbasis(Secs.10293,10294,10296,and10297,Gov.C.).TheLegislativeCounselBureaupublisheslegalmaterialbothinanelectronicrecordandinarecordotherthananelectronicrecordandhasdesignatedtheelectronicrecordasofficial.TheserecordsgenerallyoriginateintheLegislativeCounselBureauaslegislativemeasuresthatareeventuallyenactedintolaw.TheLegislativeCounselBureaualsoincorporateschangesinlawmadethroughtheinitiativeprocessintoitsdatabase.Theelectroniclegalmaterialisthenpublishedatwww.leginfo.legislature.ca.govinbothPDFandHTML.
InNovember2016,Californiavoters,throughtheinitiativeprocess,approvedProposition54.Aspartofthatinitiative,theLegislatureisrequiredtocauseaudiovisualrecordingstobemadeofallpubliclegislativeproceedingsandtomakethoserecordingsavailabletothepublicthroughtheInternetwithin24hoursaftertheproceedingshaverecessedoradjournedfortheday.TheLegislatureisalsorequiredtomaintainanarchiveoftheaudiovisualrecordings,whicharetobeaccessibletothepublicthroughtheInternetanddownloadableforaperiodofnolessthan20years(para.(2),subd.(c),Sec.7,Art.IV,Cal.Const.).Inaddition,Proposition54requirestheLegislativeCounselBureautomaketheaudiovisualrecordingsavailabletothepublic,witheachrecordingtoremainaccessibletothepublicthroughtheInternetanddownloadableforaminimumperiodof20yearsfollowingthedateonwhichtherecordingwasmade,andtoalsothenbearchivedinasecureformat(para.(6),subd.(a),Sec.10248,Gov.C.).TheLegislativeCounselBureauwillmaketheaudiovisualrecordingsavailableatwww.leginfo.legislature.ca.govandwillpreservetheserecordings.
Implementation considerations
SB 1075 Legal Material
IndevelopingpreservationpracticestheLegislativeCounselBureaumustmeettherequirementthattherecordisreasonablyavailableforusebythepubliconapermanentbasisasspecifiedinSenateBill1075,California’senactmentoftheUniformElectronicLegalMaterialAct.Inthatregard,theLegislativeCounselBureauhadthreemainconsiderationsinimplementingasolutionforpreservationofSB1075LegalMaterial:
1. Wouldthesolutionmeetthestandardsforlong‐termpreservation;2. Wouldthesolutionbecost‐effective;and3. Wouldnon‐technicalstaffbeabletousethesolution.
TheLegislativeCounselBureaualsowantedtomeetLevel3oftheNationalDigitalStewardshipAlliancelevelsofdigitalpreservation.Toreachthatgoal,theLegislativeCounselBureauwouldhaveto:
Storeatleastonecopyinageographiclocationwithadifferentdisasterthreat; Engageinamonitoringprocessforourstoragesystemsandmediatodetermineobsolescence; Checkfixityofcontentatdeterminedintervals; Maintainlogsoffixityinformation; Havetheabilitytodetectcorruptdata;
15
Virus‐checkallcontent; Maintainlogsofwhoperformedwhichactionsonfiles; Storestandardtechnicalanddescriptivemetadata;and Monitorfile‐formatobsolescenceissues.
Tomeettheaforementionedrequirements,theLegislativeCounselBureauconsideredthreeoptions:
1. Cloudstorageusingbothpreservation‐specificcloudsolutionsandgeneralcloudsolutions;2. Standardinternalstoragesystemswithstandardbackupsalreadyinuse;and3. OffsiteopticalWORM(writeoncereadmany)technology.
Afterconsideringtheadvantagesanddisadvantagesofeachofthethreeoptions,theLegislativeCounselBureaudecidedtousestandardinternalstorage.Inaddition,theLegislativeCounselBureauhasundertakenapilotprojecttopreservethelegalmaterialinapreservation‐specificcloudstoragesolution.
Audiovisual
TheLegislativeCounselBureauisworkingwiththeCaliforniaSenateandAssemblytomeettherequirementsofProposition54.Underthisproposition,audiovisualrecordingsmustbemadereadilyavailabletothepublic,inthedownloadableformat,foraperiodofnolessthan20years(audiovisualarchive),andstoredinasecureformat.Inaddition,theLegislativeCounselBureauisevaluatinghowtopreservetheaudiovisualarchive.Thegoalsofpreservationencompassthefollowing:
1. Theuseofmethodsandtechnologiestoensuredigitalcontentisusablebythepublic.2. Theuseofmethodsandtechnologiesthatmaintainthedigitalcontentasdigital‐contentstandardschange.3. Theprovisionofaperpetuallyaccuraterenderingoftheaudiovisualrecordings.Inthatregard,the
audiovisualrecordingswouldberetainedintheoriginalfileformatcreatedbytheaudiovisualinfrastructure.TheSenatecurrentlyusesafileformatknownas“LXF”or“LEGODigitalDesignerModelFiles,”whiletheAssemblyusesafileformatknownas“TS”or“TransportStream.”Thesefilesaretheoriginalsourcefilesfromwhichanyfuturefileconversionwouldbederivedtomeetnewdigitalfileformatstandards.
4. Thestorageofonecopyoftheaudiovisualrecordingsfileinageographiclocationwithadifferentdisasterthreat.
5. Theprovisionofsecureaccesstotheaudiovisualrecordingsfiletoensurethattherecordingsarenotmodified.
Solution BusinessProcessAdjustments
SB1075LegalMaterial
TheLegislativeCounselBureaudevelopedastrategicplanfortheauthenticationandpreservationofSB1075LegalMaterial,whichincluded:
1. Identifyinglegalmaterialstobepreserved,consistentwiththeUniformLawCommission’sversionoftheUniformElectronicLegalMaterialAct.(California’senactmentinSB1075coveredfewermaterials.);
2. IdentifyingunitswithintheLegislativeCounselBureauthatareresponsibleformaintainingthelegalmaterials;
3. Formalizingproceduresforauthenticationofthelegalmaterial;4. Establishingguidelinesforcost‐effectivereviewofpreservationneedsofdifferentlegalmaterialsona
cyclicalbasistomaintaindatafidelityandintegrity;and5. Formalizingupdateproceduresforpreservationpurposes.
16
GiventherequirementsanddefinitionssetforthinSenateBill1075,theLegislativeCounselBureaufocusedonthelegalmaterialdevelopedduringthelawmakingprocessrelatedtotheworkoftheLegislativeCounselBureauthatismadepubliclyavailable:thecodes,statutes,andconstitution.(Additionalmaterialgeneratedduringthelegislativeprocesshasbeenidentifiedaslegalmaterialthatshouldbepreserved.Butthatmaterialisnotpartofthecurrentauthenticationandpreservationstrategy.)Steps1‐3abovehavebeencompletedfortheSB1075LegalMaterial.ThesystemsthatareusedtodraftandpublishtheSB1075LegalMaterialwereadjustedsothattheLegislativeCounselBureaudidnotneedtochangeitsbusinessprocess.Instead,softwarehandlestheauthenticationprocessandprovidesforstorageofSB1075LegalMaterial.Also,softwarewasdevelopedsothatstaffcouldwritetothepreservationsystematscheduledintervals.Tocompletesteps4and5,theLegislativeCounselBureaumustdevelopauditproceduresfortheSB1075LegalMaterialandformalizeprocedurestoupdatethatmaterialandthetechnologythatisusedtoallowaccesstothepreservedmaterial.
Audiovisual
TheLegislativeCounselBureauiscurrentlyevaluatingbusinessprocesschangesinordertopreserveaudiovisualrecordingsunderProposition54.Oneconsiderationistheestablishmentofaprocessbywhichcurrentdigitalfilestandardsareassessed,perhapsonabiannualbasisfollowingthelegislativecycle,sincetheCaliforniaStateLegislaturehasmanyotherprocessesthatrevolvearoundthiscycle.Ifdigitalfilestandardschange,theLegislativeCounselBureauwouldbegintheprocessofconvertingtheoriginalsourceLXForTSfilestothenewstandard,replacingtheout‐of‐datestandard.
Anotheraspectofdigitalpreservationistheaccuraterenderingofauthenticatedcontent.Sincetheseaudiovisualrecordingsareintendedtoshowthepublichowthelegislativeprocessproducedbillsthatmaybecomelaw,theLegislativeCounselBureaumustensuretherecordingsarepresentedtothepublicwithoutmodification.
ForthispurposewithrespecttoSB1075LegalMaterial,theLegislativeCounselBureauusesAdobedigitalsignaturestoensurethatthedocumentshavenotbeenmodified.Thereisnosimilarsolutionforaudiovisualrecordingscurrentlyavailable.Onesolutioncouldbewriteonce,readmanytechnology.WORMdatastoragetechnologyallowsinformationtobewrittentoadiscasingletimeandpreventsthedrivefromerasingoreditingthedata.Theimplementationofthistechnologyforsecurelystored,publiclyaccessibleaudiovisualrecordingswouldineffectmakethemauthentic.
IT Design/Components
SB1075LegalMaterial
TheLegislativeCounselBureauhasundertakenapilotprojectusingPreservica’scloud‐hosted,standards‐based(OAISISO14721)activepreservationsoftwareforpreservationofSB1075LegalMaterial.Thisweb‐baseddigitalpreservationapplicationprovidestheLegislativeCounselBureauwiththeabilitytostorefilesandperformpreservationtasks.Preservicaprovidessecureauthenticatedaccess,automaticallyclassifiesdocuments,andsetsaccesspermissionsduringingest.Preservicahasbuilt‐inworkflowsthatareusedbytheLegislativeCounselBureauforingestofdataandmetadatamanagement.UsingPreservica’sSubmissionInformationPacket(hereafterreferredtoas“SIP”)packagingdesktopclient,thePreservicaadministratoruploadscontentintoorganizedfilehierarchiesbasedonstatuteyearandCaliforniaCodesupdates,bothofwhichtakeplacetwiceayear.
ThedataisgeneratedbytheLegislativeCounselBureau’sLegalDivisionduringeachyearofthetwo‐yearlegislativesessionintheformofbillsthatareenrolledaspartofthenormallegislativebusinessprocessandsenttotheGovernorforaction.IfabilliseithersignedbytheGovernorortheGovernorletsitbecomelawunsigned,thesystemscreatethestatutesandauthenticatethedocumentsusingAdobecertificate‐baseddigitalauthentication.TheLegislativeCounselBureauPreservicaadministratorextractstheauthenticateddocumentandusesPreservica’sSIPclienttoloadintothecloudpreservationsite.Atthattime,Preservica’sSIPclientalsoaddsbasicdescriptivemetadata.ThatmetadatawasdevelopedasacollaborativeengagementbetweentheLegislativeCounselBureauandtheCaliforniaStateArchivesusingDublin‐CorestandardstomeettheneedsoftheLegislature.
17
Audiovisual
TheLegislativeCounselBureauisimplementinganEMCIsilonWORMstoragetechnologyforpreservationoftheaudiovisualrecordings.Isilonisascale‐out,network‐attachedstorageplatformofferedbyEMCCorporationforhigh‐volumestorage.ThissystemwillstoreboththeoriginalformatandamodifiedformatMP4fileforpublicaccessanddownloading.
Asalong‐termdatapreservationstrategy,theLegislativeCounselBureauwillstoretheaudiovisualdataatmorethanonesite.WithintheLegislativeCounselBureau’sprimarylocationinSacramento,anEMCIsilonstoragesystemwillbeimplementedconsistingofperformance‐optimizedstoragenodesandcapacitynodes.ThestrategytoprotectthedatafeaturesredundancyandwillplaceanotherEMCIsilonstoragesystematanestablishedoff‐sitelocation.Thevideodatawouldbereplicatedtotheoff‐sitelocationusinghigh‐bandwidth,secureandredundantpoint‐to‐pointwideareanetwork(WAN)connections.Ifanoutageoccurredattheprimarylocation,thevideodatacouldbeaccessedandrestoredfromtheoff‐sitelocation.
Licensingwillbepurchasedtoenablethewriteonce,readmanytechnologyavailablewithintheIsilonstoragearrayknownas“SmartLock.”TheIsilonSmartLocktechnologyprotectsthedataagainstaccidentalormaliciousdeletionsoralterations.ThistypeoftechnologyhelpsprotectdigitalfilesfrombeingmodifiedwhilethosefilesresidewithintheSmartLock‐enabledfiledirectory.Withthistechnology,anyvideomadeavailableontheInternetwouldbeprotectedfromalteration,ensuringthevideofile’sintegrity.Thus,thetechnologywouldsatisfydata‐authenticitycriteriawithindigitalpreservation,tomeettherequirementsofProposition54thattheaudiovisualrecordingsbeinasecureformat.
Costs SB1075LegalMaterial
TheLegislativeCounselBureaudecidedtousethecurrentbusinessandtechnicalprocessesforpreservationofSB1075LegalMaterial.Therefore,developmentcostswereabsorbedintothestandarddevelopmentbudget.
TheLegislativeCounselBureaudoesnotallowpublicaccesstothePreservicaarchive.Thus,theCloudEditionStartersubscription–upto250GBatacostof$4,000peryear–meetstheBureau’scurrentneeds.ThePreservicasystemwillallowtheLegislativeCounselBureautoscaleup,asstorageneedsincrease.
Audiovisual
TheLegislativeCounselBureauestimatestheaveragecostofthestoragesolutionforaudiovisualrecordings,whichisrequiredtostoretheoriginalfileformat(LXF),willbe$250,000peryear.ThisincludesthebackupWORMsolutionrecommendedforpreservingtheaudiovisualdata.
Our Current Assessment SB1075LegalMaterial
TheLegislativeCounselBureaufollowedastructuredsystemsdevelopmentlifecycleindesigningthearchivingprocess,inordertomeetthepreservationrequirementsoftheUniformElectronicLegalMaterialAct,asenactedinCalifornia.InformationtechnologystafffromtheLegislativeCounselBureaumetwithsubject‐matterexpertstounderstandthelegalmaterialthatrequiredarchivingforpreservationpurposes.Thisincludedunderstandingwhatmetadatatheownersofthelegalmaterialconsideredmeaningful.Questionswereaskedsuchas:Whatwasthebesttimingtocapturethelegalmaterial?Howwouldconsumersofthelegalmateriallikelyidentifythematerialinasearch?
ITstaffalsowroterequirements,includingusecases,forprocedurestoextractfiles,createmetadata,andstagethefilesinpreparationforextractiontoacloud‐basedarchivingplatform.
18
Aspreviouslystated,adeterminationwasmadetouseanindustrystandard(DublinCore)indeterminingthemetadatatobecapturedforthelegalmaterial.TheuseofthisstandardallowedeasyintegrationwithPreservica’singestworkflow.Metadataiseasilydiscoverableforarchivedfiles.ThisstandardallowedtheLegislativeCounselBureautoshareinformationwithStateArchives.
TheLegislativeCounselBureaustruggledwithhowtohandleversioningofthelegalmaterial.Notalllegalmaterialhadthesamerequirementsforversioning.TheITstaffdecidedtocaptureasnapshotofthelegalmaterialonacertaindateandtime,afterworkingwiththeownersofthelegalmaterialtodeterminethatdateandtime.TheLegislativeCounselBureauingestsandidentifiesasnapshotofthelegalmaterialbasedonthedatethematerialwasextractedfromthedocumentrepository.
TheLegislativeCounselBureauisalsocurrentlymanuallyinitiatingthearchivingprocess.WeneedtofindwaystoleveragetheworkflowavailableinPreservicatoautomatetheextractionandingestprocess,therebytakingadvantageofthededicateddocumentrepositorythatutilizesdataservicestomakethelegalmaterialavailable.Thisprocesswouldallowforastandardinterfacetoallthecurrentandfuturedocumentsthatneedtobepreserved.Inturn,theprocesswillmakeextractionoflegalmaterialeasier.
AnotheradvantageofthecurrentarchivingsolutionisthatmetadataisseamlesslyintegratedwiththelegalmaterialasitisingestedintoPreservica.Thismakesiteasytodiscovermetadatawhenviewingthelegalmaterialinthepreservationtool.
Thereareareasthatneedfurtherattention,includinghowtosecurelysharenecessarydocumentsandmetadatawithStateArchives.ThoughtheLegislativeCounselBureauandStateArchivesusethesamearchivingtool,therearesomeredundanciesineffortwhenbothentitiespreservethesamedocuments.
TheLegislativeCounselBureauhasmadethearchivingeffortforpreservationundertheUniformElectronicLegalMaterialActanongoinginternalproject.Thatwillallowtheprojectteamtofindwaystoimprovetheprocessesandprocedures,particularlyifadditionaltypesoflegalmaterialareaddedtotheproject.Theprojectteammustcommunicatewiththeownersofthelegalmaterialandexternalcustomerstoensurethattheirrequirementscontinuetobeincorporatedintothearchivingsolution.
Audiovisual
TheLegislativeCounselBureauistooearlyintheimplementationoftherequirementsofProposition54regardingaudiovisualrecordingstoassesssolutions.
Notes Thisreportisforeducationalpurposes.Referencestoanyspecificcommercialproducts,process,service,manufacturer,company,ortrademarkdonotconstituteanendorsementorrecommendation.
19
Minnesota
DanielKruseSystemsAnalyst/ProgrammerJasonDuffingSystemsAnalyst/ProgrammerJasonJudtDataSystemsProjectManager
TheMinnesotaOfficeoftheRevisorofStatuteshasconstructedKEEPS;acustomsoftwaresolutiontosatisfytherequirementsforpreservationandsecuritydetailedintheUniformElectronicLegalMaterialAct(UELMA).TheKeepElectronicEdictsPreserved&Secure(KEEPS)systemisinthetestingphaseandisscheduledfordeploymentin2016Q4.UELMA
Background InMinnesota,UELMAwasenactedin2013asMinnesotaStatutechapter3E.UELMAestablishesanoutcomes‐based,technology‐neutralframeworkforprovidingonlinedigitallegalmaterialwiththesameleveloftrustworthinesstraditionallyprovidedbypaperpublications.TheActrequiresthatofficialelectroniclegalmaterialbe:(1)authenticatable;(2)preserved,eitherinelectronicorprintform;and(3)accessible.TheKEEPSsolutionwasspecificallydesignedtosatisfyrequirement(2).
TheUELMArequirementsforpreservationandsecurityareinsection3E.07.Section7statesthatifofficiallegalmaterialispreservedinanelectronicrecord,theofficialpublishershall:
(1) ensuretheintegrityoftherecord;(2) provideforbackupanddisasterrecoveryoftherecord;and(3) ensurethecontinuingusabilityofthematerial.
System Description TheKEEPSsystem'sprimarygoalistoensuretheintegrityofofficialelectronicrecords.Thesystemmakesbackupanddisasterrecoverypossibleinseveralways.Theuseofwriteoncereadmany(WORM)diskdrivesandoffsitetapebackupscreatedfromseparatedocumentrepositoriesensuresthecontinuingavailabilityandusabilityofthematerial.
Thesoftwaresystemwasbuiltin‐houseusingstaffprogrammersandexistingcommercialproducts(Figure1).Theseproductsare:avirtualmachine,writeoncereadmany(WORM)disk,arelationaldatabase,andatapebackupapplication.Additionally,acustomsoftwareapplicationwasdeployedtothevirtualmachine.
Figure1–SystemDiagram
20
KEEPSintegrateswiththelegislativepublishingsystem,abackenddatabase,andaprivateintranet.TheKEEPSserverhasexclusiveaccesstotheWORMDisk,andreadwriteaccesstothedatabase.Thewebhasreadonlyaccesstothedatabaseforthepurposeofmonitoringthesystem.
Writeoncereadmany(WORM)diskdrivesareanintegralpartofthissystem.WORMdisksareessentialtoensuringtheintegrityofthedataandarethefoundationaroundwhichthissystemwasdesigned.TheKEEPSsolutionwillleverageGreenTecWORMStorageServersasthehardwarebestsuitedforthissystem.Thesestoragedevicesenforcewrite‐oncecapabilitiesthroughhardwaremechanismsratherthansoftwarerunningonacomputer.
Development ThebasicrequirementsfortheKEEPSsystemcanbesummarizedas:
Preservenewlypublisheddocuments. Catchanyerrorsinourpubliclyavailablelegalmaterial. Runindependentlyofourothersoftwarewithoutuserintervention.
KEEPSSoftwarewaswrittenusingJavaSE8consistingofthreeprimarymodules(Figure2):anArchiverthatwritespublishedUELMA‐compliantdocumentstotheWORMDisk,aWORMContentsPopulatorthatrecordsWORMdiskcontentsinadatabasetablewhichisusedinthevalidationprocess,andaValidatorthatworkswiththedatabasetovalidatethepubliclyavailablelegalmaterialandpopulatestheValidationErrorstable.Eachofthemodulesrecordsitsactivityinadatabasetablethatcanthenbeseenonanintranetpage.ThesoftwareisbuiltanddeployedusingtheApacheAntandIvyprojects.ThemodulescanberunatpredeterminedintervalsorondemandandaresynchronizedbytheScheduleManager.TheWORMdiskdocumentsandpublicUELMAdocumentsarebacked‐upseparatelytotape.Thetapesarestoredoffsite.
Figure2–KEEPSArchitecture
21
Testing TestsoftheidentifiedscenariosthatconstitutedcorruptionorfailuresofthepublicUELMAdocumentssystemwereperformed.Testscovered:(1)anunauthorizeddocumentinsertedintotheproductiondatabase,(2)adocumentremovedfromtheproductiondatabase,(3)changesofanytypetoanexistingdocumentintheproductiondatabase,and(4)theinabilitytoarchiveadocumenttotheWORMdisk.Inallcases,ourvalidationsystemcorrectlyidentifiedtheissueandreportedit.Deploymentiseffortlessandhasbeenrepeatedmanytimestoensuretherobustnessofthesoftwareandtheeaseofinstallation.
Loadtestingwasconductedonavirtualserverwith4GBofmemoryrunningWindowsServer2012R2.Forarchivetesting,51,463MinnesotaStatutesections,intheformofPDF,totaling6.3GBswerepublished.Totalarchivaltimewas37minutes.Validationtestingoccursdailyon606,105PDFdocumentstotaling65GB.Dailyvalidationcompletesinunder40minutes.Allloadtestsareconsideredsuccessfulinourenvironment.
ThesystemisstableandhasbeenrunningwithoutprogrammerinvolvementsinceMarch23,2016.Ithandleserrorsgracefullyandcontinuesprocessing,providingdetailedlogsthatcanbeusedtotroubleshootissues.
Schedule
Prototype,2014‐2015Thefirstprototypewasdevelopedasaproofofconcept,byStephenSegalthePrincipalatSystemSpecialties,Minneapolis,MN.WritteninPHPitprovidedthebasisforgoodestimatesofthetimerequiredtovalidateourentirerepositoryoflegalmaterialonanightlybasis.
Build&Test,January–March2016Thesystemwasfunctionallytestedasitwasdeveloped,andreleasedtolong‐termtesting.
Testing,April–September2016Thesystemisstableandrunninginasimulatedproductionenvironment.
FinalDeployment,October2016○ Productionenvironmentwillbecompleted.○ GreenTecWORMStorageServerswillbepurchased.○ BulkArchiverwillwriteallexistingUELMAdocumentstotheWORMDisk.○ ArchiverwillwritenewUELMAdocumentstoWORMDisk.○ Validatorwillrundaily.
22
Washington, D.C.
DavidGreisen‐OpenLawLibrary
VincentChuang‐OpenLawLibrary
TheOpenLawPlatformisasoftwaresystemcreatedforthepurposeofpublishinglaws,codes,legalinterpretations,andanyotherlegaldocumentproducedbyagovernment.Aspartoftakingadigital‐firstapproachtolegalpublishing,theOpenLawPlatformincorporatesUELMAcomplianceasacorecomponentoftheplatform.
TheCounciloftheDistrictofColumbiaisusingtheOpenLawPlatformtopublishitslawsandcode(https://code.dccouncil.us)andprovidesacasestudyforreplicatingkeyfeaturesandprocessesatotherjurisdictions.XMLrepresentationsoftheDistrict’slawsandcodescanbefoundathttps://github.com/dccouncil/dc‐law‐xml.
ThePlatform’sversionofUELMAcomplianceismodeledonbrick‐and‐mortarlibraries.Lessonsaboutreadabilityovertime,informationredundancy,versionhistory,andauthenticationhavebeenlearnedovercenturiesinthephysicalworld.AnditisusefultoapplymanyofthoseideaswhenconsideringdigitalpreservationandauthenticationunderUELMA.
Terminology TheCounciloftheDistrictofColumbiaisaPublishingEntity.AsaPublishingEntity,theCouncilisresponsibleforpublishingandauthenticatingtheLibraryofofficiallegalmaterialsrelevanttoitself.TheCouncil’sLibrarycontainsvariousDocuments,includingrapidlychangingdocuments,liketheentireDistrictofColumbiaCode,andstaticdocumentslikeindividuallaws.AnotherPublishingEntitycouldbetheExecutiveOfficeoftheMayor,anditsLibrarycouldincludeDocumentssuchastheDCMunicipalRegulationsandtheDCRegister.
AnimportantdifferencebetweenaLibraryintheOpenLawPlatformandabrick‐and‐mortarlibraryarisesinthecontextoftime.Thecontentsofaphysicallibrarymightchangeovertime,butyoucanonlyevervisitthelibraryasitistoday.Thatistosay,ifHarvardLawLibrarythrowsawayitscopyofAWrinkleinTime,thelibraryisstilltheHarvardLawLibrary,butyoucannevertravelbackintimetoreadAWrinkleinTimethere.AnOpenLawPlatformLibraryconsistsofasnapshotofeveryversionofthelibraryasithaseverexisted.Forinstance,onJanuary1,2018,theLibraryDC‐Law‐XMLmayhavecontainedonethousandlaws.WewouldrefertothatLibraryasDC‐Law‐XMLasofJanuary1,2018.OnJanuary2,2018,theCouncilmaypassanewlawandaddittoDC‐Law‐XML.Unlikeatraditionallibrary,youcanvisittheLibraryasitexistedonJanuary1orasitexistedonJanuary2.
AConsumer,likeacitizenoftheDistrict,canviewDocumentswithinaLibraryordownloadtheentirelibrary.AndHostingEntities,suchaslawlibraries,candownloadandhostacopyofanentireofficialLibrary.Forinstance,iftheHarvardLawLibrarywishedtohostanauthenticatablecopyoftheCouncil’sLibraryontheHarvardLawLibrarywebsite,itcoulddoso,justasitcouldpurchaseandhostanofficialpapercopyoftheDistrictofColumbiaCode.
Considerations InadditiontoUELMAitself,theOpenLawPlatformwasdesignedwithseveralrelatedandoverlappingconsiderations.
Time
Legaldocumentshavealonghistory,andthathistoryisitselfsubstantivelyvaluable.Asaresult,theOpenLawPlatformiscreatedwiththeintentionthateveryversionofthecontentitpublishesbeaccessibleandauthenticatablelongintothefuture.Andbecauselegalhistoryislong,thismeanscapturingandmaintaininglargevolumesofdocuments.TheCouncil’sLibraryisonlytwoyearsold, yetcontainsmorethanthirtythousand pagesoflawsandcode,andisgrowingbyoverfivethousandpagesannually.Librariesmustbemanageable,usableandresponsiveevenwhilecontainingordersofmagnitudemoredatathantraditionallibraries.
23
Authentication
TheauthenticationschememustalsoberobustagainstawiderangeoffactorsfromtheperspectiveofPublishingEntities,HostingEntities,andConsumers.
PublishingEntitiesaregovernments,andgovernmentsvarywidelyinthenumberofpersonnel,institutionalcapacity,andorganizationalstructure.Theauthenticationprocessmustbeusableinthesevaryingenvironments.Itmustbepossible,foranygovernment,toclearly,easily,andsecurelyconvey(1)whenadocumentwaspublished,(2)whopublishedit,and(3)theauthorityofthatpersontodoso.Allthreequestionscanbeansweredwithanappropriatelydesignedcryptographicsigningframework.
Inordertoberobustovertime,theframeworkmustberesilienttothelossorcompromiseofprivatecryptographickeys.Thesystemmustalsoprovideforrestorationintheeventofagovernment‐scalecatastrophe:theremustbeamechanismforrestoringaLibraryafterallencryptionkeyshavebeenlost.Andthesystemmustoperateongovernmenttimescales.Becausepublisheddocumentsareintendedtobeusedoverthecourseofdecades,accessibility(bywayofreadabilityorcryptographicscheme)mustkeeppacewithchangingtechnology.ALibrarymustbeaccessibleandauthenticatablelongafterthePublishingEntityhasabandoneditandmovedontoothertechnologies,justasanofficialpapercopyisatalawlibraryevenifthegovernmentnolongerhasthatparticularversion.
AHostingProvidershouldbeabletohostauthenticatableversionsofaLibraryforitspatrons.FortheConsumer,aLibrarymustfunctionacrosseveryusecase.Insituationsinwhichthedeliverynetworkiscompromised(suchashackerstakingoverthePublishingEntity’swebserver),aConsumermuststillhaveconfidencethattheLibrarybeingviewedisauthentic.Aswithphysicaltext,aLibraryshouldbeaccessibleandauthenticatableevenwithoutaninternetconnection.BecauseLibrarieshaveaversioncomponent,aConsumershouldbeabletoascertaininformationregardingbothauthenticityandversioninginformation,akintocheckingpublicationinformationinsideabook.
Redundancy
Thesystemmustalsobedistributed.JustasHarvardLawLibraryandUSCLawLibrarymaybothcarryacopyofAWrinkleinTime,aConsumershouldbeabletoaccessaLibraryfromaHostingEntityandbeabletoconfirmthattheLibraryisthesameasoneacquiredfromtheoriginalPublishingEntity.EvenifaConsumercanneveraccesstheoriginalLibraryfromtheoriginalPublishingEntity,thehostedLibraryshouldbeauthenticatablewithoutreferencetotheoriginal.
The Open Law Platform Solution Withthesevariousconsiderationsinmind,theOpenLawPlatformutilizesacombinationoftechnologies,includingXML,Git,andstrongencryption,toimplementasetofauthenticationtechniques.
XML
TheOpenLawPlatformstoresalmostalldocumentsasplaintextXML.Byusingplaintextinsteadofabinaryformat(e.g.,PDF),aLibraryanditsDocumentsarevirtuallyguaranteedtobereadablefordecadestocomewithoutrelyingonlegacysoftware.Plaintextalsorequiresconsiderablylessstoragespacethanbinaryformats.FortheCouncil,30,000pagesofXMLcanbestoredin100megabytes,whileonly10,000pagesofPDFsrequirefiftytimesthespacewhencompressedand500timesthespacewhenuncompressed.ThisdifferencemeansitisfeasibletostoreeveryversionofaplaintextdocumentinlessspacethanasingleversioninPDF.
XMLalsohastheadvantageofbeingabletostorethestructureofadocument,insteadofjustpresentationinformation(i.e.,howsomethinglooksonascreen).Thismeansdocumentscanbeconvertedintoanydisplayformatinthefutureandnotbetiedtoanyspecificsoftware.Together,thesebenefitsofXMLmakeitpossibletosatisfytheneedforusabilityovertime,abilitytostorelargeamountsofhistoricalinformation,andspeedofuse.
24
AcommonconcernwithXML‐basedsolutionsisthatXMLcanappearcomplicatedandrequiresadifferentsetoftoolsthanmostlawyersareusedtousing.ThishasresultedinveryfewUELMA‐compliantXMLimplementations.
TheOpenLawPlatformsolvesthisprobleminseveralways.First,theplatformfocusesonmakingtheXMLverycleanandsimple,using,wheneverpossible,ajurisdiction’sterminologytodescribeadocumentanditscontents(e.g.,Title,Chapter,Subchapter,andSection).Theplatformalsostoresmetadatalogicallywithinthedocument,againusingthesameterminologyasthejurisdiction.
Goodtooling(i.e.,softwareforviewingandeditingtheXML)alsogoesalongwaytomakingXMLmoredigestible.TheplatformprovidesamixofcustomXMLschemasandsoftwaretoensureXMLaccuracy,aswellasautomaticerrordetection,andothersmarteditingcapabilities.Byfocusingonuserexperience,lawyersfamiliarwiththeDistrict’slawsandcodewereabletonavigateandunderstandXMLrepresentationsoflawandcodewithnotraining.
ConvertingdocumentsintoXMLisitselfaprocess.Butagain,goodtoolingcanmaketheprocessfeelseamless.TheOpenLawPlatformincludesOpenLawDraft,aMicrosoftWordpluginthathelpsdraftersconformtotheirjurisdiction’sstyleguides.Oncethedocumentconformstothestyleguide,DraftcanturnthedocumentintocorrectXMLwithoutuserinput.
AnXML‐basedsolutionhasmanybenefitsinherentinitsformat,withthebiggestbarriersbeingusabilityandconversionofexistingdocumentsintotheformat.Afocusonuserexperienceandgoodtoolingcanovercomethesehighhurdles.SuccessonthisfrontrevealsthedownstreambenefitsofXMLthatultimatelyoutweightheinitialcosts.
Git
TheOpenLawPlatformstoresXML(andanystaticPDFs)usingtheopensourceGitdistributedversioncontrolsystem(https://git‐scm.com/).Insimplestterms,Gitisapieceofsoftwarethatkeepstrackofchangestooneormorefiles(eachgroupofoneormorefilescollectivelyreferredtoasa“repository”),recordsthedifferencesbetweennewandoldversionsofoneormorefiles,andmaintainsahistoryofthedifferences.Itdoesso,inpart,byprovidingtheabilitytosigneachversionwithauniquecryptographickey(https://git‐scm.com/book/id/v2/Git‐Tools‐Signing‐Your‐Work).Thismakesitpossibletopreservedifferentversionsofdocumentsastheychangeandcreatesanimmutablechainofauthenticatableversionsbacktotheoriginal.
GitmakesiteasytocopyanentireLibraryfromoneplacetoanotherandthenkeepthecopyup‐to‐datewiththeoriginalbyjustsyncingchanges.BecauseeverycopyofaLibraryhasallthehistoricalinformationandauthenticationinformationoftheoriginal,itisinherentlyfraudresistant.IntheeventamaliciousactorattemptedtomodifythehistoryoftheoriginalLibrary,thenexttimeacopyattemptedtosyncwiththenow‐fraudulently‐modified“original”,thecopywoulddetectthemodificationofthehistoryandrejectthefraudulenthistory.
Gitisfree,opensource,andavailableonvirtuallyeveryplatform.TherearealsomanycloudservicesthatprovideGitaccess.Becauseofthiswideavailability,aLibrarythatisstoredasaGitrepositorycanhaveallofitshistoricalinformationhostedonavarietyofphysicalmachineslocatedacrossalargegeographicregion.Andeverycopyiseasilyauthenticated.
SigningaLibrary
WithXMLandGitastheunderlyingtechnologies,theOpenLawPlatformimplementsspecificprocessestoachievetheneededauthenticationoutcomes.
Atthegovernmentlevel,eachemployeewhohasauthoritytopublishlegaldocumentsreceivesasmartcard(e.g.,https://www.yubico.com/products/yubikey‐hardware/).Asmallgroupofemployees(minimumofthree,preferablyfive)orothertrustedindividualscreatesanAttestingGroup.EachmemberoftheAttestingGroup(anAttestor)hasasmartcardthattheyusetosignAttestationsofAuthority.
25
OnceathresholdofAttestors(usually50%)haveattestedthataparticularpersonhasauthoritytopublishofficialdocuments,thatpersonisaPublisher(aspartofaPublishingEntity)andcansignnewreleasesofaLibrary.IfaPublisherleavestheorganizationorlosestheirkey,theAttestorsattestthattheoldkeynolongerhasauthoritytopublish.IfanAttestorleavestheorganizationorlosesakey,amajorityoftheremainingAttestorscanattestthattheoldkeyisnolongervalidandcanalsoattestthatanewkeyisavalidattestationkey.
Normally,cryptographicsignaturesareverycomplicatedorverybrittle.Thissystem,however,ensuresthatthesystemcontinuestoworkevenifseveralkeyshavebeenlostorcompromised.Moreover,encryptionkeysarestoredonphysicaldevicesandprotectedbyapassword.Evenifajurisdiction’snetworkiscompromised,theirkeysarenot.
AuthenticatingaLibrary
AttestationsofaPublisher’sauthorityarestoredintheLibrary.Thus,whenaPublishersignsaLibrary,alltheinformationneededforauthenticationisavailablewithintheLibrary.ThistechniquecombinedwiththeuseofGittocreateacryptographicallysecuredhistoryandtocreateeasilyreplicablerepositoriesresultsinrobustauthenticationsystemforLibraries.
WhileaConsumerorHostingEntitycanconfirmthatallsignaturesandallattestationsarevalidbacktotheveryfirstreleaseofaLibrary,theywillalwaysrequireatleastoneout‐of‐bandauthentication(i.e.,authenticationviasomethingotherthantheoriginalreceivingchannel)toconfirmtheveryfirstrelease.ThedesignoftheOpenLawPlatformaimstodecreasethefrictionrequiredtoobtainout‐of‐bandauthentication.
Forstarters,onceaConsumerorHostingEntityhasperformedoneout‐of‐bandauthentication,usuallyviaatelephonecalltothePublishingEntity,theuseofGittostoreaLibrarymeansanyfutureupdatescanbeconfirmedauthenticwithoutexternalverification.Justaslawlibrariescurrentlyprovideindirectauthenticationofpaperlaws—theybuythelawsfromtheofficialpublisherthenrepresenttotheirusersthatthesearetheofficiallaws—lawlibrariescandownloadaLibraryfromtheofficialPublishingEntity,performthesingleout‐of‐bandauthentication,andthenrepresenttotheirpatronsthattheseareofficiallaws.
OnceaLibraryishostedbymorethanoneHostingEntity,itbecomespossibletoperformout‐of‐bandauthenticationbycomparingthevarioushostedLibraries.AndthiscomparisoncanthenbeautomatedforeaseofusebyConsumers.
Importantly,thissystemworkswithoutrelyingonapublicrootcertificate(likethoseunderlyingHTTPS)orawebservicemaintainedbythePublishingEntity.Ifthewebservicegoesdown,orthePublishingEntitystopssupportingthewebservice,theLibrarywillstillbefullyavailableandauthenticatablethroughtheconstellationofHostingEntities.Inrootcertificatebasedsystems,compromisingtherootcertificatemeanscompromisingallhistoricaldocumentssignedbythecertificate.Whileitmayseemunlikelythatarootcertwillbecompromised,thisissurprisinglycommon.Symantec,untilrecentlyoneofthemosttrustedrootcertificateauthorities,wasforcedbyGoogleandMozillatodivestitselfofitsrootcertificatein2017becauseofmajorsystemicsecurityviolations.Anauthenticationsystempremisedonapublicrootcertificatesystemistoofragiletoprovideauthenticationoverdecades.Instead,byintimatelytyingtheauthenticationmechanismtothepreservationmechanism,preservingthedocumentsautomaticallypreservestheauthentication.
ThediscussionuptothispointhasbeenregardingArchivalAuthentication,i.e.,downloadingandauthenticatinganentireLibrary(alongwithallhistoricalversions).Mostusers,suchaslawyersandjudicialstaffwillbeperformingTransientAuthenticationofparticularversionsofindividualDocuments.Forthesepurposes,aweb‐basedauthenticationserviceisideal,asitmakesittrivialforuserstoauthenticate.TheOpenLawPlatformisdesignedtoprovideanauthenticationservicethroughawebsite,anapplicationprogramminginterface(API),andpluginsforallmajorbrowsers.TheauthenticationservicewillcompareahashoftheDocumenttobeauthenticatedagainstthehashesofallversionsofallauthenticDocuments.TheauthenticationservicecanthereforenotonlytelltheuserifaDocumentisauthentic,butalsowhentheversioninquestionwascreatedandif/whenitwassupersededbya
26
newerversion.Unlikeotherwebauthenticationservices,theOpenLawPlatformoptionallyprovidesthefullcryptographicauditchainsoanindividualcanconfirmforthemselvesagainstafullcopyoftheLibrarythattheDocumentinquestionisauthentic.
Redundancy
RedundancyisbuiltintothesystembecauseofthewayrepositoriesarestoredusingGitandbecauseoftheauthenticationprocess.
Withrespecttoredundancyofinformation,thewideadoptionofGitandthevariouscommerciallyavailableGithostingsolutionsmeansthatanyoneatanytimecaneasilyretrieveandhosttheirowncopyofaLibrary.ThisreplicabilitymeansthatLibrariescanbequicklydistributedacrosslargegeographicareasandcanhelprecoverfromdataloss.Moreover,eachcopyofaLibraryiscryptographicallysignedinawaythatpermitsforcorruptiondetection.
Nolessimportantandconsiderablymorecomplexistheredundancyofauthentication.IfmanyHostingEntitiesareconstantlypullingdownupdatesoffullyauthenticatedlaws,theconstellationofentitiescanhelpaPublishingEntityrecoverfromcatastrophiclosses(suchasanaturaldisaster).IfallAttesterkeysarelostin,say,aflood,agroupofHostingEntitiescanrepresentthatanewsetofAttesterkeysareofficialkeys,helpingtorapidlybootstrapaPublishingEntitybacktoanauthenticatablestate.Moreover,thepresenceofverifiablyauthenticcopiesheldbyHostingEntitiesmeansthatanynewcopiescanbeauthenticatedagainstthosecopieseveniftheoriginalPublishingentitynolongerexists.
Overall Assessment TheinitialcostofdevelopingtheOpenLawPlatformwassignificant,butitisnowafullygeneralizedlegalpublishingplatformthatisavailableforanyjurisdictiontouse.FreeGitrepositoryhostingisavailablefromseveralwell‐establishedcommercialprovidersincludingGitHub,Bitbucket,andGitLab.AsofFebruary2018,version1.0oftheOpenLawPlatformiscompleteandrunningfortheDistrictofColumbia.DocumentspublishedusingtheOpenLawPlatformcanbefoundathttps://code.dccouncil.us,andXMLrepresentationsareavailableathttps://github.com/dccouncil/dc‐law‐xml.InitialworkonArchivalAuthenticationiscompleteandisbeingrolledouttotheCouncil;TransientAuthenticationisexpectedJune2018.
27
Appendix I: Survey Results Theseresultsarecompiledfromresponsesfrom20states
Whatlegalmaterialsdoesyourstatepublishonlineonly?
Whatlegalmaterialsdoesyourstatepublishbothinprintandonline?
28
Ofthematerialsidentifiedinquestions#1or#2,whichonlinematerialsaredeemedtobetheofficialversion?
Haveyoudigitizedanypaper‐basedlegalmaterials?
Yes‐68%;No‐32%
Ifso,doyouintendthedigitizedmaterialstobeconsideredofficial?
Yes‐19%;No‐62%;N/A‐19%
Doyouplantodigitizepaper‐basedlegalmaterialswithinthenext18months?
Yes‐29%;No‐67%;Maybe‐4%
29
Whatdigitizationprocessesareyouusing?
Forlegalmaterialsthatyouaredigitizing,whatisthefileformatyouareusing(e.g.pdf,xml,doc,tif,jpg,jp2000)?
Doyouintendtoimplementalong‐termpreservationstrategywithinthenext18months?
Yes‐50%;No‐32%;Maybe‐18%
30
Ifyoudonothavealong‐termpreservationstrategy,whatarethebarriersthatyouface?
Whatsortofresourceshasyourstateprovidedforlong‐termpreservation?
31
Howlikelywouldyoubetouseanopensource,out‐of‐the‐boxstrategyforpublicationandpreservationofelectroniclegalmaterials?
1=Verylikely
5=Unlikely
Issues:
Unlikelytoengageinpreservationofanykind Notsurewhatthiswouldbe Publictrust Alreadydevelopingownstrategy Supportandmaintenance Stayingwithprint
Howlikelywouldyoubetoparticipateinaninterstatedigitalstoragesolutionforofficialelectronicdocumentsifoneexisted?
1=Verylikely
5=Unlikely
Issues:
Unlikelytoengageinpreservationofanykind Resources Accessibility;Security Constitutionalmandates Stateleveloperation Self‐sufficient
32
Appendix II: Open source and commercial preservation systems Archive‐It (www.archive‐it.org) Archive‐Itisawebarchivingservicethathasbeenavailablesince2006fromtheInternetArchive.ThisisthesameorganizationthatisresponsiblefortheWaybackMachine(https://archive.org/web),whichhasbeenarchivingtheinternetsince1996.Archive‐ItusestheHeritrixwebcrawlerthatwasdevelopedbytheInternetArchive,andoutputsdataintheWARCfileformat,anISOstandardforwebarchiving.
Archive‐Itisasubscriptionservice,andisusedtoarchiveboththewebsitesofpartnerinstitutionsaswellastopicalcollectionsofwebsites.TheArchive‐ItTeamattheInternetArchivehasdevelopedalife‐cyclemodeltohelpguideinthedecision‐makingneededforawebarchivingprogram.(http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf)
Features:
Controltheextent,depth,anddescriptionofcollections BrowsecollectionsbyURL,bymetadata,andbyfull‐textsearch. Publicaccessviaarchive‐it.organdtoolstobuildcustomintegratedportals Filestorageinpreservationformatinmultiple,redundantdatacenters. AbilitytodownloadWARCfileson‐demandforlocalmanagement.
33
Archivematica (https://wiki.archivematica.org) Archivematicaisanopen‐sourcepreservationapplicationsupportedbyArtefactualSystems(https://www.artefactual.com).Thecommunityissupportedusingadiscussionlist,usergroupmeetings,trainingandworkshops,andinstallationandserviceagreements,andArtefactualSystemshaspartnershipsthatprovideArchivematicaasahostedservice.Theevolutionofthesoftwarecanbetrackedonthedevelopmentroadmap.
Archivematicautilizesasetofmicro‐servicestoperformfundamentalpreservationactions.Thesystemcomprisesstandardandopentoolsthatotherservicesalsoutilize(foracurrentlist,seehttps://wiki.archivematica.org/Release_1.6.0),aswellasspecificpythonmiddlewarethattiesthetoolstogetherintoservicesandworkflows.Alistofthemicro‐servicesisavailableathttps://wiki.archivematica.org/Micro‐services#Archivematica_Micro‐services.
FromtheArchivematicaliterature:
“ThegoaloftheArchivematicaprojectistogivearchivistsandlibrarianswithlimitedtechnicalandfinancialcapacitythetools,methodologyandconfidencetobeginpreservingdigitalinformationtoday.TheprojecthasconductedathoroughOAISusecaseandprocessanalysistosynthesizethespecific,concretestepsthatmustbecarriedouttocomplywiththeOAISfunctionalmodelfromIngesttoAccess.Throughdeploymentexperiencesanduserfeedback,theprojecthasexpandedevenbeyondOAIStoaddressanalysisandarrangementoftransferreddigitalobjectsintoSIPsandallowforarchivalappraisalatmultipledecisionpoints.Whereverpossible,theserequirementsareassignedtosoftwaretoolswithintheArchivematicasystem.Ifitisnotpossibletoautomatethesestepsinthecurrentsystemiteration,theyareincorporatedanddocumentedintoamanualproceduretobecarriedoutbytheenduser.Thisensuresthattheentiresetofpreservationrequirementsisbeingcarriedout...Inshort,thesystemisconceptualizedasanintegratedwholeoftechnology,peopleandprocedures,notjustasetofsoftwaretools.ForinstitutionsthatwanttechnicalassistancetoinstallandcustomizeArchivematica,optionaltechnicalsupportservicesareprovidedbyArtefactualSystems.”
Arkivum (www.arkivum.com) ArkivumisaUK‐basedcompanythatoffersthreedistinctstoragesolutionsforenterpriserecords,datasets,andculturalheritageassetsrespectively.Arkivum’sdigitalassetmanagementandpreservationsystemiscalledPerpetua.Perpetuacanbeoperatedasacloud‐based,managedstoragestoragesolutionorasalocallymaintained,internalsystem.ItusesArchivematica’stoolsformetadatacreation,performsscheduleddataintegrityaudits,offersdataencryptionandaccesscontrols,andsupportsavarietyofbackupstorageoptionsincludingtape‐,cloud‐,anddisc‐basedsystems,allgeographicallydistributedbetweentheUnitedStatesandtheBritishIsles.Toalleviateuncertaintyarounditshostedpreservationmodel,Arkivumofferscontractualcommitmentstoservicethatexceed25yearsinduration.Theyalsobuildintoallserviceagreementsatransparentexitstrategyshouldinstitutionschoosetomigratetoanewsystem.Arkivumisstillarelativelynewcompany,havingjusttransitionedfromaninternalprojectattheUniversityofSouthamptontoaprivatebusinessmodelin2011.
Duraspace Systems (www.duraspace.org) Duraspaceisanon‐profitorganizationdevotedtoprovidinglong‐termsupportforDspace,Fedora,andVIVO.DuraspacealsosuppliesdigitalassetmanagementandpreservationservicesthroughitsDuraCloudandDspaceDirectplatforms:
DuraCloud (www.duracloud.org) DuraCloudisacloudstorageandcontentpreservationserviceofferedbyDuraspacethatbacksupassetstomultiplecloudstorageproviderswhilealsoofferingasuiteofpreservationtoolssuchasdataintegritychecks,transfertools,andscheduledsynchronization.DuraCloudismainlyastoragesolution
34
andmustbeintegratedwithanassetmanagementsystemlikeDspaceorArchive‐Itforthecaptureofpreservationmetadata.Clientsaregivenarangeofcloudstorageproviderstostoretheirassetswith.TheseincludeAmazonSimpleStorageService,AmazonGlacier,SanDiegoSupercomputerCenter,RackspaceCloudFiles,andChronopolis.PricingforDuraCloudvariesdependinguponwhichcloudservicestheclientdecidestouse.Costsincludeanannualsubscriptionfeebetween$1,235and$5,520andaperterabytestoragecostbetween$500/TBto$825/TB.Chronopolisstorageincludesanadditionalingestfeeof$310/TB.
DspaceDirect (www.dspacedirect.org) DspaceDirectisahostedDAMSservicewhereinaclientcancontractDuraspacetomaintainacloud‐basedinstanceofDspaceforafee.SincetheDspaceDirectsystemintegrateswithDuracloud,itispossibletouseaDspaceDirectrepositoryasapreservationsystem.DuracloudisanotherDuraspaceservicethatofferscloud‐basedarchivingandpreservationfunctionssuchaschecksums,fileredundancy,contentmigration,andaccesscontrol.PricingforDspaceDirectcanvarygreatlydependinguponextentofstorageneeded.Arelativelysmall250GBallotmentofstoragecosts,asofSeptember2017,is$8,670.
Fedora (fedorarepository.org) NottobeconfusedwithRedHat’sLinuxoperatingsystembythesamename,FEDORA,whichstandsforFlexibleExtensibleDigitalObjectRepositoryArchitecture,wasdevelopedbytheDigitalLibraryResearchGroupatCornellUniversityinthe1990s.Itisanopensourcerepositorysystem,whichofferstoolsformanagementanddisseminationofdigitalassets.Itisnotableforitsflexibilityandmodularity.Itcanbeconfiguredtoacceptanyfiletypeormetadataschema.Additionally,itsfeaturescanbecontrolledviaanextensivesetofAPIs,allowingforintegrationwithexternalapplicationsanddevices.Becauseofthismodularitypotential,Fedoraitselfoperatesasakindofskeletonplatform.Itoffersseveralcorefunctionslikestorage,relationalconnections,andbasicingest,butmostadvancedfunctionsthatonecomestoassociatewithatypicaldigitalassetmanagementsystem(DAMS)areintegratedintothesystemasadd‐ons.IslandoraandSamvera(seebelow)aretwosuchsuitesofsoftwaretoolsthatcanbeaddedontoFedoratofacilitatedigitalassetmanagementandpreservation.
Islandora (islandora.ca) OriginallydevelopedbyaffiliatesoftheUniversityofPrinceEdwardIsland,IslandoraisaDrupal‐basedframeworkofdigitalrepositorytoolsthatintegratewithFedora.Islandoraisanopensourceplatformwhosecore,components,anddocumentationaremaintainedbyagrowingcommunityofcontributorsfromaroundtheworld.Islandora’smaincomponentsincludeDrupalwebinterfacesforadministrativefunctionsandend‐userexperience,Solrsearchengineforassetdiscovery,andspecialcontentmodelsfordifferentassettypeslikepdfs,videofiles,largeimagefiles,etc.SpecificfeaturesareaddedtoIslandorausingDrupal’smoduleandthemesystems.TheIslandoracommunityhascreatedanumberofmodulesthatcarryoutpreservation‐relatedoperations.Someoftheseinclude:
IslandoraPathauto/IslandoraHandle/IslandoraDOIforimplementingpersistentURLs. IslandoraPREMISforsupportingproductionandstorageofpreservationmetadata. IslandoraFITS/IslandoraChecksum/IslandoraChecksumCheckerforcarringoutdataintegrityfunctions
likechecksumgeneration,fileformatidentification,andtechnicalmetadataextraction. IslandoraBagItfordepositingbackupassetstoaBagItpreservationarchive. IslandoraVaultforfordepositorybackupassetstoCloudSyncorDuraCloud. IslandoraLOCKSS‐O‐MaticfordepositingassetsintoaPrivateLOCKSSNetwork.
35
Perma.cc (https://perma.cc) Perma.ccisatoolbuiltbyHarvardUniversity’sLibraryInnovationLabtospecificallycombatlinkrotincitations.ItisanonlineservicethatwillarchivethewebpageforagivenURLandaddittothePerma.cccollectionandreturnauniqueURL(e.g.“perma.cc/ABCD‐1234”)thatpointstotherecordinthecollection.WhenthatURListhenusedinacitation,itwillgivereadersastableviewofthepageatthetimeitwasarchived(eveniftheoriginaldisappearsfromtheweb),aswellasalinktothepageasitcurrentlyexists.
Perma.ccisafreeservice,andanyonecancreateanaccount,butunlessassociatedwithavettedorganization,auserwillbelimitedtocreating10permalinkspermonth.OnceanorganizationhasjoinedPerma.cc,unlimiteduseraccountscanbecreatedandthoseuserscancreateunlimitedpermalinks,aslongasthelinksaresavedwithinthememberorganizationonPerma.cc.
Preservica (www.preservica.com) PreservicaisaprivatedigitalpreservationcompanythatoperatesoutofBostonandOxford.Itoffersservicesacrossthedigitalassetlifecycle,notjustforlongtermpreservation.ThePreservicaplatformisavailableasafullyhosted,cloud‐basedserviceorasanon‐premise,locallyhosted,customizableinstallation.PreservicarepositoryadherestoOAISISO14721preservationstandards.Itofferstoolsformetadatacreationandharvesting,simpleingestworkflow,multiplestoragechoices,largefiletransfer,accesscontrol,andactivefileformatidentificationandmigration.
Rosetta (www.exlibrisgroup.com/category/RosettaOverview) RosettaisadigitalassetmanagementandpreservationsystemproducedbyExLibristhatoffersfulllifecyclesupportforanydigitalformat.ThoughRosettaisproprietarysoftware,itexposespartsofitsarchitecturetothird‐partyintegrationwithAPIs.AdministratorscanconnectRosettatoaseparatestoragedeviceordevices,ifdesired.Rosetta’sworkflowmodelsareconfigurable.Itgenerateschecksums,identifiesfileformatsandextractstechnicalmetadataatingest.TheRosettapreservationplanningmoduleenablesadministratorstoscheduledataintegrityandmigrationtasksasneeded.RosettausesthePREMISdatamodelforcollectingpreservationmetadata.Thesystem’sarchitectureisdividedbetweenanoperationalrepository,wherefunctionslikepublishinganddeliveryarecarriedout,andapermanentrepository,wherepreservationfunctionsandlongtermstoragetakeplace.
Samvera (samvera.org) FormerlyknownasHydra,SamveraissimilartoIslandorainthatitintegrateswithFedorarepositorysoftwaretoprovidesearchengineandinterfacelayers;however,ratherthanusingDrupaltofacilitatethis,SamverausesaRubyonRailsplugincalledBlacklight.TheBlacklightframeworkisawebplatformthatisspecificallydesignedfor
36
resourcediscoverywithSolrIndexes.Samverahasbeenadoptedbyanumberofmajordigitallibrariesasadigitalassetmanagementsystem.In2014,theHydrausergroupdecidedtopursueoptionsforaddingdigitalpreservationfunctionalitiestotheHydra/Samverastack.AsofSeptember2017,aSamveraDigitalPreservationInterestGroupthatisinvestigatingthematter.
37
Appendix III: Stand‐alone Preservation Tools Asidefrompreservationsystemsthatstrivetoaccomplishanend‐to‐endOAIS‐compliantpreservationworkflow,thereareanumberofopensourcetoolsthatwillperformdifferentanddiscreetpartsoftheprocess.Herearesomethatareinwidespreaduse:
BagItDevelopedbytheLibraryofCongressandtheCaliforniaDigitalLibrarytodefineastrategyfortransferringdigitalcontent.Itspecifiestheelementsandstructureofa“bag”thatincludesthefilesinastandardcontainer.Thetoolwillcreateamanifestofchecksumsofallofthedigitalfilesinthatcontaineraswellasmetadataaboutthepackage.Onreceiptofthepackage,thechecksumscanbevalidatedtomakesurethatnocorruptionoccurredduringthetransfer.
ExampleofaBagItbag
DataAccessionerAsimpletoolfortransferringfilesfromonemediatoanother.IttoowillcreatechecksumsandgivestheusertheoptionofcreatingDublinCoremetadata.
38
Exactly
Anothertransfertool–fromthewebsite:“Exactlyallowsrecipientstocreatecustomizedmetadatatemplatesforsenderstofilloutbeforesubmission.Exactlycansendemailnotificationswithtransferdataandmanifestswhenfileshavebeendeliveredtothearchive.”
FITSThistoolconsolidatestheuseofotheropensourcetoolsforthepurposeoffilecharacterization.Thereareanumberofindividualtoolsthatwillidentifyafileformatandoutputvariouspiecesoftechnicalinformationaboutthatfile.Becauseeachindividualtoolhasstrengthsandweaknesses,itisgoodpracticetorunfilesthroughmultiplecharacterizationtools,andFITSwilldothisinoneprocessandoutputaconsolidatedsetofdata.
FromtheFITSonlineUserManual
39
FixityWhileseveralofthetoolsmentionedabovewillcreatechecksumsthatactassomethinglikeadigitalfingerprintofafile,withouttheabilitytovalidatethatchecksumperiodically,itservesnogoodpurpose.Fixityisatoolthatallowsfortheschedulingofregularchecksumvalidations,andwillsendareportontheresults.
Foranexhaustivelistofavailabletoolsandsystemsfordigitalpreservation,gotothePOWRRToolGridv2
POWRR–PreservingDigitalObjectsWithRestrictedResources