PRESERVATION OF ELECTRONIC LEGAL MATERIALS

UELMA PRESERVATION GROUP

Leah Prescott, chair
Steven Anderson
Erik R. Beck
Diane Boyer‐Vine
Daniel Cordova
Susan David DeMaine
Amy Emerson
Emily Feltren
David Greisen
David Hansen
Jason Judt
Jane Larrington
Margaret Maes
Michelle Pearse
Mendora Servin
Anthony Smith
Leslie Street
David Walls

Rev. 4/2018

Contents

Introduction and Brief Background
    Formation of ad hoc preservation group
    Survey
UELMA and Electronic Legal Material
Digital Preservation
    OAIS (Open Archival Information System) Functional Model ‐ ISO 14721
    Trusted Digital Repositories
    Levels of Preservation
Document Strategies
    PDF Strategies
    XML Strategies
Metadata for Digital Preservation
    PREMIS
    METS
Digital Storage
    Cloud Storage
        Amazon
        Google Cloud
    Local digital storage
Case Studies
    California
    Minnesota
    Washington, D.C.
Appendix I: Survey Results
Appendix II: Open source and commercial preservation systems
    Archive‐It (www.archive-it.org)
    Archivematica (https://wiki.archivematica.org)
    Arkivum (www.arkivum.com)
    Duraspace Systems (www.duraspace.org)
    Islandora (islandora.ca)
    Perma.cc (https://perma.cc)
    Preservica (www.preservica.com)
    Rosetta (www.exlibrisgroup.com/category/RosettaOverview)
    Samvera (samvera.org)
Appendix III: Stand‐alone Preservation Tools

Introduction and Brief Background 

Formation of ad hoc preservation group 

In 2014, the Washington D.C. Counsel’s office was looking for a way to preserve the source materials for the Code of the District of Columbia. They reached out to the Georgetown University Law Library to inquire about becoming a member of the Chesapeake Digital Preservation Group ‐ made up of Georgetown Law Library, the state libraries of Maryland and Virginia, and Harvard Law Library ‐ a program created to preserve born digital legal materials. The DC code had been converted to XML documents, and because of the dynamic nature of that system, it was determined that the Chesapeake repository was probably not the best solution.

With the participation of the AALL Government Relations Office, this conversation transformed into a small group of people from other states who were grappling with the same issue of how best to preserve official electronic legal materials ‐ one of the primary requirements of the Uniform Electronic Legal Material Act (UELMA).

The UELMA Preservation Group agreed that the main purpose of the group should be to investigate preservation strategies, and provide tools and assistance to those states that have adopted or plan to adopt UELMA. It was determined that deliverables for the group might be:

● Guidance on what digital preservation entails, and establishing guidelines and best practices
● Trying to educate people on costs related to levels of preservation
● Examples of technical papers and documents that have been developed, documentation and sample language to help with advocacy
● A toolkit and case studies

Survey

The group determined that there was not enough information available about the status of state electronic legal materials, and decided that a survey could provide useful data. After locating contacts in each state, the survey was launched in 2015, with the goal of determining the state of preservation activities for electronic legal materials in general, and to determine what (if any) open tools might be useful for the community as a whole.

The compiled results of the survey can be viewed in Appendix I of this paper, and in general revealed that while legal materials are being created digitally (born digital) as well as being digitized from paper‐based materials, many are not yet considered official. In addition, the survey suggested that there is not a strong desire for either a consortial solution, or an open source tool. Consequently, the group decided that the best deliverable at this time would be a guidance document in the form of a white paper ‐ this white paper.

 


UELMA and Electronic Legal Material

For those who are new to UELMA, the following section will give basic information about the act.

The American Association of Law Libraries (AALL) maintains a web page of UELMA resources (https://www.aallnet.org/advocacy/government-relations/state-issues/uelma-resources/), and it describes UELMA in this way:

“The Uniform Electronic Legal Material Act (UELMA) is a uniform law that addresses many of the concerns posed by the publication of state primary legal material online. UELMA provides a technology‐neutral, outcomes‐based approach to ensuring that online state legal material deemed official will be preserved and will be permanently available to the public in unaltered form.”

Text of the Act Relating to Preservation

SECTION 7. PRESERVATION AND SECURITY OF LEGAL MATERIAL IN OFFICIAL ELECTRONIC RECORD.

(a) An official publisher of legal material in an electronic record that is or was designated as official under Section 4 shall provide for the preservation and security of the record in an electronic form or a form that is not electronic.

(b) If legal material is preserved under subsection (a) in an electronic record, the official publisher shall:

(1) ensure the integrity of the record;

(2) provide for backup and disaster recovery of the record; and

(3) ensure the continuing usability of the material.

In terms of what “electronic” and “legal material” mean in the context of UELMA, the Act provides the following guidance:

(1) “Electronic” means relating to technology having electrical, digital, magnetic, wireless, optical, electromagnetic, or similar capabilities.

(2) “Legal material” means, whether or not in effect:

(A) state constitution

(B) session laws

(C) state code

(D) a state agency rule that has or had the effect of law

(E) categories of state administrative agency decisions

(F) reported decisions of state courts

(G) state court rules

(H) any other category of legal material to be included by individual states

It is a strength of the act that it does not prescribe a technological strategy for electronic documents, thereby allowing for a full range of solutions to deal with a full range of digital format types. That flexibility is also a challenge when trying to develop a standard set of solutions for the widest possible set of users. The strategies used by the states that have thus far enacted UELMA, however, fall into only a few categories, and this paper provides case studies from some of those states that will illustrate their strategies.


Digital Preservation

There are those who interpret digital preservation to be the act of protecting paper‐based materials through digitization, but there is increasing recognition that a digital object ‐ regardless of whether it was born digital or created through a digitization process ‐ is an object separate from any analog form, and as such has a separate set of capabilities, a separate set of preservation challenges, and an equal need to be preserved.

The National Digital Stewardship Alliance (NDSA) defines digital preservation as “The series of managed activities, policies, strategies and actions to ensure the accurate rendering of digital content for as long as necessary, regardless of the challenges of media failure and technological change.”

OAIS (Open Archival Information System) Functional Model ‐ ISO 14721

Many preservation systems refer to the OAIS Functional Model to frame the range of capabilities that they offer, so it is useful to give a brief explanation about what it is. OAIS is a theoretical model rather than an actual system structure and as such, it describes the basic functions that a compliant system must perform. Graphics such as this one from Wikipedia are often used to represent the model:

(https://en.wikipedia.org/wiki/Open_Archival_Information_System)

Preservation system descriptions will often refer to SIPs, AIPs, and DIPs, and these acronyms come from the OAIS model. They refer to:

● SIP ‐ Submission Information Package; the information coming into the system
● AIP ‐ Archival Information Package; the archival objects and data created and packaged from the SIP
● DIP ‐ Dissemination Information Package; the object and data that are created from the AIP and made available through an access system

Trusted Digital Repositories

In a 2002 report, OCLC (a global library cooperative) defines a Trusted Digital Repository (TDR) as “one whose mission is to provide reliable, long‐term access to managed digital resources to its designated community, now and in the future.” A related standard that applies to digital preservation is the Trustworthy Repositories Audit & Certification (TRAC) checklist (ISO 16363). The purpose of this checklist is to define the criteria for certification of a repository system as a TDR.


The metrics of the checklist are split into three topical areas:

● Organizational Infrastructure ‐ the repository's administrative, staffing, financial, and legal functions
● Digital Object Management ‐ the handling of digital objects from ingest to access
● Technology, Technical Infrastructure, and Security ‐ the technology used to handle ingested objects

The OCLC report further summarizes the responsibilities of a TDR. It states that a TDR must:

● Accept responsibility for the long‐term maintenance of digital resources on behalf of its depositors and for the benefit of current and future users
● Have an organizational system that supports not only long‐term viability of the repository, but also the digital information for which it has responsibility
● Demonstrate fiscal responsibility and sustainability
● Design its system(s) in accordance with commonly accepted conventions and standards to ensure the ongoing management, access, and security of materials deposited within it
● Establish methodologies for system evaluation that meet community expectations of trustworthiness
● Be depended upon to carry out its long‐term responsibilities to depositors and users openly and explicitly
● Have policies, practices, and performance that can be audited and measured

While the process of becoming a TDR is quite rigorous, the certification checklist is a useful tool to help an institution move in the right direction ‐ even if never actually becoming certified.

Levels of Preservation

Digital preservation is distinctly different from preservation of analog objects. While preservation of paper‐based materials generally requires establishing a stable environment and then minimizing interaction with the materials, digital preservation requires cyclical and relatively frequent interaction with objects being preserved ‐ to perform functions like fixity checking, refreshing, and format migration when needed. This reality along with standards such as the TDR certification checklist can often create the impression that digital preservation is an unreachable goal for many institutions.

The Infrastructure Working Group of the NDSA created a set of tiered benchmarks for digital preservation activities (see below), and while these levels provide guidance for initiating a preservation strategy, the implicit goal is to continue to move up the tiers as far as is practicable. It is the judgment of the preservation group that Level 3 is the minimal level for compliance with the act.


Matrix Categories 

Storage and Geographic Location

This category addresses the issue of how many copies of a file should be kept, with the expectation that if one should be threatened, another could be copied to replace the loss. Included in this consideration is the physical location of the digital file, because different geographical locations will have different threats. The farther apart the copies are, the lower the risk that a single event will destroy all of them. The LOCKSS Program (Lots Of Copies Keep Stuff Safe), based at Stanford University Libraries, is an example of a preservation strategy that is based on distributed geographic locations.

Fixity and Data Integrity

The PREMIS Data Dictionary defines fixity as “information used to verify whether an object has been altered in an undocumented or unauthorized way.” This is primarily done with the use of checksums, which act as a “fingerprint” of a document. When checksums are calculated periodically and compared with earlier checksums, it becomes apparent when something in the document has changed, either through human intervention or because of spontaneous bit changes (“bit rot”). Much more information about fixity can be found at http://www.digitalpreservation.gov/documents/NDSA-Fixity-Guidance-Report-final100214.pdf. This category also relates to processes used to extract data from obsolete media, as well as ensuring a virus‐free environment.
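As a rough illustration of the checksum comparison described above, the following Python sketch computes a SHA‐256 checksum for a file, records it, and later recomputes it to detect a change. It is a minimal sketch under stated assumptions, not a description of any state's actual process: the function names, the file name, and the choice of SHA‐256 are all illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum):
    """Compare a freshly computed checksum with the one recorded earlier."""
    return sha256_of(path) == recorded_checksum


def demo():
    """Record a checksum, then alter one character and check again."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name, used only for this illustration.
        document = Path(workdir) / "session_law.xml"
        document.write_bytes(b"<law>original text</law>")
        recorded = sha256_of(document)          # stored at ingest time
        before = fixity_ok(document, recorded)  # unchanged file
        document.write_bytes(b"<law>0riginal text</law>")
        after = fixity_ok(document, recorded)   # one character altered
    return before, after
```

In practice the recorded checksum would be kept separately from the file it describes (for example, in preservation metadata), so that damage to the file does not also damage the evidence used to detect it.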

Information Security

This category is for policies and practices relating to who has access to files and what their file‐level permissions are, as well as documentation of file access ‐ such as with a transaction log that records action, user, time, etc. The Digital Repository Audit Method Based on Risk Assessment (DRAMBORA), developed by the Digital Curation Centre, is a methodology to support this type of assessment, as well as many other types.

Metadata

The type of metadata that this category is concerned with is not only descriptive metadata, which most are familiar with, but also technical metadata, such as capture equipment and file measurements (for example, file size, resolution, and pixel dimensions for visual files, or running time for an audiovisual file). A useful paper on different types of metadata for digital preservation purposes can be found at https://www.loc.gov/standards/premis/FE_Dappert_Enders_MetadataStds_isqv22no2.pdf

File Formats

Some file formats are appropriate for long‐term preservation because they have qualities such as being uncompressed or lossless in their compression, they have open and available specifications, they are widely adopted, etc. To view what file formats the Library of Congress accepts as preservation‐worthy, and why, see https://www.loc.gov/preservation/digital/formats/intro/intro.shtml

 

 

 

 

 

    


Document Strategies

The two most common formats for preservation‐worthy electronic legal materials are PDF and XML.

PDF Strategies

The Portable Document Format (PDF) is a very common specification used for presenting documents online, and is increasingly used as an archival master format (particularly PDF/A). PDF was developed by Adobe Systems in the early 1990s and the specification was made available at no cost in 1993, even though it was still a proprietary format. In 2008, the specification was officially released as a fully open and non‐proprietary standard, leading the way for its use as an archival master format (although there are allied technologies such as PDF Forms that are still proprietary). Several states that have approved UELMA are using PDFs as their primary format for official electronic publications.

PDF/A is a multi‐part ISO standard for the long‐term archiving of electronic documents, based on the PDF specification. Because a PDF/A document contains everything it needs to present the document, including embedded fonts, it is intended to be stable over time, and can be used on any platform. In addition to the original release (Part 1), two additional parts have been made available, one in 2011 and another in 2012. In PDF/A in a Nutshell 2.0, the PDF Association states:

“Put in the simplest possible terms, PDF/A is a PDF which forbids certain functions which could hinder long‐term archiving. PDF/A also demands that the file meet certain requirements which guarantee reliable reproduction.

For example, files must not be encrypted with a password, as all content must always be fully available. Embedded video and audio data are also prohibited: PDF/A consciously avoids anything that requires external software for display or playback. JavaScript and certain actions are also forbidden, as executing them could potentially alter the PDF.

PDF/A also places higher demands on the information it contains. All required fonts (or at least all glyphs for the specific characters used) must be embedded within the PDF. To ensure a uniform colour appearance on a variety of platforms and devices, colour information must be given in a platform‐independent format using ICC colour profiles. The software must also use the XMP format for metadata (which is used to store the data identifying the file as a PDF/A, for example).

PDF/A also sets technical limits: for example, the page size is limited to an edge length of either 5.08 metres (PDF/A‐1) or up to 381 kilometres (PDF/A‐2 and PDF/A‐3).”

There are many tools available to create PDF and PDF/A documents, perhaps the most obvious being those by Adobe, including Adobe Acrobat.

XML Strategies

Extensible Markup Language (XML) is a language for marking text similar in some ways to Hypertext Markup Language (HTML), the language of web pages. The difference is that HTML primarily defines how text will appear on a web page, whereas XML is designed to help define what data actually is. Another difference is that HTML fields are predefined and everyone uses the same tags, whereas XML is defined uniquely by the community that is utilizing it for a distinct purpose. This makes XML useful for sharing data.

An example of an instance of XML within the legal realm is the Global Justice XML Data Model (GJXDM). It is defined by the Department of Justice as “an XML standard designed specifically for criminal justice information exchanges, providing law enforcement, public safety agencies, prosecutors, public defenders, and the judicial branch with a tool to effectively share data and information in a timely manner.”


Another example is LegalDocumentML, which “provides a common legal document standard for the specification of parliamentary, legislative and judicial documents, for their interchange between institutions anywhere in the world and for the creation of a common data and metadata model that allows experience, expertise, and tools to be shared and extended by all participating peers, courts, Parliaments, Assemblies, Congresses and administrative branches of governments.” “The standard aims to provide a format for long‐term storage of and access to parliamentary, legislative and judicial documents that allows search, interpretation and visualization of documents.” LegalDocumentML is part of the Akoma Ntoso specification.

[Figure: Example of XML markup of a legislative document]
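To make the distinction between describing data and describing appearance more concrete, here is a small Python sketch that parses a fragment of descriptive markup and lists its sections. The element and attribute names (act, section, num, heading, content) are invented for this illustration and are not taken from Akoma Ntoso, LegalDocumentML, or any state's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a short act. The tags say what each
# piece of text IS (a title, a section heading, section content), not how
# it should look on a page.
SAMPLE = """
<act>
  <title>Example Preservation Act</title>
  <section num="1">
    <heading>Short title</heading>
    <content>This act may be cited as the Example Preservation Act.</content>
  </section>
  <section num="2">
    <heading>Definitions</heading>
    <content>"Electronic" means relating to digital technology.</content>
  </section>
</act>
"""


def list_sections(xml_text):
    """Return (section number, heading) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        (section.get("num"), section.findtext("heading"))
        for section in root.iter("section")
    ]


if __name__ == "__main__":
    for number, heading in list_sections(SAMPLE):
        print(f"Section {number}: {heading}")
```

Because the structure is explicit, the same source file could be used to produce a web page, a printed code, or a search index without re‐keying the text.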

 


Metadata for Digital Preservation

Best practice for digital preservation includes the creation of different types of metadata that will help to ensure long‐term stewardship of digital materials. Metadata for preservation is information that supports and documents the process of digital preservation, and inherent in the OAIS model (see the OAIS Functional Model section above) is the concept that all of the information packages (SIP, AIP, and DIP) consist of the digital content that is “packaged” along with descriptive information.

At a more granular level, the content (or “Content Information” in OAIS lingo) may include the digital object along with “Representation Information” (as also defined in the OAIS model). Representation Information is data that is necessary to interpret or use the digital object. If the digital object were a dataset, for instance, the Representation Information could be information about how the data was generated and what the structure of the dataset is. Both the digital object and the Representation Information must be equally preserved as Content Information.

The “Preservation Description Information” describes what is required to preserve the Content Information, and might include elements of administrative, structural, technical, or rights metadata. The de facto international standard for preservation metadata is the PREservation Metadata: Implementation Strategies (PREMIS) standard.

PREMIS

From the PREMIS maintenance page, “The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long‐term usability. Developed by an international team of experts, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open‐source digital preservation tools and systems. The PREMIS Editorial Committee coordinates revisions and implementation of the standard, which consists of the Data Dictionary, an XML schema, and supporting documentation.”

The full PREMIS specification, or Data Dictionary, is quite long and involved, but there is a much more accessible document called Understanding PREMIS, and here is part of the introduction:

“…metadata is categorized according to what it is intended to accomplish: descriptive metadata helps in discovery and identification of resources, administrative metadata helps in managing and tracking them, and structural metadata indicates how complex digital objects are put together so that they can be properly rendered. Similarly, preservation metadata supports activities intended to ensure the long‐term usability of a digital resource…

Here are some examples of preservation activities and how metadata can support them:

● A resource must be stored securely so that nobody can modify it inadvertently (or maliciously). Checksum information stored as metadata can be used to tell if a stored file has changed between two points in time.
● Files must be stored on media that can be read by current computers. If the media are damaged or obsolete (like the 8" floppy disks used in the 1970s) it can be difficult or impossible to recover the data. Metadata can support media management by recording the type and age of storage media and the dates that files were last refreshed.
● Over long periods of time even popular file formats can become obsolete, meaning no current applications can render them. Preservation managers must employ preservation strategies to ensure the resources remain usable. This might mean migrating old formats to newer equivalents, or emulating the old rendering environment on newer hardware and software. Both migration and emulation strategies require metadata about the original file formats and the hardware and software environments supporting them.
● Preservation strategies may entail changing original resources (migration) or changing how they are rendered (emulation). This can put the authenticity of the resource in doubt. Metadata can help support authenticity by documenting the digital provenance of the resource ‐‐ its chain of custody and authorized change history.”

It further defines categories that are excluded from the Data Dictionary:

“The Data Dictionary is not intended to define all possible preservation metadata elements, only those that most repositories will need to know most of the time. Several categories of metadata are excluded as out of scope, including:

● Format‐specific metadata, i.e., metadata that pertains to only one file format or class of formats such as audio, video or vector graphics.
● Implementation‐specific metadata and business rules, i.e., metadata that describes the policies or practices of an individual repository, such as how it provides access to materials.
● Descriptive metadata. Although resource description is obviously relevant to preservation, many independent standards can be used for this purpose, including MARC, MODS, and Dublin Core.
● Detailed information about media or hardware. Again, although clearly relevant to preservation, this metadata is left to other communities to define.
● Detailed information about agents (people, organizations or software) other than what is needed for identification.
● Extensive information about rights and permissions; the focus is on those that affect preservation functions.”

METS

One of the strategies used in PREMIS is the concept of extension schemas. These are other related XML schemas that define elements that are useful in PREMIS but do not need to be redefined as part of PREMIS. One of these extension schemas is the Metadata Encoding and Transmission Standard (METS). This is the metadata standard often used for packaging the SIP, AIP or DIP for importing or exporting.

The most commonly used sections of a METS record are:

● Header – contains information about the METS document itself, such as creator.
● Descriptive Metadata (dmdSec) ‐ uses extensions to utilize descriptive metadata schemes such as MARC, MODS and Dublin Core. Can be embedded in the METS record or point to external records.
● Administrative Metadata (amdSec) – information about how files were created, rights data, master file/derivative information, and migration data can be recorded in this section. This also can be recorded within the METS record or have pointers to external records.
● Files (fileSec) – this is a listing of all of the files that comprise the digital object.
● Structural Map (structMap) – a mandatory METS section that outlines the hierarchical structure of the digital object. It also links the various elements within the structMap to their corresponding elements in the fileSec or dmdSec. This is critical for digital objects that are made up of many files but represent a single whole, such as a publication that might have hundreds of files that need to be arranged in a hierarchy (sections, chapters, etc.) with various descriptive metadata records that need to connect to specific places within that hierarchy.
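The relationship between the file listing and the structural map can be sketched in a few lines of Python. The example below builds a skeletal METS‐style record in which each chapter div in the structMap points back to a file in the fileSec by ID. It is an illustrative sketch only: the function name, IDs, dates, and file names are hypothetical, the descriptive and administrative sections are left empty, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(files):
    """Build a skeletal record from (file_id, href, chapter_label) tuples."""
    root = ET.Element(m("mets"))
    # Header, descriptive, and administrative sections are placeholders here.
    ET.SubElement(root, m("metsHdr"), {"CREATEDATE": "2018-04-01T00:00:00"})
    ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "logical"})
    # The top-level div represents the whole publication and links to DMD1.
    top = ET.SubElement(
        struct_map, m("div"), {"TYPE": "publication", "DMDID": "DMD1"}
    )
    for file_id, href, label in files:
        # One entry in the file listing ...
        file_el = ET.SubElement(file_grp, m("file"), {"ID": file_id})
        ET.SubElement(
            file_el, m("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        # ... and one chapter in the hierarchy that points back to it.
        chapter = ET.SubElement(top, m("div"), {"TYPE": "chapter", "LABEL": label})
        ET.SubElement(chapter, m("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    record = build_mets([
        ("F1", "chapter1.xml", "Chapter 1"),
        ("F2", "chapter2.xml", "Chapter 2"),
    ])
    print(ET.tostring(record, encoding="unicode"))
```

The point of the sketch is the cross‐reference: the files themselves carry no ordering or hierarchy, so the structMap is what tells a future system how hundreds of files fit together as one publication.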


Digital Storage  

Cloud Storage

Storage of digital materials has changed dramatically since the late 1990s, when the primary form of long‐term storage was on local media such as CDs, DVDs, and magnetic tape. Today it is common practice to use “spinning disk” for storage, either on local drives, institutional networked drives, or increasingly on cloud services. According to the Digital Preservation Handbook by the Digital Preservation Coalition (DPC), “cloud computing” is “a term that encompasses a wide range of use cases and implementation models. In essence, a computing ‘cloud’ is a large shared pool of computing resources including data storage. When someone needs additional computing power, they are simply able to check this out of the pool without much (often any) manual effort on the part of the IT team, which reduces costs and significantly shortens the time needed to start using new computing resources. Most of these ‘clouds’ are run on the public Internet by well‐known companies like Amazon and Google.”

Here is some basic information about these cloud solutions:

Amazon

● Standard Simple Storage (S3) ‐ http://www.aws.amazon.com/s3

Amazon provides two pricing methods for its cloud storage: on demand or reserve pricing. If the size of annual storage needed is known, prepayment is an option, which can save up to 75% on the cost.

On Demand Cost:

Storage volume        Price
Up to 50 TB           $0.023 per GB/month
51‐100 TB             $0.022 per GB/month
500 TB+               $0.021 per GB/month

● Amazon Glacier ‐ www.aws.amazon.com/glacier
    o Standard Infrequent Access (I/A)
    o From the website ‐ “Customers can store data for as little as $0.004 per gigabyte per month, a significant savings compared to on‐premises solutions. To keep costs low yet suitable for varying retrieval needs, Amazon Glacier provides three options for access to archives, from a few minutes to several hours.”
    o Developer Resources ‐ Amazon provides an API that allows developers to write interfaces to cloud storage systems or use third party solutions that provide user interfaces.

Google Cloud

● https://cloud.google.com/storage/archival/
● April 2018 cost ‐ Capacity pricing is 1 cent per GB/month for data at rest for Nearline and 0.7 cents per GB/month for data at rest for Coldline.
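To give a sense of scale, the per‐gigabyte rates quoted above can be turned into a rough annual estimate. The Python sketch below uses only the at‐rest storage rates cited in this section and makes several simplifying assumptions: 1 TB is treated as 1,000 GB, the first S3 on demand tier is applied to the whole volume, and retrieval, request, and data transfer charges are ignored. The function names and service labels are illustrative, and current vendor pricing should be checked before budgeting.

```python
# At-rest storage rates in US dollars per GB per month, as quoted in this
# section (2018 figures). Labels are illustrative, not vendor product codes.
RATES_USD_PER_GB_MONTH = {
    "Amazon S3 (on demand, first tier)": 0.023,
    "Amazon Glacier": 0.004,
    "Google Nearline": 0.010,
    "Google Coldline": 0.007,
}


def annual_storage_cost(size_tb, rate_per_gb_month, copies=1, gb_per_tb=1000):
    """Estimate one year of at-rest storage cost for a collection.

    Assumes a flat rate and ignores retrieval, request, and transfer fees.
    """
    return size_tb * gb_per_tb * rate_per_gb_month * 12 * copies


def compare(size_tb, copies=1):
    """Return the estimated annual cost for each service, rounded to cents."""
    return {
        name: round(annual_storage_cost(size_tb, rate, copies), 2)
        for name, rate in RATES_USD_PER_GB_MONTH.items()
    }


if __name__ == "__main__":
    # Example: a 2 TB collection kept as a single copy with each service.
    for name, cost in compare(2).items():
        print(f"{name}: ${cost:,.2f} per year")
```

Under these assumptions, a 2 TB collection works out to roughly $552 per year on S3 on demand storage and $96 per year on Glacier, which illustrates why the lower‐cost, slower‐retrieval tiers are attractive for preservation copies that are rarely accessed.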

Local digital storage

The size of local hard drives continues to grow, with a 60 terabyte (TB) solid state drive (SSD) announced in 2016. Tape storage also continues to be improved, with IBM and Sony working on technology that could potentially store 330 TB in a single cartridge that takes less space than a hard drive.

● Local storage has greater management overhead
    o Must be backed up
    o Most hard drives have an average lifespan of about 5 years
    o Easy to overwrite, so protection must be put in place, such as Write once read many (WORM)
        From Wikipedia: “Write once read many (WORM) describes a data storage device in which information, once written, cannot be modified. This write protection affords the assurance that the data cannot be tampered with once it is written to the device. On ordinary (non‐WORM) data storage devices, the number of times data can be modified is limited only by the lifespan of the device, as modification involves physical changes that may cause wear to the device. The "read many" aspect is unremarkable, as modern storage devices permit unlimited reading of data once written.”

View the Minnesota Case Study to see how WORM storage is being used for UELMA preservation.

 


Case Studies

California

Description

In 2012, through Senate Bill 1075, California enacted the Uniform Electronic Legal Material Act. In adding Article 4 (commencing with Section 10290) to Chapter 1 of Part 2 of Division 2 of Title 2 of the Government Code, the Legislature identified the Legislative Counsel Bureau as the official publisher for electronic legal material. “Electronic” and “legal material” are specifically defined in Section 10291 of the Government Code. Legal material is defined as the California Constitution, the statutes of the State of California, and the California Codes (hereafter referred to as “SB 1075 Legal Material”).

Under the act, an official publisher that publishes legal material in an electronic record and also publishes in a record other than an electronic record may designate the electronic record as official if the publisher authenticates the electronic record, preserves the record, and ensures that the record is reasonably available for use by the public on a permanent basis (Secs. 10293, 10294, 10296, and 10297, Gov. C.). The Legislative Counsel Bureau publishes legal material both in an electronic record and in a record other than an electronic record and has designated the electronic record as official. These records generally originate in the Legislative Counsel Bureau as legislative measures that are eventually enacted into law. The Legislative Counsel Bureau also incorporates changes in law made through the initiative process into its database. The electronic legal material is then published at www.leginfo.legislature.ca.gov in both PDF and HTML.

In November 2016, California voters, through the initiative process, approved Proposition 54. As part of that initiative, the Legislature is required to cause audiovisual recordings to be made of all public legislative proceedings and to make those recordings available to the public through the Internet within 24 hours after the proceedings have recessed or adjourned for the day. The Legislature is also required to maintain an archive of the audiovisual recordings, which are to be accessible to the public through the Internet and downloadable for a period of no less than 20 years (para. (2), subd. (c), Sec. 7, Art. IV, Cal. Const.). In addition, Proposition 54 requires the Legislative Counsel Bureau to make the audiovisual recordings available to the public, with each recording to remain accessible to the public through the Internet and downloadable for a minimum period of 20 years following the date on which the recording was made, and to also then be archived in a secure format (para. (6), subd. (a), Sec. 10248, Gov. C.). The Legislative Counsel Bureau will make the audiovisual recordings available at www.leginfo.legislature.ca.gov and will preserve these recordings.

Implementation considerations 

SB 1075 Legal Material 

In developing preservation practices, the Legislative Counsel Bureau must meet the requirement that the record is reasonably available for use by the public on a permanent basis, as specified in Senate Bill 1075, California’s enactment of the Uniform Electronic Legal Material Act. In that regard, the Legislative Counsel Bureau had three main considerations in implementing a solution for preservation of SB 1075 Legal Material:

1. Would the solution meet the standards for long-term preservation;
2. Would the solution be cost-effective; and
3. Would non-technical staff be able to use the solution.

The Legislative Counsel Bureau also wanted to meet Level 3 of the National Digital Stewardship Alliance levels of digital preservation. To reach that goal, the Legislative Counsel Bureau would have to:

○ Store at least one copy in a geographic location with a different disaster threat;
○ Engage in a monitoring process for our storage systems and media to determine obsolescence;
○ Check fixity of content at determined intervals;
○ Maintain logs of fixity information;
○ Have the ability to detect corrupt data;
○ Virus-check all content;
○ Maintain logs of who performed which actions on files;
○ Store standard technical and descriptive metadata; and
○ Monitor file-format obsolescence issues.
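The fixity-related items in this list (checking fixity at intervals, keeping logs of fixity information, and detecting corrupt data) can be illustrated with a minimal Python sketch. It assumes SHA-256 as the checksum and a simple in-memory log; the function names `build_manifest` and `check_fixity` and the log format are hypothetical and are not taken from the Legislative Counsel Bureau's tooling.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (hypothetical layout)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str], log: list[dict]) -> list[str]:
    """Re-hash each file, append one log entry per file, return the failures."""
    failures = []
    for name, expected in manifest.items():
        path = root / name
        actual = sha256_of(path) if path.is_file() else None
        ok = actual == expected
        log.append({
            "file": name,
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "ok": ok,
        })
        if not ok:
            failures.append(name)
    return failures
```

Run at a fixed interval, `check_fixity` would report any file whose content no longer matches the recorded checksum, and the accumulated log would serve as the fixity record.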

To meet the aforementioned requirements, the Legislative Counsel Bureau considered three options:

1. Cloud storage using both preservation-specific cloud solutions and general cloud solutions;
2. Standard internal storage systems with standard backups already in use; and
3. Offsite optical WORM (write once read many) technology.

After considering the advantages and disadvantages of each of the three options, the Legislative Counsel Bureau decided to use standard internal storage. In addition, the Legislative Counsel Bureau has undertaken a pilot project to preserve the legal material in a preservation-specific cloud storage solution.

Audiovisual

The Legislative Counsel Bureau is working with the California Senate and Assembly to meet the requirements of Proposition 54. Under this proposition, audiovisual recordings must be made readily available to the public, in a downloadable format, for a period of no less than 20 years (audiovisual archive), and stored in a secure format. In addition, the Legislative Counsel Bureau is evaluating how to preserve the audiovisual archive. The goals of preservation encompass the following:

1. The use of methods and technologies to ensure digital content is usable by the public.
2. The use of methods and technologies that maintain the digital content as digital-content standards change.
3. The provision of a perpetually accurate rendering of the audiovisual recordings. In that regard, the audiovisual recordings would be retained in the original file format created by the audiovisual infrastructure. The Senate currently uses a file format known as “LXF” or “LEGO Digital Designer Model Files,” while the Assembly uses a file format known as “TS” or “Transport Stream.” These files are the original source files from which any future file conversion would be derived to meet new digital file format standards.
4. The storage of one copy of the audiovisual recordings file in a geographic location with a different disaster threat.
5. The provision of secure access to the audiovisual recordings file to ensure that the recordings are not modified.

Solution

Business Process Adjustments

SB 1075 Legal Material

The Legislative Counsel Bureau developed a strategic plan for the authentication and preservation of SB 1075 Legal Material, which included:

1. Identifying legal materials to be preserved, consistent with the Uniform Law Commission’s version of the Uniform Electronic Legal Material Act. (California’s enactment in SB 1075 covered fewer materials.);
2. Identifying units within the Legislative Counsel Bureau that are responsible for maintaining the legal materials;
3. Formalizing procedures for authentication of the legal material;
4. Establishing guidelines for cost-effective review of preservation needs of different legal materials on a cyclical basis to maintain data fidelity and integrity; and
5. Formalizing update procedures for preservation purposes.


Given the requirements and definitions set forth in Senate Bill 1075, the Legislative Counsel Bureau focused on the legal material developed during the lawmaking process related to the work of the Legislative Counsel Bureau that is made publicly available: the codes, statutes, and constitution. (Additional material generated during the legislative process has been identified as legal material that should be preserved, but that material is not part of the current authentication and preservation strategy.) Steps 1-3 above have been completed for the SB 1075 Legal Material. The systems that are used to draft and publish the SB 1075 Legal Material were adjusted so that the Legislative Counsel Bureau did not need to change its business process. Instead, software handles the authentication process and provides for storage of SB 1075 Legal Material. Also, software was developed so that staff could write to the preservation system at scheduled intervals. To complete steps 4 and 5, the Legislative Counsel Bureau must develop audit procedures for the SB 1075 Legal Material and formalize procedures to update that material and the technology that is used to allow access to the preserved material.

Audiovisual

The Legislative Counsel Bureau is currently evaluating business process changes in order to preserve audiovisual recordings under Proposition 54. One consideration is the establishment of a process by which current digital file standards are assessed, perhaps on a biannual basis following the legislative cycle, since the California State Legislature has many other processes that revolve around this cycle. If digital file standards change, the Legislative Counsel Bureau would begin the process of converting the original source LXF or TS files to the new standard, replacing the out-of-date standard.

Another aspect of digital preservation is the accurate rendering of authenticated content. Since these audiovisual recordings are intended to show the public how the legislative process produced bills that may become law, the Legislative Counsel Bureau must ensure the recordings are presented to the public without modification.

For this purpose with respect to SB 1075 Legal Material, the Legislative Counsel Bureau uses Adobe digital signatures to ensure that the documents have not been modified. There is no similar solution for audiovisual recordings currently available. One solution could be write once, read many technology. WORM data storage technology allows information to be written to a disc a single time and prevents the drive from erasing or editing the data. The implementation of this technology for securely stored, publicly accessible audiovisual recordings would in effect make them authentic.

IT Design/Components 

SB 1075 Legal Material

The Legislative Counsel Bureau has undertaken a pilot project using Preservica’s cloud-hosted, standards-based (OAIS ISO 14721) active preservation software for preservation of SB 1075 Legal Material. This web-based digital preservation application provides the Legislative Counsel Bureau with the ability to store files and perform preservation tasks. Preservica provides secure authenticated access, automatically classifies documents, and sets access permissions during ingest. Preservica has built-in workflows that are used by the Legislative Counsel Bureau for ingest of data and metadata management. Using Preservica’s Submission Information Package (hereafter referred to as “SIP”) packaging desktop client, the Preservica administrator uploads content into organized file hierarchies based on statute year and California Codes updates, both of which take place twice a year.

The data is generated by the Legislative Counsel Bureau’s Legal Division during each year of the two-year legislative session in the form of bills that are enrolled as part of the normal legislative business process and sent to the Governor for action. If a bill is either signed by the Governor or the Governor lets it become law unsigned, the systems create the statutes and authenticate the documents using Adobe certificate-based digital authentication. The Legislative Counsel Bureau Preservica administrator extracts the authenticated document and uses Preservica’s SIP client to load it into the cloud preservation site. At that time, Preservica’s SIP client also adds basic descriptive metadata. That metadata was developed as a collaborative engagement between the Legislative Counsel Bureau and the California State Archives using Dublin Core standards to meet the needs of the Legislature.


Audiovisual

The Legislative Counsel Bureau is implementing an EMC Isilon WORM storage technology for preservation of the audiovisual recordings. Isilon is a scale-out, network-attached storage platform offered by EMC Corporation for high-volume storage. This system will store both the original format and a modified format MP4 file for public access and downloading.

As a long-term data preservation strategy, the Legislative Counsel Bureau will store the audiovisual data at more than one site. Within the Legislative Counsel Bureau’s primary location in Sacramento, an EMC Isilon storage system will be implemented consisting of performance-optimized storage nodes and capacity nodes. The strategy to protect the data features redundancy and will place another EMC Isilon storage system at an established off-site location. The video data would be replicated to the off-site location using high-bandwidth, secure, and redundant point-to-point wide area network (WAN) connections. If an outage occurred at the primary location, the video data could be accessed and restored from the off-site location.

Licensing will be purchased to enable the write once, read many technology available within the Isilon storage array, known as “SmartLock.” The Isilon SmartLock technology protects the data against accidental or malicious deletions or alterations. This type of technology helps protect digital files from being modified while those files reside within the SmartLock-enabled file directory. With this technology, any video made available on the Internet would be protected from alteration, ensuring the video file’s integrity. Thus, the technology would satisfy data-authenticity criteria within digital preservation, to meet the requirements of Proposition 54 that the audiovisual recordings be in a secure format.

Costs

SB 1075 Legal Material

The Legislative Counsel Bureau decided to use the current business and technical processes for preservation of SB 1075 Legal Material. Therefore, development costs were absorbed into the standard development budget.

The Legislative Counsel Bureau does not allow public access to the Preservica archive. Thus, the Cloud Edition Starter subscription (up to 250 GB at a cost of $4,000 per year) meets the Bureau’s current needs. The Preservica system will allow the Legislative Counsel Bureau to scale up as storage needs increase.

Audiovisual

The Legislative Counsel Bureau estimates the average cost of the storage solution for audiovisual recordings, which is required to store the original file format (LXF), will be $250,000 per year. This includes the backup WORM solution recommended for preserving the audiovisual data.

Our Current Assessment

SB 1075 Legal Material

The Legislative Counsel Bureau followed a structured systems development life cycle in designing the archiving process, in order to meet the preservation requirements of the Uniform Electronic Legal Material Act, as enacted in California. Information technology staff from the Legislative Counsel Bureau met with subject-matter experts to understand the legal material that required archiving for preservation purposes. This included understanding what metadata the owners of the legal material considered meaningful. Questions were asked such as: What was the best timing to capture the legal material? How would consumers of the legal material likely identify the material in a search?

IT staff also wrote requirements, including use cases, for procedures to extract files, create metadata, and stage the files in preparation for extraction to a cloud-based archiving platform.


As previously stated, a determination was made to use an industry standard (Dublin Core) in determining the metadata to be captured for the legal material. The use of this standard allowed easy integration with Preservica’s ingest workflow. Metadata is easily discoverable for archived files. This standard allowed the Legislative Counsel Bureau to share information with State Archives.

The Legislative Counsel Bureau struggled with how to handle versioning of the legal material. Not all legal material had the same requirements for versioning. The IT staff decided to capture a snapshot of the legal material on a certain date and time, after working with the owners of the legal material to determine that date and time. The Legislative Counsel Bureau ingests and identifies a snapshot of the legal material based on the date the material was extracted from the document repository.

The Legislative Counsel Bureau is also currently manually initiating the archiving process. We need to find ways to leverage the workflow available in Preservica to automate the extraction and ingest process, thereby taking advantage of the dedicated document repository that utilizes data services to make the legal material available. This process would allow for a standard interface to all the current and future documents that need to be preserved. In turn, the process will make extraction of legal material easier.

Another advantage of the current archiving solution is that metadata is seamlessly integrated with the legal material as it is ingested into Preservica. This makes it easy to discover metadata when viewing the legal material in the preservation tool.

There are areas that need further attention, including how to securely share necessary documents and metadata with State Archives. Though the Legislative Counsel Bureau and State Archives use the same archiving tool, there are some redundancies in effort when both entities preserve the same documents.

The Legislative Counsel Bureau has made the archiving effort for preservation under the Uniform Electronic Legal Material Act an ongoing internal project. That will allow the project team to find ways to improve the processes and procedures, particularly if additional types of legal material are added to the project. The project team must communicate with the owners of the legal material and external customers to ensure that their requirements continue to be incorporated into the archiving solution.

Audiovisual

The Legislative Counsel Bureau is too early in the implementation of the requirements of Proposition 54 regarding audiovisual recordings to assess solutions.

Notes

This report is for educational purposes. References to any specific commercial product, process, service, manufacturer, company, or trademark do not constitute an endorsement or recommendation.

    


 

Minnesota 

Daniel Kruse, Systems Analyst/Programmer
Jason Duffing, Systems Analyst/Programmer
Jason Judt, Data Systems Project Manager

The Minnesota Office of the Revisor of Statutes has constructed KEEPS, a custom software solution to satisfy the requirements for preservation and security detailed in the Uniform Electronic Legal Material Act (UELMA). The Keep Electronic Edicts Preserved & Secure (KEEPS) system is in the testing phase and is scheduled for deployment in 2016 Q4.

UELMA Background

In Minnesota, UELMA was enacted in 2013 as Minnesota Statutes chapter 3E. UELMA establishes an outcomes-based, technology-neutral framework for providing online digital legal material with the same level of trustworthiness traditionally provided by paper publications. The Act requires that official electronic legal material be: (1) authenticatable; (2) preserved, either in electronic or print form; and (3) accessible. The KEEPS solution was specifically designed to satisfy requirement (2).

The UELMA requirements for preservation and security are in section 3E.07. Section 7 states that if official legal material is preserved in an electronic record, the official publisher shall:

(1) ensure the integrity of the record;
(2) provide for backup and disaster recovery of the record; and
(3) ensure the continuing usability of the material.

System Description

The KEEPS system's primary goal is to ensure the integrity of official electronic records. The system makes backup and disaster recovery possible in several ways. The use of write once read many (WORM) disk drives and offsite tape backups created from separate document repositories ensures the continuing availability and usability of the material.

The software system was built in-house using staff programmers and existing commercial products (Figure 1). These products are: a virtual machine, a write once read many (WORM) disk, a relational database, and a tape backup application. Additionally, a custom software application was deployed to the virtual machine.

Figure 1 – System Diagram


KEEPS integrates with the legislative publishing system, a backend database, and a private intranet. The KEEPS server has exclusive access to the WORM disk and read-write access to the database. The intranet site has read-only access to the database for the purpose of monitoring the system.

Write once read many (WORM) disk drives are an integral part of this system. WORM disks are essential to ensuring the integrity of the data and are the foundation around which this system was designed. The KEEPS solution will leverage GreenTec WORM Storage Servers as the hardware best suited for this system. These storage devices enforce write-once capabilities through hardware mechanisms rather than software running on a computer.

Development

The basic requirements for the KEEPS system can be summarized as:

○ Preserve newly published documents.
○ Catch any errors in our publicly available legal material.
○ Run independently of our other software without user intervention.

KEEPS software was written in Java SE 8 and consists of three primary modules (Figure 2): an Archiver that writes published UELMA-compliant documents to the WORM disk; a WORM Contents Populator that records WORM disk contents in a database table, which is used in the validation process; and a Validator that works with the database to validate the publicly available legal material and populates the Validation Errors table. Each of the modules records its activity in a database table that can then be seen on an intranet page. The software is built and deployed using the Apache Ant and Ivy projects. The modules can be run at predetermined intervals or on demand and are synchronized by the Schedule Manager. The WORM disk documents and public UELMA documents are backed up separately to tape. The tapes are stored offsite.
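The validation step can be illustrated with a short sketch. KEEPS itself is written in Java and its internals are not shown in this report, so the Python below is only an illustration under stated assumptions: documents are compared by SHA-256 digest, `worm_index` is a hypothetical stand-in for the table of WORM disk contents, and `public_docs` is a hypothetical stand-in for the publicly available documents.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a document's bytes (assumed checksum)."""
    return hashlib.sha256(content).hexdigest()


def validate(worm_index: dict[str, str],
             public_docs: dict[str, bytes]) -> list[tuple[str, str]]:
    """Compare public documents with archived digests; return (doc_id, error) pairs."""
    errors = []
    for doc_id, archived in worm_index.items():
        if doc_id not in public_docs:
            errors.append((doc_id, "missing"))       # removed from the public set
        elif digest(public_docs[doc_id]) != archived:
            errors.append((doc_id, "changed"))       # content differs from archive
    for doc_id in public_docs:
        if doc_id not in worm_index:
            errors.append((doc_id, "unauthorized"))  # never archived
    return sorted(errors)
```

In this sketch the returned list plays the role of the Validation Errors table: an empty list means the public material matches the archived record.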

Figure 2 – KEEPS Architecture


Testing

Tests of the identified scenarios that constituted corruption or failures of the public UELMA documents system were performed. Tests covered: (1) an unauthorized document inserted into the production database, (2) a document removed from the production database, (3) changes of any type to an existing document in the production database, and (4) the inability to archive a document to the WORM disk. In all cases, our validation system correctly identified the issue and reported it. Deployment is effortless and has been repeated many times to ensure the robustness of the software and the ease of installation.

Load testing was conducted on a virtual server with 4 GB of memory running Windows Server 2012 R2. For archive testing, 51,463 Minnesota Statutes sections, in the form of PDFs, totaling 6.3 GB were published. Total archival time was 37 minutes. Validation testing occurs daily on 606,105 PDF documents totaling 65 GB. Daily validation completes in under 40 minutes. All load tests are considered successful in our environment.
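As a rough worked calculation, the load-test figures imply the average throughput below. The sketch assumes 1 GB = 1,000 MB and uses the stated 40-minute ceiling for validation, so the validation rates are lower bounds rather than measured values.

```python
def rate(total: float, minutes: float) -> float:
    """Average units processed per second over the given number of minutes."""
    return total / (minutes * 60)


archive_docs_per_s = rate(51_463, 37)       # about 23 documents per second
archive_mb_per_s = rate(6.3 * 1000, 37)     # about 2.8 MB per second
validate_docs_per_s = rate(606_105, 40)     # at least about 253 documents per second
validate_mb_per_s = rate(65 * 1000, 40)     # at least about 27 MB per second
```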

The system is stable and has been running without programmer involvement since March 23, 2016. It handles errors gracefully and continues processing, providing detailed logs that can be used to troubleshoot issues.

Schedule 

Prototype, 2014-2015
The first prototype was developed as a proof of concept by Stephen Segal, the Principal at System Specialties, Minneapolis, MN. Written in PHP, it provided the basis for good estimates of the time required to validate our entire repository of legal material on a nightly basis.

Build & Test, January – March 2016
The system was functionally tested as it was developed, and released to long-term testing.

Testing, April – September 2016
The system is stable and running in a simulated production environment.

Final Deployment, October 2016
○ Production environment will be completed.
○ GreenTec WORM Storage Servers will be purchased.
○ Bulk Archiver will write all existing UELMA documents to the WORM disk.
○ Archiver will write new UELMA documents to the WORM disk.
○ Validator will run daily.

   


Washington, D.C. 

David Greisen - Open Law Library

Vincent Chuang - Open Law Library

The Open Law Platform is a software system created for the purpose of publishing laws, codes, legal interpretations, and any other legal document produced by a government. As part of taking a digital-first approach to legal publishing, the Open Law Platform incorporates UELMA compliance as a core component of the platform.

The Council of the District of Columbia is using the Open Law Platform to publish its laws and code (https://code.dccouncil.us) and provides a case study for replicating key features and processes at other jurisdictions. XML representations of the District’s laws and codes can be found at https://github.com/dccouncil/dc-law-xml.

The Platform’s version of UELMA compliance is modeled on brick-and-mortar libraries. Lessons about readability over time, information redundancy, version history, and authentication have been learned over centuries in the physical world, and it is useful to apply many of those ideas when considering digital preservation and authentication under UELMA.

Terminology

The Council of the District of Columbia is a Publishing Entity. As a Publishing Entity, the Council is responsible for publishing and authenticating the Library of official legal materials relevant to itself. The Council’s Library contains various Documents, including rapidly changing documents, like the entire District of Columbia Code, and static documents, like individual laws. Another Publishing Entity could be the Executive Office of the Mayor, and its Library could include Documents such as the DC Municipal Regulations and the DC Register.

An important difference between a Library in the Open Law Platform and a brick-and-mortar library arises in the context of time. The contents of a physical library might change over time, but you can only ever visit the library as it is today. That is to say, if Harvard Law Library throws away its copy of A Wrinkle in Time, the library is still the Harvard Law Library, but you can never travel back in time to read A Wrinkle in Time there. An Open Law Platform Library consists of a snapshot of every version of the library as it has ever existed. For instance, on January 1, 2018, the Library DC-Law-XML may have contained one thousand laws. We would refer to that Library as DC-Law-XML as of January 1, 2018. On January 2, 2018, the Council may pass a new law and add it to DC-Law-XML. Unlike a traditional library, you can visit the Library as it existed on January 1 or as it existed on January 2.

A Consumer, like a citizen of the District, can view Documents within a Library or download the entire Library. And Hosting Entities, such as law libraries, can download and host a copy of an entire official Library. For instance, if the Harvard Law Library wished to host an authenticatable copy of the Council’s Library on the Harvard Law Library website, it could do so, just as it could purchase and host an official paper copy of the District of Columbia Code.

Considerations

In addition to UELMA itself, the Open Law Platform was designed with several related and overlapping considerations.

Time

Legal documents have a long history, and that history is itself substantively valuable. As a result, the Open Law Platform is created with the intention that every version of the content it publishes be accessible and authenticatable long into the future. And because legal history is long, this means capturing and maintaining large volumes of documents. The Council’s Library is only two years old, yet contains more than thirty thousand pages of laws and code, and is growing by over five thousand pages annually. Libraries must be manageable, usable, and responsive even while containing orders of magnitude more data than traditional libraries.


Authentication

The authentication scheme must also be robust against a wide range of factors from the perspective of Publishing Entities, Hosting Entities, and Consumers.

Publishing Entities are governments, and governments vary widely in the number of personnel, institutional capacity, and organizational structure. The authentication process must be usable in these varying environments. It must be possible, for any government, to clearly, easily, and securely convey (1) when a document was published, (2) who published it, and (3) the authority of that person to do so. All three questions can be answered with an appropriately designed cryptographic signing framework.

In order to be robust over time, the framework must be resilient to the loss or compromise of private cryptographic keys. The system must also provide for restoration in the event of a government-scale catastrophe: there must be a mechanism for restoring a Library after all encryption keys have been lost. And the system must operate on government timescales. Because published documents are intended to be used over the course of decades, accessibility (by way of readability or cryptographic scheme) must keep pace with changing technology. A Library must be accessible and authenticatable long after the Publishing Entity has abandoned it and moved on to other technologies, just as an official paper copy remains available at a law library even if the government no longer has that particular version.

A Hosting Entity should be able to host authenticatable versions of a Library for its patrons. For the Consumer, a Library must function across every use case. In situations in which the delivery network is compromised (such as hackers taking over the Publishing Entity’s web server), a Consumer must still have confidence that the Library being viewed is authentic. As with physical text, a Library should be accessible and authenticatable even without an internet connection. Because Libraries have a version component, a Consumer should be able to ascertain both authenticity and versioning information, akin to checking publication information inside a book.

Redundancy

The system must also be distributed. Just as Harvard Law Library and USC Law Library may both carry a copy of A Wrinkle in Time, a Consumer should be able to access a Library from a Hosting Entity and be able to confirm that the Library is the same as one acquired from the original Publishing Entity. Even if a Consumer can never access the original Library from the original Publishing Entity, the hosted Library should be authenticatable without reference to the original.

The Open Law Platform Solution

With these various considerations in mind, the Open Law Platform utilizes a combination of technologies, including XML, Git, and strong encryption, to implement a set of authentication techniques.

XML

The Open Law Platform stores almost all documents as plain text XML. By using plain text instead of a binary format (e.g., PDF), a Library and its Documents are virtually guaranteed to be readable for decades to come without relying on legacy software. Plain text also requires considerably less storage space than binary formats. For the Council, 30,000 pages of XML can be stored in 100 megabytes, while only 10,000 pages of PDFs require fifty times the space when compressed and 500 times the space when uncompressed. This difference means it is feasible to store every version of a plain text document in less space than a single version in PDF.

XML also has the advantage of being able to store the structure of a document, instead of just presentation information (i.e., how something looks on a screen). This means documents can be converted into any display format in the future and not be tied to any specific software. Together, these benefits of XML make it possible to satisfy the need for usability over time, ability to store large amounts of historical information, and speed of use.


A common concern with XML-based solutions is that XML can appear complicated and requires a different set of tools than most lawyers are used to using. This has resulted in very few UELMA-compliant XML implementations.

The Open Law Platform solves this problem in several ways. First, the platform focuses on making the XML very clean and simple, using, whenever possible, a jurisdiction’s terminology to describe a document and its contents (e.g., Title, Chapter, Subchapter, and Section). The platform also stores metadata logically within the document, again using the same terminology as the jurisdiction.
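To illustrate what structural XML in a jurisdiction's own terminology might look like, the sketch below parses a small fragment with Python's standard library. The fragment, its element names, and its attribute names are hypothetical; they are not the actual dc-law-xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element and attribute names are illustrative only.
FRAGMENT = """
<title num="1">
  <heading>Example Title</heading>
  <chapter num="2">
    <heading>Example Chapter</heading>
    <section num="1-201">
      <heading>Example section</heading>
      <text>Example text of the section.</text>
    </section>
  </chapter>
</title>
"""


def outline(xml_text: str) -> list[str]:
    """List each structural level with its number and heading, in document order."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for element in root.iter():
        if element.tag in {"title", "chapter", "section"}:
            heading = element.findtext("heading", default="")
            lines.append(f"{element.tag} {element.get('num')}: {heading}")
    return lines
```

Because the markup records structure rather than appearance, the same fragment could be rendered as a table of contents, a web page, or print output without changing the stored document.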

Good tooling (i.e., software for viewing and editing the XML) also goes a long way to making XML more digestible. The platform provides a mix of custom XML schemas and software to ensure XML accuracy, as well as automatic error detection and other smart editing capabilities. By focusing on user experience, lawyers familiar with the District’s laws and code were able to navigate and understand XML representations of law and code with no training.

Converting documents into XML is itself a process. But again, good tooling can make the process feel seamless. The Open Law Platform includes Open Law Draft, a Microsoft Word plugin that helps drafters conform to their jurisdiction’s style guides. Once the document conforms to the style guide, Draft can turn the document into correct XML without user input.

An XML-based solution has many benefits inherent in its format, with the biggest barriers being usability and conversion of existing documents into the format. A focus on user experience and good tooling can overcome these high hurdles. Success on this front reveals the downstream benefits of XML that ultimately outweigh the initial costs.

Git

The Open Law Platform stores XML (and any static PDFs) using the open source Git distributed version control system (https://git-scm.com/). In simplest terms, Git is a piece of software that keeps track of changes to one or more files (each group of one or more files collectively referred to as a “repository”), records the differences between new and old versions of one or more files, and maintains a history of the differences. It does so, in part, by providing the ability to sign each version with a unique cryptographic key (https://git-scm.com/book/id/v2/Git-Tools-Signing-Your-Work). This makes it possible to preserve different versions of documents as they change and creates an immutable chain of authenticatable versions back to the original.

Git makes it easy to copy an entire Library from one place to another and then keep the copy up-to-date with the original by just syncing changes. Because every copy of a Library has all the historical information and authentication information of the original, it is inherently fraud resistant. In the event a malicious actor attempted to modify the history of the original Library, the next time a copy attempted to sync with the now-fraudulently-modified “original”, the copy would detect the modification of the history and reject the fraudulent history.
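The way a copy can detect rewritten history can be sketched with a simplified hash chain. This is a conceptual model only and not Git's actual object format: each version's identifier is assumed to be a SHA-256 hash over its parent's identifier plus its content, and the function names are hypothetical.

```python
import hashlib


def commit_id(parent: str, content: str) -> str:
    """Identifier that depends on the content and on the entire prior history."""
    return hashlib.sha256(f"{parent}\n{content}".encode("utf-8")).hexdigest()


def build_history(versions: list[str]) -> list[str]:
    """Return the chain of identifiers for successive versions."""
    ids, parent = [], ""
    for content in versions:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


def can_sync(local_ids: list[str], remote_ids: list[str]) -> bool:
    """Accept the remote only if it extends the history the copy already holds."""
    return remote_ids[: len(local_ids)] == local_ids
```

Because every identifier incorporates its parent, altering an early version changes all later identifiers, so a copy holding the earlier chain sees a mismatch and can refuse the update.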

Git is free, open source, and available on virtually every platform. There are also many cloud services that provide Git access. Because of this wide availability, a Library that is stored as a Git repository can have all of its historical information hosted on a variety of physical machines located across a large geographic region. And every copy is easily authenticated.

Signing a Library

With XML and Git as the underlying technologies, the Open Law Platform implements specific processes to achieve the needed authentication outcomes.

At the government level, each employee who has authority to publish legal documents receives a smartcard (e.g., https://www.yubico.com/products/yubikey-hardware/). A small group of employees (minimum of three, preferably five) or other trusted individuals creates an Attesting Group. Each member of the Attesting Group (an Attestor) has a smartcard that they use to sign Attestations of Authority.


Once a threshold of Attestors (usually 50%) have attested that a particular person has authority to publish official documents, that person is a Publisher (as part of a Publishing Entity) and can sign new releases of a Library. If a Publisher leaves the organization or loses their key, the Attestors attest that the old key no longer has authority to publish. If an Attestor leaves the organization or loses a key, a majority of the remaining Attestors can attest that the old key is no longer valid and can also attest that a new key is a valid attestation key.
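The threshold rule can be sketched as a simple count. The sketch omits signature verification entirely and represents each attestation as the name of the Attestor who made it; the function name, the default of 0.5, and the choice to treat the threshold as "at least" that fraction are all assumptions for illustration.

```python
from math import ceil


def is_authorized(attestors: set[str], attestations: set[str],
                  threshold: float = 0.5) -> bool:
    """True when enough current Attestors have attested to a key's authority."""
    valid = attestations & attestors             # ignore non-member attestations
    required = ceil(len(attestors) * threshold)  # e.g. 3 of 5 at a 0.5 threshold
    return len(valid) >= required
```

With five Attestors, three attestations are needed under these assumptions, so losing one or two Attestor keys does not prevent the remaining group from authorizing or revoking a Publisher.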

Normally, cryptographic signatures are very complicated or very brittle. This design, however, ensures that the system continues to work even if several keys have been lost or compromised. Moreover, encryption keys are stored on physical devices and protected by a password. Even if a jurisdiction’s network is compromised, its keys are not.

Authenticating a Library

Attestations of a Publisher’s authority are stored in the Library. Thus, when a Publisher signs a Library, all the information needed for authentication is available within the Library. This technique, combined with the use of Git to create a cryptographically secured history and to create easily replicable repositories, results in a robust authentication system for Libraries.

While a Consumer or Hosting Entity can confirm that all signatures and all attestations are valid back to the very first release of a Library, they will always require at least one out-of-band authentication (i.e., authentication via something other than the original receiving channel) to confirm the very first release. The design of the Open Law Platform aims to decrease the friction required to obtain out-of-band authentication.

For starters, once a Consumer or Hosting Entity has performed one out-of-band authentication, usually via a telephone call to the Publishing Entity, the use of Git to store a Library means any future updates can be confirmed authentic without external verification. Law libraries currently provide indirect authentication of paper laws: they buy the laws from the official publisher and then represent to their users that these are the official laws. In the same way, law libraries can download a Library from the official Publishing Entity, perform the single out-of-band authentication, and then represent to their patrons that these are official laws.

Once a Library is hosted by more than one Hosting Entity, it becomes possible to perform out-of-band authentication by comparing the various hosted Libraries. And this comparison can then be automated for ease of use by Consumers.

Importantly, this system works without relying on a public root certificate (like those underlying HTTPS) or a web service maintained by the Publishing Entity. If the web service goes down, or the Publishing Entity stops supporting the web service, the Library will still be fully available and authenticatable through the constellation of Hosting Entities. In root-certificate-based systems, compromising the root certificate means compromising all historical documents signed by the certificate. While it may seem unlikely that a root cert will be compromised, this is surprisingly common. Symantec, until recently one of the most trusted root certificate authorities, was forced by Google and Mozilla to divest itself of its root certificate in 2017 because of major systemic security violations. An authentication system premised on a public root certificate system is too fragile to provide authentication over decades. Instead, by intimately tying the authentication mechanism to the preservation mechanism, preserving the documents automatically preserves the authentication.

The discussion up to this point has been regarding Archival Authentication, i.e., downloading and authenticating an entire Library (along with all historical versions). Most users, such as lawyers and judicial staff, will be performing Transient Authentication of particular versions of individual Documents. For these purposes, a web-based authentication service is ideal, as it makes it trivial for users to authenticate. The Open Law Platform is designed to provide an authentication service through a website, an application programming interface (API), and plugins for all major browsers. The authentication service will compare a hash of the Document to be authenticated against the hashes of all versions of all authentic Documents. The authentication service can therefore not only tell the user if a Document is authentic, but also when the version in question was created and if/when it was superseded by a newer version. Unlike other web authentication services, the Open Law Platform optionally provides the full cryptographic audit chain so an individual can confirm for themselves against a full copy of the Library that the Document in question is authentic.
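The hash comparison behind Transient Authentication can be illustrated with a small sketch. It assumes SHA-256 as the hash function and a simple in-memory index; the record fields and function names are hypothetical and are not the Open Law Platform's actual API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def build_index(versions):
    """Index every authentic version of every Document by its hash.

    `versions` is an iterable of (document_id, created, superseded, content)
    tuples; `superseded` is None for the version currently in force.
    """
    index = {}
    for document_id, created, superseded, content in versions:
        index[sha256_hex(content)] = {
            "document": document_id,
            "created": created,
            "superseded": superseded,
        }
    return index


def authenticate(index, candidate: bytes):
    """Return the matching version record, or None if nothing matches."""
    return index.get(sha256_hex(candidate))
```

A lookup that returns a record tells the user both that the Document is authentic and, from the `superseded` field, whether a newer version has replaced it; a result of None means the bytes match no known version.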

Redundancy

Redundancy is built into the system because of the way repositories are stored using Git and because of the authentication process.

With respect to redundancy of information, the wide adoption of Git and the various commercially available Git hosting solutions means that anyone at any time can easily retrieve and host their own copy of a Library. This replicability means that Libraries can be quickly distributed across large geographic areas and can help recover from data loss. Moreover, each copy of a Library is cryptographically signed in a way that permits corruption detection.
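Part of what makes corruption detectable in a Git-based store is content addressing: Git names every stored object by a hash of its contents. The sketch below recomputes a blob identifier the way Git's default SHA-1 object format does and compares it with a recorded one. It illustrates Git's general mechanism only; the Open Law Platform's own signing and attestation scheme is not detailed in this report and is not modeled here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git's SHA-1 format assigns to a file's contents.

    Git hashes a short header ("blob <length>" plus a NUL byte) followed by
    the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, recorded_id: str) -> bool:
    """Report whether `content` still matches the ID recorded for it."""
    return git_blob_id(content) == recorded_id
```

Because any change to the bytes changes the identifier, a copy that has been silently altered or damaged no longer matches the identifiers recorded elsewhere in the repository history.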

No less important, and considerably more complex, is the redundancy of authentication. If many Hosting Entities are constantly pulling down updates of fully authenticated laws, the constellation of entities can help a Publishing Entity recover from catastrophic losses (such as a natural disaster). If all Attester keys are lost in, say, a flood, a group of Hosting Entities can represent that a new set of Attester keys are official keys, helping to rapidly bootstrap a Publishing Entity back to an authenticatable state. Moreover, the presence of verifiably authentic copies held by Hosting Entities means that any new copies can be authenticated against those copies even if the original Publishing Entity no longer exists.

Overall Assessment

The initial cost of developing the Open Law Platform was significant, but it is now a fully generalized legal publishing platform that is available for any jurisdiction to use. Free Git repository hosting is available from several well-established commercial providers including GitHub, Bitbucket, and GitLab. As of February 2018, version 1.0 of the Open Law Platform is complete and running for the District of Columbia. Documents published using the Open Law Platform can be found at https://code.dccouncil.us, and XML representations are available at https://github.com/dccouncil/dc-law-xml. Initial work on Archival Authentication is complete and is being rolled out to the Council; Transient Authentication is expected June 2018.


Appendix I: Survey Results

These results are compiled from responses from 20 states.

What legal materials does your state publish online only?

What legal materials does your state publish both in print and online?


Of the materials identified in questions #1 or #2, which online materials are deemed to be the official version?

Have you digitized any paper-based legal materials?

Yes: 68%; No: 32%

If so, do you intend the digitized materials to be considered official?

Yes: 19%; No: 62%; N/A: 19%

Do you plan to digitize paper-based legal materials within the next 18 months?

Yes: 29%; No: 67%; Maybe: 4%


What digitization processes are you using?

For legal materials that you are digitizing, what is the file format you are using (e.g. pdf, xml, doc, tif, jpg, jp2000)?

Do you intend to implement a long-term preservation strategy within the next 18 months?

Yes: 50%; No: 32%; Maybe: 18%


If you do not have a long-term preservation strategy, what are the barriers that you face?

What sort of resources has your state provided for long-term preservation?


How likely would you be to use an open source, out-of-the-box strategy for publication and preservation of electronic legal materials?

1 = Very likely

5 = Unlikely

Issues:

• Unlikely to engage in preservation of any kind
• Not sure what this would be
• Public trust
• Already developing own strategy
• Support and maintenance
• Staying with print

How likely would you be to participate in an interstate digital storage solution for official electronic documents if one existed?

1 = Very likely

5 = Unlikely

Issues:

• Unlikely to engage in preservation of any kind
• Resources
• Accessibility; Security
• Constitutional mandates
• State level operation
• Self-sufficient


Appendix II: Open source and commercial preservation systems

Archive-It (www.archive-it.org)

Archive-It is a web archiving service that has been available since 2006 from the Internet Archive. This is the same organization that is responsible for the Wayback Machine (https://archive.org/web), which has been archiving the internet since 1996. Archive-It uses the Heritrix web crawler that was developed by the Internet Archive, and outputs data in the WARC file format, an ISO standard for web archiving.

Archive-It is a subscription service, and is used to archive both the websites of partner institutions and topical collections of websites. The Archive-It Team at the Internet Archive has developed a life-cycle model to help guide the decision-making needed for a web archiving program (http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf).

Features:

• Control the extent, depth, and description of collections.
• Browse collections by URL, by metadata, and by full-text search.
• Public access via archive-it.org and tools to build custom integrated portals.
• File storage in preservation format in multiple, redundant data centers.
• Ability to download WARC files on demand for local management.


Archivematica (https://wiki.archivematica.org)

Archivematica is an open-source preservation application supported by Artefactual Systems (https://www.artefactual.com). The community is supported using a discussion list, user group meetings, training and workshops, and installation and service agreements, and Artefactual Systems has partnerships that provide Archivematica as a hosted service. The evolution of the software can be tracked on the development roadmap.

Archivematica utilizes a set of micro-services to perform fundamental preservation actions. The system comprises standard and open tools that other services also utilize (for a current list, see https://wiki.archivematica.org/Release_1.6.0), as well as specific Python middleware that ties the tools together into services and workflows. A list of the micro-services is available at https://wiki.archivematica.org/Micro-services#Archivematica_Micro-services.

From the Archivematica literature:

“The goal of the Archivematica project is to give archivists and librarians with limited technical and financial capacity the tools, methodology and confidence to begin preserving digital information today. The project has conducted a thorough OAIS use case and process analysis to synthesize the specific, concrete steps that must be carried out to comply with the OAIS functional model from Ingest to Access. Through deployment experiences and user feedback, the project has expanded even beyond OAIS to address analysis and arrangement of transferred digital objects into SIPs and allow for archival appraisal at multiple decision points. Wherever possible, these requirements are assigned to software tools within the Archivematica system. If it is not possible to automate these steps in the current system iteration, they are incorporated and documented into a manual procedure to be carried out by the end user. This ensures that the entire set of preservation requirements is being carried out... In short, the system is conceptualized as an integrated whole of technology, people and procedures, not just a set of software tools. For institutions that want technical assistance to install and customize Archivematica, optional technical support services are provided by Artefactual Systems.”

Arkivum (www.arkivum.com)

Arkivum is a UK-based company that offers three distinct storage solutions for enterprise records, datasets, and cultural heritage assets respectively. Arkivum’s digital asset management and preservation system is called Perpetua. Perpetua can be operated as a cloud-based, managed storage solution or as a locally maintained, internal system. It uses Archivematica’s tools for metadata creation, performs scheduled data integrity audits, offers data encryption and access controls, and supports a variety of backup storage options including tape-, cloud-, and disc-based systems, all geographically distributed between the United States and the British Isles. To alleviate uncertainty around its hosted preservation model, Arkivum offers contractual commitments to service that exceed 25 years in duration. They also build into all service agreements a transparent exit strategy should institutions choose to migrate to a new system. Arkivum is still a relatively new company, having just transitioned from an internal project at the University of Southampton to a private business model in 2011.

 

Duraspace Systems (www.duraspace.org)

Duraspace is a non-profit organization devoted to providing long-term support for Dspace, Fedora, and VIVO. Duraspace also supplies digital asset management and preservation services through its DuraCloud and DspaceDirect platforms:

DuraCloud (www.duracloud.org)

DuraCloud is a cloud storage and content preservation service offered by Duraspace that backs up assets to multiple cloud storage providers while also offering a suite of preservation tools such as data integrity checks, transfer tools, and scheduled synchronization. DuraCloud is mainly a storage solution and must be integrated with an asset management system like Dspace or Archive-It for the capture of preservation metadata. Clients are given a range of cloud storage providers to store their assets with. These include Amazon Simple Storage Service, Amazon Glacier, San Diego Supercomputer Center, Rackspace Cloud Files, and Chronopolis. Pricing for DuraCloud varies depending upon which cloud services the client decides to use. Costs include an annual subscription fee between $1,235 and $5,520 and a per-terabyte storage cost between $500/TB and $825/TB. Chronopolis storage includes an additional ingest fee of $310/TB.
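To see how these figures combine, the arithmetic can be sketched as follows. The sketch assumes the per-terabyte charge is billed once for the year being estimated and that any ingest fee falls in that same year; neither billing detail is stated above, and the 2 TB collection size is purely illustrative.

```python
def estimate_annual_cost(subscription, per_tb_rate, terabytes, ingest_fee_per_tb=0):
    """Estimate one year's cost in dollars for a given amount of storage.

    Assumption: storage and ingest fees both scale linearly with terabytes
    and are charged in the year being estimated.
    """
    return subscription + terabytes * (per_tb_rate + ingest_fee_per_tb)


# Bounds for a hypothetical 2 TB collection, using the quoted price ranges.
low_estimate = estimate_annual_cost(1235, 500, 2)
high_estimate = estimate_annual_cost(5520, 825, 2)
```

Under these assumptions a 2 TB collection would fall between $2,235 and $7,170 for the year, before any Chronopolis ingest fee.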

DspaceDirect (www.dspacedirect.org)

DspaceDirect is a hosted DAMS service wherein a client can contract Duraspace to maintain a cloud-based instance of Dspace for a fee. Since the DspaceDirect system integrates with DuraCloud, it is possible to use a DspaceDirect repository as a preservation system. DuraCloud is another Duraspace service that offers cloud-based archiving and preservation functions such as checksums, file redundancy, content migration, and access control. Pricing for DspaceDirect can vary greatly depending upon the extent of storage needed. As of September 2017, a relatively small 250 GB allotment of storage costs $8,670.

Fedora (fedorarepository.org)

Not to be confused with Red Hat’s Linux operating system by the same name, FEDORA, which stands for Flexible Extensible Digital Object Repository Architecture, was developed by the Digital Library Research Group at Cornell University in the 1990s. It is an open source repository system, which offers tools for management and dissemination of digital assets. It is notable for its flexibility and modularity. It can be configured to accept any file type or metadata schema. Additionally, its features can be controlled via an extensive set of APIs, allowing for integration with external applications and devices. Because of this modularity potential, Fedora itself operates as a kind of skeleton platform. It offers several core functions like storage, relational connections, and basic ingest, but most advanced functions that one comes to associate with a typical digital asset management system (DAMS) are integrated into the system as add-ons. Islandora and Samvera (see below) are two such suites of software tools that can be added onto Fedora to facilitate digital asset management and preservation.

Islandora (islandora.ca)

Originally developed by affiliates of the University of Prince Edward Island, Islandora is a Drupal-based framework of digital repository tools that integrate with Fedora. Islandora is an open source platform whose core, components, and documentation are maintained by a growing community of contributors from around the world. Islandora’s main components include Drupal web interfaces for administrative functions and end-user experience, the Solr search engine for asset discovery, and special content models for different asset types like PDFs, video files, large image files, etc. Specific features are added to Islandora using Drupal’s module and theme systems. The Islandora community has created a number of modules that carry out preservation-related operations. Some of these include:

• Islandora Pathauto / Islandora Handle / Islandora DOI for implementing persistent URLs.
• Islandora PREMIS for supporting production and storage of preservation metadata.
• Islandora FITS / Islandora Checksum / Islandora Checksum Checker for carrying out data integrity functions like checksum generation, file format identification, and technical metadata extraction.
• Islandora BagIt for depositing backup assets to a BagIt preservation archive.
• Islandora Vault for depositing backup assets to CloudSync or DuraCloud.
• Islandora LOCKSS-O-Matic for depositing assets into a Private LOCKSS Network.


Perma.cc (https://perma.cc)

Perma.cc is a tool built by Harvard University’s Library Innovation Lab to specifically combat link rot in citations. It is an online service that will archive the web page for a given URL, add it to the Perma.cc collection, and return a unique URL (e.g. “perma.cc/ABCD-1234”) that points to the record in the collection. When that URL is then used in a citation, it will give readers a stable view of the page at the time it was archived (even if the original disappears from the web), as well as a link to the page as it currently exists.

Perma.cc is a free service, and anyone can create an account, but unless associated with a vetted organization, a user will be limited to creating 10 permalinks per month. Once an organization has joined Perma.cc, unlimited user accounts can be created and those users can create unlimited permalinks, as long as the links are saved within the member organization on Perma.cc.

Preservica (www.preservica.com)

Preservica is a private digital preservation company that operates out of Boston and Oxford. It offers services across the digital asset lifecycle, not just for long-term preservation. The Preservica platform is available as a fully hosted, cloud-based service or as an on-premise, locally hosted, customizable installation. The Preservica repository adheres to the OAIS (ISO 14721) preservation standard. It offers tools for metadata creation and harvesting, simple ingest workflow, multiple storage choices, large file transfer, access control, and active file format identification and migration.

Rosetta (www.exlibrisgroup.com/category/RosettaOverview)

Rosetta is a digital asset management and preservation system produced by Ex Libris that offers full lifecycle support for any digital format. Though Rosetta is proprietary software, it exposes parts of its architecture to third-party integration with APIs. Administrators can connect Rosetta to a separate storage device or devices, if desired. Rosetta’s workflow models are configurable. It generates checksums, identifies file formats, and extracts technical metadata at ingest. The Rosetta preservation planning module enables administrators to schedule data integrity and migration tasks as needed. Rosetta uses the PREMIS data model for collecting preservation metadata. The system’s architecture is divided between an operational repository, where functions like publishing and delivery are carried out, and a permanent repository, where preservation functions and long-term storage take place.

Samvera (samvera.org)

Formerly known as Hydra, Samvera is similar to Islandora in that it integrates with Fedora repository software to provide search engine and interface layers; however, rather than using Drupal to facilitate this, Samvera uses a Ruby on Rails plugin called Blacklight. The Blacklight framework is a web platform that is specifically designed for resource discovery with Solr indexes. Samvera has been adopted by a number of major digital libraries as a digital asset management system. In 2014, the Hydra user group decided to pursue options for adding digital preservation functionalities to the Hydra/Samvera stack. As of September 2017, a Samvera Digital Preservation Interest Group is investigating the matter.

 


Appendix III: Stand-alone Preservation Tools

Aside from preservation systems that strive to accomplish an end-to-end OAIS-compliant preservation workflow, there are a number of open source tools that will perform different and discrete parts of the process. Here are some that are in widespread use:

BagIt

BagIt was developed by the Library of Congress and the California Digital Library to define a strategy for transferring digital content. It specifies the elements and structure of a “bag” that includes the files in a standard container. The tool will create a manifest of checksums of all of the digital files in that container as well as metadata about the package. On receipt of the package, the checksums can be validated to make sure that no corruption occurred during the transfer.

Example of a BagIt bag
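The manifest-then-validate cycle described for BagIt can be sketched in a few lines. This is a simplified illustration that uses only the Python standard library, not the Library of Congress tooling; it assumes SHA-256 checksums, a payload directory named data, and a manifest file named manifest-sha256.txt, and it omits the other tag files a complete bag contains.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Record one "<checksum>  <relative path>" line per payload file."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            lines.append(f"{file_sha256(path)}  {relative}")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def validate_manifest(bag_dir: Path) -> list:
    """Return the payload paths that are missing or whose checksum changed."""
    failures = []
    text = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty payload
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or file_sha256(target) != expected:
            failures.append(relative)
    return failures
```

The sender would call `write_manifest` before transfer and the recipient would call `validate_manifest` on arrival; an empty list indicates that every payload file arrived unchanged.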

Data Accessioner

A simple tool for transferring files from one medium to another. It too will create checksums, and it gives the user the option of creating Dublin Core metadata.


Exactly

Another transfer tool. From the website: “Exactly allows recipients to create customized metadata templates for senders to fill out before submission. Exactly can send email notifications with transfer data and manifests when files have been delivered to the archive.”

FITS

This tool consolidates the use of other open source tools for the purpose of file characterization. There are a number of individual tools that will identify a file format and output various pieces of technical information about that file. Because each individual tool has strengths and weaknesses, it is good practice to run files through multiple characterization tools, and FITS will do this in one process and output a consolidated set of data.

From the FITS online User Manual


Fixity

Several of the tools mentioned above will create checksums that act as something like a digital fingerprint of a file, but a checksum serves no good purpose unless it can be validated periodically. Fixity is a tool that allows for the scheduling of regular checksum validations, and will send a report on the results.
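The kind of report a periodic validation run produces can be sketched as a comparison between a stored baseline of checksums and a fresh snapshot. This is an illustration of the general technique rather than of the Fixity tool itself; the function names are hypothetical, SHA-256 is an assumed choice, and scheduling (for example through the operating system's task scheduler) is left out.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under `root` (by relative path) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def fixity_report(baseline: dict, current: dict) -> dict:
    """Classify differences between an earlier snapshot and the current one."""
    return {
        "changed": sorted(
            p for p in baseline if p in current and baseline[p] != current[p]
        ),
        "missing": sorted(p for p in baseline if p not in current),
        "new": sorted(p for p in current if p not in baseline),
    }
```

A scheduled job could save the first snapshot as the baseline, rerun `snapshot` at each interval, and send the resulting report to the repository's administrators whenever any of the three lists is non-empty.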

For an exhaustive list of available tools and systems for digital preservation, go to the POWRR Tool Grid v2.

POWRR: Preserving Digital Objects With Restricted Resources
