CLARIN LEGAL ISSUES COMMITTEE (CLIC) · WHITE PAPER SERIES VERSION 1.0 – FEBRUARY 2017 GUIDELINES FOR BUILDING LANGUAGE CORPORA UNDER GERMAN LAW Guidelines by the DFG Review Board on Linguistics


CLARIN LEGAL ISSUES COMMITTEE (CLIC) · WHITE PAPER SERIES VERSION 1.0 – FEBRUARY 2017

GUIDELINES FOR BUILDING LANGUAGE CORPORA UNDER GERMAN LAW

Guidelines by the DFG Review Board on Linguistics


This work is licensed under a Creative Commons Attribution 4.0 International License.

Version 1.0, February 2017: This is an English-language translation of guidelines published by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) in March 2015, originally available at: http://www.dfg.de/download/pdf/foerderung/grundlagen_dfg_foerderung/informationen_fachwissenschaften/geisteswissenschaften/standards_recht.pdf

Translation from the German: Erik Ketzan, Julia Wildgans, John Weitzmann

This translation preserves the original text, but occasionally adds notes via [2016 note], for instance for updated citations.

Distributed by the CLARIN Legal Issues Committee (CLIC), http://clarin.eu


TABLE OF CONTENTS

Preliminary remarks
Introduction
Part 1: Information on legal aspects of the use of spoken corpora
1.1. Data protection / privacy aspects
1.1.1. Declaration of consent
1.1.2. Anonymization / pseudoanonymization
1.2. Copyright aspects
References for Part 1
Part 2: Information on legal aspects of the use of written corpora
2.1. Copyright and related rights
2.1.1. Basics
2.1.2. Copyright exceptions and their application to written corpora
2.1.3. Adaptations (derivative works) and transformations
2.1.4. Collections and database works
2.1.5. Orphan works
2.1.6. Software
2.2. Data protection / privacy aspects
2.3. Best practices
2.3.1. Recommendations for building corpora
2.3.2. Recommendations for making written corpora available
2.3.3. Recommendations for creating and making own works available: derivative works and databases
2.3.4. Recommendations for the use of software when creating derivative works
References for Part 2


Preliminary remarks

These recommendations were developed at two roundtable meetings of the German Research Foundation (DFG) in 2012 and 2013, which took place on the initiative of the DFG review board on linguistics, coordinated by Arnulf Deppermann and Mechthild Habermann in cooperation with Helga Weyerts-Schweda from the DFG. Within these roundtable meetings, working groups were formed which wrote these recommendations.

The roundtable meeting devoted to spoken corpora hosted by Arnulf Deppermann and Thomas Schmidt (IDS Mannheim) took place at the DFG’s office in Bonn on November 9, 2012. The members of the working group on legal aspects of the use and provision of spoken corpora (part 1 of the recommendations) are: Jörg Bücker, Arnulf Deppermann, Sebastian Drude, Dagmar Jung, Paweł Kamocki, Erik Ketzan, Christoph Purschke, Angelika Redder, John H. Weitzmann and Thomas Schmidt (coordinator). Comments from the DFG legal advisor Mrs. Hagena-Schmedding and the DFG data protection officer Mr. Dörel were considered in draft versions.

The related roundtable meeting for written corpora hosted by Alexander Geyken (BBAW Berlin) and Marc Kupietz (IDS Mannheim) took place at the DFG on November 15, 2013. The members of the working group for formulating the information concerning legal aspects of the use and provision of written corpora (part 2 of these recommendations) were: Gerhard Heyer, Christian Mair, Roland Schäfer and Silke Schwandt. In addition, Dagmar Deuber, Richard Eckart de Castilho, Judith Eckle-Kohler and Iryna Gurevych contributed to the recommendations. Further parts of the text were written and edited by Paweł Kamocki, Erik Ketzan and John H. Weitzmann, who together performed a review of the legal scholarship, a process coordinated by Marc Kupietz.


Introduction

The possibilities of re-use and archiving of spoken and written corpora are affected by personality rights (depending on legal tradition also called: the right of publicity), copyright law and data protection / privacy laws. These recommendations include information about legal aspects which should be considered while creating corpora to ensure the greatest archivability and re-usability possible in compliance with current laws.

The information compiled here shall serve researchers who plan to create corpora or who are involved in evaluation of such measures as a guideline. This information is not exhaustive or to be considered as legal advice. Researchers should consult institutional legal departments and management before making legally relevant decisions. That said, further legal expertise should be sought if possible as early as project planning phases.


Part 1: Information on legal aspects of the use of spoken corpora

1.1. Data protection / privacy aspects

The data that data protection / privacy laws apply to, so-called “personal data”, is data that refers to living¹ humans. This includes data that is traceable to individuals or very small groups of people.

For spoken corpora projects, the processing (i.e. collection, storage, modification, transmission, blocking, deletion and other uses) requires the consent and cooperation of the people who are to be recorded.² In this situation the interest of the recorded person to protect his/her personal data and the interest of the researcher to be able to use this data to the largest extent possible can run contrary to each other. The aim must be to negotiate and find a feasible, legally correct and ethically responsible compromise between the potentially opposing interests. This solution should on the one hand give full effect to the recorded person’s right of data protection, on the other hand take account of the interests of the scientific community, e.g. not preclude ways of data use that do not impact on privacy.

● Data protection regulations for responsible authorities which have their seat or a branch in Germany³ can be found in EU law (Directive 95/46/EC and Directive 2002/58/EC), national laws (Bundesdatenschutzgesetz BDSG) and federal state laws (Landesdatenschutzgesetze, e.g. Hamburger DSG). The BDSG applies to public bodies of the Federal Republic of Germany and non-public authorities, e.g. companies. For universities and other public bodies of the federal states, the respective federal state data protection law applies.⁴ These recommendations are for researchers at the above-mentioned institutions. Thus it should be emphasized that public bodies of the Federal Republic of Germany and the federal states of Germany must observe different laws of data protection (although the content of the laws is largely the same, key differences exist).⁵ [2016 note: From May 25th 2018 onwards the EU General Data Protection Regulation will provide the main regulatory framework.]

1 Data protection laws apply to deceased individuals only to a very limited extent.

2 This primarily applies to data which was collected by the researcher him/herself. In cases of data from TV, radio, Internet, researchers are often not in the position to ask for the people’s consent and their willingness to cooperate.

3 We confine ourselves to information concerning the nationwide legislation throughout the Federal Republic of Germany. Due to the fact that the legislation in other EU member states follows the same EU directives, the regulations there are quite similar, with some differences. It should be also taken into account that the legislation of some federal states may contain additional regulations for data protection which are not mentioned here. The data protection legislation of non-EU-countries may differ considerably from the cases described below. If the collection of spoken data affects legislation of non-EU-countries (e.g. if data from abroad is recorded), additional legal advice should certainly be obtained.

4 Landesdatenschutzgesetz Baden-Württemberg (LDSG BW), Bayerisches Datenschutzgesetz (BayDSG), Berliner Datenschutzgesetz (BlnDSG), Brandenburgisches Datenschutzgesetz (BbgDSG), Bremisches Datenschutzgesetz (BremDSG), Hamburgisches Datenschutzgesetz (HmbDSG), Hessisches Datenschutzgesetz (HDSG), Niedersächsisches Datenschutzgesetz (NDSG), Datenschutzgesetz Mecklenburg-Vorpommern (DSG M-V), Datenschutzgesetz Nordrhein-Westfalen (DSG NRW), Datenschutzgesetz Rheinland-Pfalz (DSG RLP), Saarländisches Datenschutzgesetz (SDSG), Sächsisches Datenschutzgesetz (SächsDSG), Datenschutzgesetz Sachsen-Anhalt (DSG-LSA), Landesdatenschutzgesetz Schleswig-Holstein (LDSG SH), Thüringer Datenschutzgesetz (ThürDSG).

The aim of data protection laws including the General Regulation is to protect individuals against violations of their right of informational self-determination during the processing of personal data through authorities and other bodies. The Bundesdatenschutzgesetz states: “The purpose of this Act is to protect the individual against his/her right to privacy being impaired through the handling of his/her personal data.” (§ 1 I BDSG). “Personal data” are understood in a comprehensive way as “any information concerning the personal or material circumstances of an identified or identifiable individual (the data subject)” (§ 3 I BDSG). A similarly broad definition also applies to the term “handling [of personal data]”. In the case of spoken corpora it includes the collection, storage, processing and publication of such data (§ 3 IV BDSG).

EU law includes more requirements for the processing of personal data (Directive 95/46/EC, Art. 6, 7 etc.), notably that the consent of the person affected is needed for the processing of personal data (Directive 95/46/EC Art. 8 I).

The data privacy officer of the respective institution, the federal state or the Federal Republic is responsible for the control of the observance of the data protection laws. Researchers should thus resolve data protection issues of data management and processing with their data privacy officer.

As far as spoken corpora are concerned, data protection regulations apply at least to audio and video recordings, transcripts and metadata about speakers.

The most important instruments for meeting the data protection requirements while handling and using spoken corpora are privacy policies of the “responsible bodies” who deal with the data, relevant and informed declarations of consent and suitable anonymization and/or pseudonymization of data. This is further discussed in the two subsections below.

5 Each federal state law on data protection contains a chapter which is explicitly devoted to the use of data for scientific purposes, for scientific research or in research institutions. Although other parts of the LDSGs apply to researchers’ work as well of course, researchers and institutions should pay special attention to these parts: § 35 BW LDSG, Art. 23 BayDSG, § 30 BlnDSG, § 28 BbgDSG, § 19 BremDSG, § 27 HmbDSG, § 33 HDSG, § 25 NDSG, § 34 DSG M-V, § 28 NRW, § 30 DSG RLP, § 30 SDSG, § 36 SächsDSG, § 27 DSG-LSA, § 22 LDSG SH and § 25 ThürDSG.


1.1.1. Declaration of consent

The processing of personal data is only permitted by law if the consent of the person affected is obtained (Directive 95/46/EC Art. 30).

● Before collecting personal data and recording spoken interactions, a written declaration of consent should be obtained from every person involved (so-called “informed consent”). The BDSG (which applies to public bodies of the Federal Republic of Germany and non-public bodies, see details above, [2016 note: soon to be replaced to a large extent by the rules of the EU’s General Data Protection Regulation, EU GDPR]) states that the “consent shall [only] be effective when based on the data subject's free decision. Data subjects shall be informed of the purpose of collection, processing or use and, in so far as the circumstances of the individual case dictate or upon request, of the consequences of withholding consent. Consent shall be given in writing unless special circumstances warrant any other form.⁶ If consent is to be given together with other written declarations, it shall be made distinguishable in its appearance.” (BDSG § 4a I)

● Some LDSGs (which apply to universities, as discussed above) permit the processing of personal data without consent for special research projects if there are certain qualifications, e.g. “protection-worthy interests of the person concerned will not be affected because of the type of data, due to their obviousness, or because of the nature of the use,” or “the public interest in the research projects outweighs the data subjects’ protection-worthy interests that qualify for protection and the aim of the research cannot be reached in another way.”⁷ In such cases, however, there might be additional obligations for anonymization of data and notification of the responsible Landesdatenschutzbeauftragte (state data protection commissioner), see e.g. § 27 HmbDSG.

● If there is not such a privileged case and a declaration of consent is needed, the consenting person needs to be informed about the following:

o the name of the research project
o contact details of the person who is in charge of the project
o aims of the research project
o information about if and in which way personal data is collected, processed and used, and for how long it is stored. The ways of processing and using, which may differ depending on the file type (audio and video recordings, transcripts etc.), should be elaborated if necessary (see details below).
o especially within international research projects it needs to be considered that (if provided or at least not precluded) the processing of personal data by third parties as well as their transmission to bodies outside the EEA must be mentioned explicitly.

6 For example, this could be the case in still existing oral communities, especially if signatures (by experience) are associated with negative consequences. In such cases an audio-visual documentation of the process of informing the data subject (Aufklärung) can be “any other form” of consent.

7 § 35 Sect. 1 BW LDSG, § 30 Sect. 1 BlnDSG, § 28 Sect. 1 BbgDSG, § 19 Sect. 1 BremDSG, § 27 Sect. 1 HmbDSG, § 33 Sect. 1 HDSG, § 25 Sect. 2 NDSG, § 30 Sect. 2 SDSG, § 22 Sect. 4 LDSG SH.

● Quite often, so-called special categories of personal data are collected for spoken corpora. The German legislator defines them as “information on a person’s racial or ethnic origin⁸, political opinions, religious or philosophical convictions, union membership, health or sex life” (§ 3 Sect. 9 BDSG, § 5 Sect. 1 S. 2 HmbDSG and the superordinate Directive 95/46/EC Art. 8 Sect. 1). If spoken corpora consist of special categories of personal data, the declaration of consent needs to refer to these data explicitly (see e.g. § 4a Sect. 3 BDSG).

● The people affected must be thoroughly informed about planned ways of processing and using the collected data before signing the declaration of consent. It may be advisable to provide the information for the people affected in advance, using a written data protection declaration which they may adopt by stating a simple “Yes, I agree with that”. In any case, the declaration of consent should include language that refers to if and how information about data protection was given.

● In the case of minors and otherwise not legally responsible persons, the declaration of consent must be signed by their legal guardians. Minors after the 6th year of life are granted a “veto right” against the declaration of consent given by their legal guardians (which should therefore be requested).

● If the declaration of consent is to be given together with other declarations, it and the corresponding privacy policy shall be made distinguishable in their appearance, e.g. from other Terms and Conditions.

● Oral declarations of consent must be confirmed in writing.

● Electronic declarations of consent must be documented.

● A description of how and to whom the declaration of consent may be revoked/retracted in the future should be included.

● In every data protection law there is the rule of data economy: both the extent of data collection and the planned manners of use should be kept to a minimum, i.e. limited to aims of research and education. A more extensive limitation (e.g. to use for a special research aim or by a limited number of people) may indeed prejudice the general scientific reuse, but should still be offered to people affected, accompanied by non-technical explanations of the negative consequences of such limitations for the research.

8 Especially within spoken data it needs to be considered that information about the language biography (e.g. information about the mother tongue, information about the dialect) often allows one to draw conclusions about ethnicity. Such information should be understood as “special categories of personal data” within the meaning of the law and should be addressed as such.

● If archiving and publishing data during or after the project phase is intended, this purpose must be stated explicitly in the declaration of consent. The nature of publication (e.g. in a database on the Internet) should be described in a way so that future changes of the way of publication (e.g. because of technical changes in the archiving and publishing system) are covered by the consent. Apart from that, publishing or archiving of a subset of the collected data shall be covered by the consent.

● It is a current practice to restrict the group of users who shall have access to the data. In practice, data protection can be accomplished by a password which is issued only upon request. Often this makes it easier for people affected to give their consent.

● As a counterpart of the declaration of consent, data users should sign a written use declaration which binds them to: first, use the data only for aims stated in the declaration of consent; second, redact personal data in a publication based on this data as far as possible; and third, not give this data to third parties.
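
The information items named in this subsection can be tracked as a simple checklist while a declaration of consent is being drafted. The following is a minimal sketch in Python, not a legal tool. The class name, the field names and the helper `missing_items` are hypothetical and chosen only for illustration, and the decision about which items count as conditional is an assumption based on the wording above.

```python
from dataclasses import dataclass, fields


# Hypothetical field names; each one mirrors an item from the list above.
@dataclass
class ConsentInformation:
    project_name: str = ""
    contact_person: str = ""
    research_aims: str = ""
    data_handling: str = ""           # if and how personal data is collected, processed, used
    storage_period: str = ""          # how long the data is stored
    revocation_procedure: str = ""    # how and to whom consent may be revoked
    third_party_processing: str = ""  # conditional: third parties, transfer outside the EEA
    special_categories: str = ""      # conditional: e.g. ethnic origin, health
    archiving_publication: str = ""   # conditional: intended archiving or publication


# Assumption: these items only need text when the project actually involves them.
CONDITIONAL = {"third_party_processing", "special_categories", "archiving_publication"}


def missing_items(info: ConsentInformation) -> list:
    """Return the names of unconditional items that are still empty."""
    return [
        f.name
        for f in fields(info)
        if f.name not in CONDITIONAL and not getattr(info, f.name).strip()
    ]


draft = ConsentInformation(
    project_name="Example dialect corpus",
    contact_person="Project lead, contact details",
    research_aims="Documentation of regional speech",
)
print(missing_items(draft))
```

An empty result only means that every unconditional item has been filled in. Whether the wording is legally sufficient still has to be settled with the data privacy officer.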

1.1.2. Anonymization / Pseudoanonymization

The BDSG (that applies to public bodies of the Federal Republic of Germany and non-public bodies, as discussed above) describes the necessity for anonymization and pseudoanonymization of personal data as follows: “Personal data is to be collected, processed and used, and processing systems are to be designed in accordance with the aim of collecting, processing and using as little personal data as possible. In particular, personal data is to be aliased or rendered anonymous as far as possible and the effort involved is reasonable in relation to the desired level of protection.” (§ 3a). “Rendering anonymous” here means “the modification of personal data so that the information concerning personal or material circumstances can no longer, or only through a disproportionate amount of time, expense and labour, be attributed to an identified or identifiable individual.” (§ 3 Sect. 6 BDSG). “Pseudoanonymization” means “to replace a person’s name and other identifying characteristics with a label, in order to preclude identification of the data subject or to render such identification substantially difficult.” (§ 3 Sect. 6a BDSG).

● All 16 LDSGs (that apply to universities, see details above) lay down rules for cases in which data may be processed without the consent of the people affected, a privilege granted because of the scientific nature of the aims. Depending on the regulation in each federal state, the data must be rendered anonymous or aliased once the research aim allows it, and if necessary, features that allow de-anonymization should be kept separately (and deleted as soon as the scientific aim allows).

● Some LDSGs contain regulations for cases in which anonymization and pseudoanonymization is not possible.⁹

● If anonymization or pseudoanonymization of data is pledged before using or publishing, it should be set down in writing in the declaration of consent.

● Different kinds of data types will require different methods to anonymize or pseudoanonymize:

o With metadata and transcripts, an appropriate level of replacing information (i.e. pseudoanonymization) may usually be achieved by replacing the names of people, geographical locations, etc., so that speakers may not be identified, or identified only through disproportionate effort.

o Within audio data, the identification of speakers may be hampered by adding noise to (Verrauschen) or fading out the parts in which names are stated. However, without any further processing (e.g. distortion of the audio signal) the opportunity to identify the speakers by their voices still exists.

o Within video data, extensive anonymization of the image data (e.g. by pixelating faces or inserting black mattes) would mean that the resulting video is re-usable only to a very limited extent.

When signing the declarations of consent, affected people must be informed as fully as possible about which ways of anonymization and pseudoanonymization apply to the collected data while ensuring reusability. If necessary, the provision of data may be restricted depending on the type and the ability to render it anonymous (e.g. anonymized audio data available for a larger group of users, related video data only for a strictly limited group). This, too, needs to be addressed adequately in the declaration of consent.

9 § 34 Sect. 2 DSG M-V, § 28 Sect. 2 NRW, § 22 Sect. 3 LDSG SH.
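
For transcripts and metadata, the replacement of names described above can be sketched in code. The following is a minimal Python illustration under stated assumptions: the function `pseudonymize`, the label scheme (`PERSON_1`, `PLACE_1`) and the idea of passing in a hand-made list of identifying strings are all hypothetical, since the guidelines prescribe no particular procedure. The returned mapping is what would allow re-identification, so it corresponds to the features that should be kept separately and deleted as soon as the research aim allows.

```python
import re
from collections import defaultdict


def pseudonymize(transcript, entities):
    """Replace identifying strings with stable labels.

    `entities` maps each identifying string to a category chosen by the
    researcher, e.g. {"Anna": "PERSON", "Mannheim": "PLACE"} (hypothetical).
    Returns the pseudonymized text and the string-to-label mapping.
    """
    counters = defaultdict(int)
    mapping = {}
    # Sorted iteration keeps the labels reproducible between runs.
    for name in sorted(entities):
        category = entities[name]
        counters[category] += 1
        mapping[name] = f"{category}_{counters[category]}"
    if not mapping:
        return transcript, mapping
    # Longest strings first, so a full name wins over a first name it contains.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], transcript), mapping


text, key = pseudonymize(
    "Anna met Jonas in Mannheim. Anna left early.",
    {"Anna": "PERSON", "Jonas": "PERSON", "Mannheim": "PLACE"},
)
print(text)
```

Plain string replacement of this kind does nothing about identification through context or, in audio data, through the voice itself, so it covers only the first of the three data types discussed above.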


1.2. Copyright aspects

For spoken corpora, both copyright and related rights may be issues, especially when it comes to:

● audio and video recordings from radio and TV broadcasts, where authors, producers, broadcasting companies, and others own certain rights
● audio and video recordings from the Internet (streaming platforms and other sources) where the operators of the platform may own rights
● written material that belongs to a spoken corpus as supplementary material (e.g. PowerPoint slides for a speech, course books for the class, etc.) and
● pictures, graphics etc.

As soon as these materials are used in the course of research, the consent of relevant rightholders is necessary to perform the research legally. A general research and education law regarding copyright and other rights has not yet been implemented in Europe, although such a regulation has been, and is, continuously discussed. Currently, only the quotation exception (§ 51 of the German Act on Copyright and Related Rights (UrhG)) and some special regulations for building personal scientific archives allow very limited use of someone else’s work at all.

The consent of the rightholders is usually given through an appropriate license agreement (or contract). In practice, it is a considerable problem when rightholders are not known or cannot be found. This is important because every rightholder must give his/her consent before a use of the work which is otherwise only permitted for rightholders is allowed (with the exception of films, and if there are no other special agreements). If more than one person created the work, the consent of each co-rightsholder must be obtained.

This also refers to transcripts of primary data protected by copyright law (e.g. recordings of speech and song), even if the transcript is technically the work of the scientist in the sense of copyright. In such cases this transcript is considered a simple copy of the work which is included in the primary data or a derivative work (e.g. translation). Both of these types of use are assigned to the original rightholder (except in above-mentioned copyright exceptions, e.g. the quotation exception).

Extra precaution is appropriate if the copyright-protectable material has not yet been published, but will be published within a scientific work in a manner which cannot be avoided due to the best scientific practice in disclosing sources. This affects the authors’ right of personality because it is their choice whether their works are disclosed to the public or not.


References for Part 1

Enke, Harry, Norman Fiedler, Thomas Fischer, Timo Gnadt, Erik Ketzan, Jens Ludwig, Torsten Rathmann, Gabriel Stöckle, and Florian Schintke (2013). Leitfaden zum Forschungsdaten-Management. Verlag Werner Hülsbusch.

Häder, Michael (2009). Der Datenschutz in den Sozialwissenschaften: Anmerkungen zur Praxis sozialwissenschaftlicher Erhebungen und Datenverarbeitung in Deutschland, [formerly] available at: http://www.ratswd.de/download/RatSWD_WP_2009/RatSWD_WP_90.pdf

Metschke, Rainer / Wellbrock, Rita (2002). Datenschutz in Wissenschaft und Forschung. Materialien zum Datenschutz Nr. 28, [formerly] available at: http://www.datenschutz.hessen.de/download.php?download_ID=147


Part 2: Information on legal aspects of the use of written corpora

2.1. Copyright and related rights

2.1.1. Basics

In general, texts are protected by copyright in Germany¹⁰ if they satisfy an originality standard and it has not been more than 70 years since the death of their authors. How the originality standard is defined, and therefore how it is met, is a controversial question and its answer may differ from case to case and from court decision to court decision. The requirements for meeting the originality standard for copyright protection have been set lower and lower by courts over the past decades. Texts such as simple statements of the news or plain business correspondence may still not be protected by copyright because they do not meet the originality standard. But there is the concept of “kleine Münze”, a sort of everyday creativity of people in general which is fully protected by copyright.

There are also certain related rights that are especially relevant for texts:

Since 2013, there has been a related right for press publishers in Germany which sidesteps the originality standard and grants protection to even the shortest paragraphs for a term of one year. This protection follows mere publication, and is limited to the right of making publicly available. It is, therefore, only invoked when the press content is placed online.

There are two related rights that have a wider protected domain but a smaller scope of application: one concerning scientific editions of works that are not protected by copyright, and one concerning posthumous works, i.e. works that are published after the death of their authors and, as the case may be, after the copyright term (70 years, see above). These rights protect all uses of these works (not only online use) for 25 years.

Finally, there is the related right for the creators of databases. Its term is 15 years, and it relates not to the contents of a database but to the manner in which the database is structured. This related right does not apply to unstructured data and requires substantial investments of time and/or money. Right holders (i.e. those who have created a database through substantial investment) are protected from “substantial parts” of the database being reproduced or used further.

Related rights are distinguished from copyright especially in two key ways. First, related rights have a shorter term of protection. Second, related rights can protect the works created by a legal person, e.g. a company. Under copyright, companies may at most have exclusive rights in copyright-protected works, while authors may only be natural persons.

The rules of copyright law that are most relevant for written corpora affect the right of reproduction (§ 16), the right of distribution (§ 17), the right of making works available to the public (§ 19a), the related rights on scientific editions (§ 70) and posthumous works (§ 71), the related right of makers of a database (§ 87b) and the related right of press publishers (§ 87f). It is still a legal gray area whether Text and Data Mining (TDM)¹¹ and thus quantitative linguistic analysis are types of use with copyright implications which are not yet mentioned in § 15 UrhG but protected nonetheless. (More specifically, whether the act of performing analysis on the data falls within the scope of § 15 UrhG; the resulting digital copy undoubtedly falls under § 16 UrhG.) Court decisions clarifying these issues can perhaps be expected in the foreseeable future. Because there are clear parallels between TDM and a human reading a text, which is not a type of use relevant for copyright, it is easily conceivable that courts may rule that TDM is permitted by law even without permission of the right holder, similar to reading.

2.1.2. Copyright exceptions and their application to written corpora

Provisions that balance the interests of authors and users are called copyright exceptions. These determine which types of uses are allowed without the consent of the right holders, and under which circumstances. The use of copyright-protected material as research data is provided for only in broad terms.

10 We only give information about the legislation in the Federal Republic of Germany. How works of German authors are protected in other countries and how foreign authors are protected in Germany is regulated in some international conventions. The most important ones are the Berne Convention for the Protection of Literary and Artistic Works (usually known as the Berne Convention) and the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). In Art. 5.1. the Berne Convention says that every state party must acknowledge the protection of works of citizens of other state parties as it acknowledges the protection of works of its own citizens. There are 168 countries that are parties to the Berne Convention (i.a. the EU, the USA, China, Japan, Russia and India).
The so-called research exception (§ 52aUrhG) forexampleallowsmakingavailable “small scale”worksaswell as “individualarticlesfromnewspapersorperiodicals,”andonlyifandinsofarthisis“necessary”forthe respective researchpurpose and is “justified” for the “pursuit of non-commercialaims”.Thecopiesmaybemadeavailable“exclusivelyforaspecificallylimitedcircleofpersons”whichmayincludeasmallresearchteamwhosemembers--accordingtothelegalcommentatorsDreier/Schulze(2013)--maybeofdifferentresearchinstitutions,oraseminar,butnotthewholescientificcommunity.Thelimitedcirclemustbelimited

11WeadoptthetermTextandDataMiningbecauseitisnowfrequentlyusedindiscussionsbytheinternationallegalcommunity.Atthemoment,thereisnocoherentsystemofdefinitionsofthedifferenttermswhichareusedforscientificanalysisofdata,butmanyslightlydifferentandpartlyoverlappingnomenclatures.ItcanbearguedthatthemeaningofTDMinanycaseincludesquantitativelinguisticanalysis.

Page 16: GUIDELINES FOR BUILDING LANGUAGE ORPORA UNDER … › download › pdf › foerderung › ... · The data that data protection / privacy laws apply to, so-called “personal data”,

16

to people who access the materials for their own scientific purposes¹² and the measures taken must be effective considering the state of the art at the time.

The right of temporary acts of reproduction (§ 44a UrhG) allows a temporary caching of electronic data, although this right is often insufficient to legally cover the empirical methods and replicable results required by scientific research. The same can be said for the right of reproductions for private use, which are permitted by § 53 I UrhG, but which allows a transfer only in private, i.e. not in a work-related scientific field, and § 53 II UrhG, which allows a reproduction only for one's own personal scientific use (the possibilities of transfer are regulated in § 52a). The right of digital reproductions of complete books or magazines is further limited in § 53 IV UrhG. Concerning all the exceptions, one must keep in mind that they are subordinate to contrary license agreements. Additionally, § 52a IV UrhG states that an equitable remuneration shall be paid (guided by rates set out in the VG WORT case).¹³

Attention should also be paid to the fact that exceptions of copyright protection do not apply for related rights in the same way. They have their own respective protection exceptions that are named in the respective part of the UrhG.

As a conclusion it can be said that legal exceptions are typically not a sufficient basis for making available written corpora permanently. Making available a copy of the written corpus that has been the research object is not covered by any of the above mentioned research exceptions, which may massively complicate the repeatability and thus the verification of the respective research projects. Often enough even building up a corpus of texts for which no express permission was given is unlawful, because the digital copies produced in the process are not necessarily covered by copyright exceptions.

For building a corpus in conformity with the law, the consent of the right holders must be obtained, or it must be ensured that only texts are used:

● that are not protected by copyright, such as the text of laws, certain government documents, etc.
● where the term of copyright protection has expired, or
● where the texts do not meet the originality standard.¹⁴

A thorough checking/clarification of rights is therefore necessary. The costs for this may possibly be reduced by cooperating with other centers that seek to use this data and therefore check its legal status.

12 BT-Drucksache 15/38, S. 2
13 In its decision of March 24, 2011 (file reference 6 WG 12/09) concerning the case VG Wort - Federal States, the higher regional court (OLG) of Munich considered a remuneration of 10 euros + VAT per work as equitable for scientific research within the scope of § 52a I Nr. 2 UrhG.
14 Whether the originality standard applies to a succession of randomly assorted sentences is unclear. § 39 UrhG “Alterations of the work,” which belongs to the moral rights, is one argument that this method is not legally sound.


The situation tends to be considerably easier if the intended use is covered by standard licenses which grant the necessary rights for the use in a corpus, to everyone, in advance. These are called “Public Licenses”. In best-case scenarios, the author has already published his/her texts under a sufficiently liberal standard license. But often this is not the case. This means that individual license agreements with the respective right holders must be made, which requires time and other resources. In the case of texts published by presses / publishing houses, these may typically be contacted directly, because the publisher often obtains the right to license electronic uses in their contracts with authors. The same often applies to texts which are published on web portals, because operators are often granted the respective rights through “Terms and Conditions” agreements.

2.1.3. Adaptations (derivative works) and transformations

Adaptations in the meaning of the law are contents that are based on a previous work and meet the originality standards to qualify for protection (the law of copyright in adaptations), even if the previous work is no longer protected by copyright. If the previous work is still protected by copyright, adaptations may only be published with the consent of the author of the previous work. Transformations are, according to prevailing legal opinion, modified versions of previous works that do not meet the requirements of protection for copyright in adaptations. They may also only be published with the consent of the author of the previous work.

The threshold to adaptation or transformation is reached if an average observer’s impression of a work is changed noticeably. Concerning pictures, this is for example the case if they are cropped or their sizes changed extremely. For films, this is the case e.g. if they are musically rendered. Texts are changed noticeably if they are shortened, amended, mixed with other texts or translated. A new layout or a transmission of a text from analogue to digital form is not an adaptation or transformation -- although usually a reproduction -- meaning generally when a text is removed from its original medium/context and remains recognizable as a discrete work. (In exceptional cases the change of the context of the work may result in an adaptation. For text corpora for research purposes, however, this is hard to imagine.)

When the original work is no longer recognizable by an average observer, no adaptation exists, but rather a new, independent work. Here, courts have said that the personal characteristics of the pre-existing work “fade away” from the new content.¹⁵ The difference between an adaptation (§ 23 UrhG) and an independent work created in free use (§ 24 UrhG) is, however, fluid.¹⁶ If, on the surface, the new content has nothing

15 Federal Supreme Court of Germany in “Mecki-Igel I”, GRUR 1958, 500, 502.
16 See Dreier/Schulze, Urheberrechtsgesetz Kommentar, 4. ed., § 24 Marginal No. 1 and § 23 Marginal No. 4. [2016 note: a newer 2nd edition was published in 2015]


in common with the previous material, free use of the previous material is unproblematic (as far as the law of adaptations is concerned). Often, this results from the method which is used within Text and Data Mining (TDM). If a text, for example, is statistically analyzed or annotated, it can usually not be reconstructed from the emerging statistics or annotation. Thus both research results are not adaptations of the source text within the meaning of the law.

For source texts that are still protected, this does not solve the problem of contractual terms that prohibit temporary copies / caching that are technically necessary for the development of research results, and of making the texts permanently available only with consent (see above). Apart from that, TDM may also be contractually prohibited because civil law largely allows contracting parties to agree on what they wish (the law of “private autonomy”). If an editor for example forbids TDM or the publication of TDM results based on a text within a license agreement with a scientific institution that regulates the access to the material, this must be respected, even if the research results are independent and not adaptations or transformations and TDM should a priori not be regarded as a copyright protected type of use.¹⁷ In this case, the basis for enforcement of the prohibition is not the copyright law, but the contract which was entered between the two parties. Such a contract, however, affects only the relevant parties.

It is possible to incorporate certain conditions for the use of the material in the agreement instead of a strict prohibition. This can be executed even by standard licenses, which are contracts. It is therefore conceivable that research or TDM results are made subject to copyleft terms.¹⁸ Disregarding software licenses, however, it is absolutely not common that the conditions of standard licenses impose conditions independently of any existing legal position based on an absolute right (such as copyright or database protection). The six Creative Commons licenses even explicitly state that they do not restrict anything that the licensee is allowed to do without the license anyway.¹⁹ Their copyleft terms thus only apply under the pre-condition that there is a legal protection in the first place that requires permission of a rights holder.²⁰ Thus copyleft and other limitations of CC licenses would only be effective if TDM is regarded as a type of use within the meaning of the copyright law.

Since this question is not yet resolved everywhere in the world, the new CC license version 4.0 clarifies explicitly that the results of TDM should not be considered as an

17 Whether TDM can be regarded as a type of use is currently being discussed by jurists and will certainly keep the courts busy. There are many reasons why TDM should be regarded as a kind of reading, which, as so-called Werkgenuss, is permitted without consent. See above 2.1. at the end.
18 Meaning that the licensee must offer the licensed content to the public under identical or similar conditions.
19 See e.g. section 8.d. in the license CC-BY Version 4.0.
20 The database licenses of Open Data Commons are an exceptional case, because these postulate their copyleft conditions even for those regions of the world where no database protection law exists, e.g. the United States.


adaptation by the licensor. Thus neither the copyleft conditions of CC licenses²¹ nor the other conditions "attribution," "no commercial use" and "no edits allowed" need to be taken into account, as far as TDM and its independent results are concerned.

If research results are still somehow considered adaptations or transformations within the meaning of the law, i.e. outside of TDM and without other licenses influencing the character of adaptation, the same recommendations apply for further use of these research results as for the use of independent works.

2.1.4. Collections and database works

According to § 4 UrhG, collections of works and databases are protected where the selection or arrangement of the elements constitutes the author's own intellectual creation, regardless of whether the individual elements are protected or not. This may be relevant if collections of texts in the public domain are included in a corpus. This protection of “database works” should not be confused with mere databases, whose creators are additionally protected by §§ 87a-87e UrhG (see above). The related right of the maker of a database only requires substantial investment; in contrast, a “database work” requires such an extraordinary arrangement of the content that the arrangement itself can be regarded as a creation (similar to authorship). Thus, the threshold for the (high) level of protection of a “database work” is much higher than that for a database protected in accordance with §§ 87a et seq. UrhG. The latter right of the maker of a database may create restrictions of use if parts of a database are included in a corpus or such a corpus is made available.²²

2.1.5. Orphan works

After § 61 UrhG was inserted into the Copyright Act in 2014, there are now some types of uses permitted by law concerning text works from collections of publicly accessible libraries, educational institutions, museums and archives, if they are already published and the respective right holders cannot be found or identified even by a diligent search (defined in § 61a UrhG), and this research result was recorded in a central register. The permitted types of use concern making available to the public (§ 19a UrhG) and reproduction (§ 16 I UrhG). Since the right to create derivative works is not included, it may not be possible to rely on § 61 UrhG when using such works in corpora.²³ To take

21 The name for the copyleft mechanism of CC licenses is "share alike", abbreviated as "SA".
22 See the court decision of the European Court of Justice (October 9, 2008, Case C-304/07) and of the Federal Supreme Court (August 13, 2009, file reference I ZR 130/04).
23 At the copy deadline of the present document, this was still an open question. Any news on this point will be published on the CLARIN-D Legal Information Platform. [2016 note: in general, it seems that adapted versions are not covered by § 61 UrhG].


the path of least legal risk, orphan works should only be included in corpora in a way whereby no adaptation or transformation is carried out (see above). There is still the unavoidable problem that the status of an orphan work may subsequently expire if the right holders appear and/or become known. From this point in time, the usual rules for the use of works apply again.

2.1.6. Software

The terms of use of commercial software are usually clearly laid out, so that it is possible to determine the terms under which it may be used and what implications may arise when such software is used to create independent and derivative works. Depending on the approach, the output of the software, i.e. the research result or document, remains independent in its legal status from that of the software. Sometimes the legal status is more vague for software tools that were developed in an academic context, as they are often based on data (dictionaries or training corpora) which might be affected by third party rights. For software developed in-house, it needs to be noted that the decision if and under which license the software will be released is reserved for the employer for whom the software was created (§ 69b UrhG).

2.2. Data protection issues

In compiling and making available written corpora, data protection rights may also be affected. This is particularly true for the use of texts which were not primarily intended for further use or disclosure, such as chat transcripts. In the case of such texts, it may be reasonable for ethical reasons and in order to avoid subsequent legal disputes to do a pseudoanonymization/anonymization (see part 1: 1.1.2, above). Besides, the same legal requirements (adequate information and consent of those affected) apply, as they are explained in the first part concerning the oral corpora.

Generally it should be considered that after a release of a corpus, people who are affected because texts about them were made available may ask to be removed from the corpus. In such a case a weighing of interests of the people affected in the individual case needs to be done. Courts are gradually conceding²⁴ constitutional fundamental

24 The case brought up by an offender released from prison against the University of Leipzig on deletion of his name from a corpus which was made available on the Internet (vocabulary project) was upheld at first instance before the District Court Hamburg (file reference 324 O 243/07). In the second instance it was rejected before the Higher Regional Court of Hamburg (file reference 7 U 123/09; the full texts of both decisions are not available on the Internet free of charge).


rights to scientific institutions on the basis of the criteria for legal admissibility of permanent online press archives.²⁵ However, it must be assumed that in case of doubt, the personality rights of individuals outweigh the interests of a particular researcher, unless it concerns a person of historical/journalistic interest who appears legitimately in the press. Whether the same is true with respect to the interests of an entire branch of research could only be decided by a court.

Regarding the still existing problem of persistence of research data, there is a certain pragmatic consensus within the scientific community: text deletions because of personality rights should be considered acceptable also epistemologically, since the replicability of important and methodically valid research results does not depend on individual texts. What is probably more important is de facto the organizational effort that can be caused by individual deletions. It is recommended to factor this into project costs in advance, if possible.

For basic information and further data protection issues, see Part 1, Section 2.

2.3. Best Practices

2.3.1. Recommendations for building corpora

● In case of doubt, you should try to obtain licenses and consent. Right holders are usually cooperative when it comes to non-commercial, scientific purposes and no economic or other interests are violated e.g. by an unrestricted distribution of copies.

● The attempt to get licenses should begin as early as possible in the planning phase of a project, since the negotiations may drag on over a long period of time and this is the only way to ensure that the necessary rights may be obtained before the project starts and therefore any license fees or other rewards may be included in the calculation of the project costs.

● Also as early as possible in the planning phase, a center should be approached that is experienced with licensing of the relevant type of resource. It may provide assistance or in some circumstances take care of obtaining the licenses, and at the same time ensure that the licensing terms are drafted so that the data and the results of the projects may be included into their own archives/projects after the duration of the project and made available for the long term.

● Recommendations for the draft of license agreements can be found in Perkuhn et al. (2012, p. 53) and on the CLARIN-D Legal Information Platform.²⁶

25 Federal Supreme Court of Germany, court decision from December 15, 2009, p. 757 et seq. with further references; Federal Constitutional Court, court decision from June 5, 1973, BVerfGE 35, p. 202 et seq.
26 http://clarin-d.de/legalissues


● License agreements typically have a limited term, especially if they are associated with fees. Particularly in these cases, it is recommended to develop a strategy in cooperation with a center for making the content sustainably available. It also should be noted that unintentional interpretations of the licensor can prevent license renewals and additional licenses regardless of their legality.

● In cases where it is not possible to obtain sufficient rights to make a text corpus permanently available to the scientific community, but the reasons to build the corpus were nevertheless strong enough, the reasons should be documented and compromise strategies should be found on how a sustainable availability may be achieved at least rudimentarily. One possible model is e.g. to comprehensibly document for subsequent users how they may obtain the necessary rights themselves.

● Data protection issues should already be included in the planning phase of a project. If it is intended to collect personal data to a greater extent, an explicit document on the subject should be created and maintained (data protection concept). It must record which data is collected for which purposes. If necessary, appropriate consent declaration forms need to be developed and signed by the people affected by the data processing.
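The data protection concept mentioned in the last recommendation can also be kept in a structured, machine-readable form alongside the written document. The following is only a minimal sketch under stated assumptions: the class names, the field names and the example entries are hypothetical and are not prescribed by these guidelines or by any law. It merely illustrates recording which data is collected for which purpose and whether a consent declaration is on file.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One category of personal data collected in a project (hypothetical schema)."""
    category: str                   # e.g. "chat nickname"
    purpose: str                    # why this data is collected
    consent_obtained: bool = False  # is a signed consent declaration on file?


@dataclass
class DataProtectionConcept:
    """Illustrative record of what is collected and for which purpose."""
    project: str
    items: list[DataItem] = field(default_factory=list)

    def add(self, category: str, purpose: str, consent_obtained: bool = False) -> None:
        self.items.append(DataItem(category, purpose, consent_obtained))

    def missing_consent(self) -> list[str]:
        """Return the categories for which no consent has been recorded yet."""
        return [item.category for item in self.items if not item.consent_obtained]


# Hypothetical example entries
concept = DataProtectionConcept(project="Example chat corpus")
concept.add("chat nickname", "speaker attribution in transcripts", consent_obtained=True)
concept.add("e-mail address", "contact for withdrawal requests")
print(concept.missing_consent())
```

Such a record does not replace the written concept or legal review; it only makes it easier to see at a glance where consent declarations are still outstanding.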

2.3.2. Recommendations for making written corpora available

● It is usually necessary and common practice to limit the number of users of corpora to people who have identified themselves and agreed to an End User License Agreement (see below) and, if necessary, additional data protection regulations. De facto this can be achieved e.g. by access restrictions via passwords, which are allocated only on application and only in person, or via a DFN-AAI authentication and web forms to request consent.

● As a general rule, rights and obligations which result from licensing agreements between right holders and corpus provider need to be passed on to end-users via end user license agreements and data privacy policies (for example if a corpus provider undertakes an obligation to the licensor to document access to the corpus).

● With regard to personal data, anonymization and pseudoanonymization should be considered when making corpora available.
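To illustrate the last recommendation, the following is a minimal sketch of replacing known names in a text with stable placeholders. It rests on several assumptions that are not part of these guidelines: the list of names is already known, the function names `pseudonym` and `pseudonymize` and the `PERSON_` placeholder format are hypothetical, and a keyed hash (HMAC) is used so that the same name always maps to the same placeholder. Note that anyone holding the key can recompute the placeholder for a candidate name, so this is not anonymization, and whether it is legally sufficient has to be assessed for each project.

```python
import hashlib
import hmac
import re


def pseudonym(name: str, secret_key: bytes) -> str:
    """Derive a stable placeholder for a name using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, name.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"PERSON_{digest[:8]}"


def pseudonymize(text: str, names: list[str], secret_key: bytes) -> str:
    """Replace every whole-word occurrence of the given names, ignoring case."""
    # Longer names first, so that a full name is replaced before any shorter part of it.
    for name in sorted(names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        text = pattern.sub(pseudonym(name, secret_key), text)
    return text


# Hypothetical example: key and names are illustrative only.
key = b"project-secret"
out = pseudonymize("Anna met Ben. anna left.", ["Anna", "Ben"], key)
print(out)
```

The key should be stored separately from the corpus, and names that are not in the list (as well as indirect identifiers such as places or employers) remain in the text and need separate treatment.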


2.3.3. Recommendations for creating and making own works available: derivative works and databases

● Works that are created by scientists themselves should always be released under license terms, so that subsequent users may know whether they can use the work for their own purposes. At the same time, contents that are (or become) free of copyright and on which the scientist did not acquire any other rights should not be portrayed as protected by law, and should as far as possible be explicitly marked as unprotected, e.g. with the help of the "Public Domain Mark" (PDM).

● When selecting license terms, existing, widely-used standard licenses that are as liberal as possible should be used (e.g. one of the two Creative Commons licenses recognized in terms of the Open Definition²⁷, namely Creative Commons license versions BY and BY-SA, or for software, a GNU license or BSD or Apache licenses which refrain from copyleft). The result is then most likely to be in line with the Open Access approach. The increasing trend is to publish scientific works with no more limitations than the Creative Commons license type "CC BY - Attribution," while pure data should be licensed entirely free of restrictions by "CC0". Even scientific publishers are increasingly open to such licenses.

● Particular attention should be paid to indicating the license as accurately as possible and making it easy to find.

● Problems with derivative works may be avoided in some cases, for example when annotations are published as an independent work from which the original work cannot be reconstructed. If the license which is advised for a derivative work is roughly equivalent to that of the underlying work, the same license should be used to facilitate the reusability. In any case, provisions of the license of the underlying work that sometimes allow only certain licenses for later processing (see e.g. the “Share-Alike” clauses in Creative Commons licenses²⁸) should be noted.

2.3.4. Recommendations for the use of software when creating derivative works

● If no license terms are known, one should attempt to determine if and which restrictions apply to the use of the software.

27 http://opendefinition.org/od/
28 See the variety of content which is combined under different Creative Commons licenses, https://wiki.creativecommons.org/FAQ#Can_I_combine_material_under_different_Creative_Commons_licenses_in_my_work.3F


● Particularly with commercial annotation tools, it may be reasonable to clarify and set out in a supplementary agreement the extent to which the outputs of the software may be distributed, because software license provisions often prohibit this altogether. Generally, however, only reverse engineering is to be prevented.

● Before using or licensing software, it should be clarified to what extent the outputs of the software may still be used after the license term expires.

References for part 2

Dreier, Thomas / Schulze, Gernot (2013): Urheberrechtsgesetz: UrhG. Urheberrechtswahrnehmungsgesetz, Kunsturhebergesetz, Kommentar. 4. Aufl. München: C.H. BECK [2016 note: a newer 2nd edition was published in 2015]

Kamocki, Pawel / Ketzan, Erik (2014): CLARIN-D Legal Information Platform, available at http://clarin-d.de

Kamocki, Pawel / Ketzan, Erik (2014): Preparation of corpora from online and other resources: current state of German and EU law, 7. Arbeitstagung des Empirikom-Netzwerks: "Social Media Corpora for the eHumanities: Standards, Challenges, and Perspectives", 20.02.2014.

Perkuhn, Rainer / Keibel, Holger / Kupietz, Marc (2012): Korpuslinguistik. Paderborn: Fink, 2012. (UTB 18)


Participants list: DFG round table meeting “Spoken corpora”

November 9, 2012, DFG office, Kennedyallee 40, Bonn (Germany)

Professor Dr. Bernt Ahrenholz, Friedrich-Schiller-University in Jena
Dr. Jörg Bücker, Westfälische Wilhelms-University in Münster
Professor Dr. Kristin Bührig, University of Hamburg
Professor Dr. Arnulf Deppermann, Institut für deutsche Sprache, Mannheim
Dr. Sebastian Drude, Max Planck Institute for Psycholinguistics
Dr. Sigrun Eckelmann, DFG in Bonn
Dr. Oliver Ehmer, Freiburg
Professor Dr. Christian Fandrych, University of Leipzig
Professor Dr. Caroline Féry, Goethe-University in Frankfurt am Main
Professor Dr. Ulrike Gut, Westfälische Wilhelms-University in Münster
Professor Dr. Rüdiger Harnisch, University of Passau
Dr. Dagmar Jung, University of Cologne
Professor Dr. Roland Kehrein, Philipps-University in Marburg
Dr. Kerstin Kucharczik, Ruhr-University in Bochum
Dr. Christoph Kümmel, DFG in Bonn
Slawomir Messner, Philipps-University in Marburg
Dr. Gaia di Lucio, Bonn, PT-DLR
Professor Dr. Bernd Meyer, Johannes Gutenberg-University of Mainz
Ludger Paschen, Ruhr-University of Bochum
Professor Dr. Stefan Pfänder, Albert-Ludwigs-University of Freiburg
Dr. Christoph Purschke, Université du Luxembourg
Professor Dr. Uta M. Quasthoff, Technische University of Dortmund
Professor Dr. Angelika Redder, University of Hamburg
Dr. Ines Rehbein, University of Potsdam
Professor Dr. Christian Sappok, Ruhr-University of Bochum
PD Dr. Florian Schiel, Ludwig-Maximilians-University of Munich
Dr. Thomas Schmidt, Institut für deutsche Sprache, Mannheim
Professor Dr. Stavros Skopeteas, University of Bielefeld
Adriana Slavcheva, University of Leipzig
Jan Strunk, Köln
Dr. Vera Szöllösi-Brenig, Volkswagen-Stiftung
Professor Dr. Doris Tophinke, University of Paderborn
Dr. Helga Weyerts-Schweda, DFG in Bonn
Professor Dr. Heike Wiese, University of Potsdam
Dr. Kai Wörner, University of Hamburg


Participants list: DFG round table meeting “Text corpora”

November 15, 2013, DFG office, Kennedyallee 40, Bonn (Germany)

Dr. Noah Bubenhofer, Technische University of Dresden
Professor Dr. Arnulf Deppermann, Institut für deutsche Sprache, Mannheim (IDS)
Professor Dr. Dagmar Deuber, Westfälische Wilhelms-University of Münster
Dr. Eva-Maria Dickhaut, Akademie der Wissenschaften und der Literatur Mainz
Professor Dr. Mechthild Habermann, Friedrich-Alexander-University of Erlangen-Nürnberg
Dr. Alexander Geyken, Berlin-Brandenburgische Akademie der Wissenschaften
Professor Dr. Thomas Gloning, Justus-Liebig-University of Gießen
Professor Dr. Iryna Gurevych, Technische University of Darmstadt
Professor Dr. Ulrich Heid, Stiftung University of Hildesheim
Professor Dr. Gerhard Heyer, University of Leipzig
Professor Dr. Erhard W. Hinrichs, Eberhard-Karls-University of Tuebingen
Professor Dr. Martin Huber, University of Bayreuth
Professor Dr. Magnus Huber, Justus-Liebig-University of Gießen
Professor Dr. Wolf Peter Klein, Julius-Maximilians-University of Würzburg
Dr. Marc Kupietz, Institut für deutsche Sprache, Mannheim (IDS)
Professor Dr. Gerhard Lauer, Georg-August-University of Göttingen
Professor Dr. Christian Mair, Albert-Ludwigs-University of Freiburg
Professor Dr. Alexander Mehler, Goethe-University of Frankfurt am Main
Professor Dr. Roland Meyer, Humboldt-University of Berlin
Professor Dr. Manfred Pinkal, Universität des Saarlandes
Dr. Roland Schäfer, Freie University of Berlin
Professor Dr. Ingrid Schröder, University of Hamburg
Dr. Silke Schwandt, Goethe-University of Frankfurt am Main
Professor Dr. Manfred Stede, University of Potsdam
Professor Dr. Angelika Storrer, University of Mannheim
Professor Dr. Elke Teich, Universität des Saarlandes
Dr. Helga Weyerts-Schweda, DFG in Bonn
Dr. Stefan Winkler-Nees, DFG in Bonn
Professor Dr. Heike Zinsmeister, University of Hamburg