Batch-Load Points Counter (MARCEdit project) Amelia C. VanGundy The University of Virginia’s College at Wise Virginia SirsiDynix Library Users Group Meeting Nov. 14, 2012

Batch-Load Points Counter (MARCEdit project)

Amelia C. VanGundy, The University of Virginia’s College at Wise

Virginia SirsiDynix Library Users Group Meeting, Nov. 14, 2012

2

John Cook Wyllie Library http://library.uvawise.edu/

• Ebook titles in OPAC & Ebook packages on web in finding aids

• Rate of e-book acquisition increased
– netLibrary: 3k titles per year
– EBSCOhost Ebook Academic Collection: 65k titles initial load, plus 5-10k additional titles every quarter

3

Batch Loading Problems

• Existing procedures were difficult to follow
• Procedures were inconsistent
– especially across different vendors

• Didn't take advantage of MARCEdit Tools
• 949 holdings field now includes $a class#
– previously, files loaded with AUTO “call#”

4

Solution? Wish list?

• Determine quality of MARC records
– OCLC files vs. other vendor files

• Determine editing priorities
– required (001/949), recommended, optional

• Learn to construct regular expression strings
– Batch Editing Tools & Find/Replace

• Streamlined format
– needed both an outline & more detailed info

• Make available online (web page)

5

MARCEdit proficiency

• Beginner

• Advanced Beginner
– Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)
– Can apply regular expression strings

• Intermediate
– Uses MARC Tools wizard (Extract Selected Records, MARCSplit)
– Can construct regular expressions

• Expert

6

Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/

7

Batch-Load Points Counter (BLPC) Webpage & Project link

people.uvawise.edu/acv6d/

1. Introduction
– project concept & desired outcomes

2. Checklist
– outlines the batch-load procedures & steps
– points counter: “what to do” & “when to stop”

3. Processing Guidelines
– procedures, how-tos & copy/paste info

4. 949 processing

8

BLPC Introduction & Outcomes

• Validation
– determine integrity of the file

• Processing
– determine quality of the records

• Statistics
– track vendor pkgs, record counts, 001 prefixes

• Points
– max. points = 150 (2.5 hours)

• STOP & contact vendor (request a corrected file)
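The slide equates 150 points with 2.5 hours, which works out to roughly one point per minute. The sketch below tallies checklist points under that reading; the step names and point values are hypothetical examples, not the values on the BLPC Checklist.

```python
# Minimal sketch of the BLPC points idea.
# Assumption: 1 point is roughly 1 minute of work (150 points = 2.5 hours).
# The step names and point values below are hypothetical.
MAX_POINTS = 150


def tally(steps: dict) -> tuple:
    """Return (total points, estimated hours, stop_and_contact_vendor)."""
    total = sum(steps.values())
    hours = total / 60  # 60 points per hour under the 1 point = 1 minute reading
    return total, hours, total > MAX_POINTS


# Hypothetical checklist mark-ups for one vendor file
steps = {
    "validate file": 10,
    "add 506 phrase": 5,
    "delete non-LC 650s": 10,
    "manual 245 $h edits": 40,
}
total, hours, stop = tally(steps)
print(f"{total} points (~{hours:.1f} h); stop and contact vendor: {stop}")
```

Treating "more than 150" as the stop condition is an assumption; a stricter reading would stop at exactly 150.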

9

BLPC Checklist with Time Estimates

• Steps 1 & 2: Preparation & validation
– number of records in file
– integrity of file
– valid URL links

• Steps 3-4: Review & processing
– quality of records
– lists all possible processing/edits

• Step 5: 949 holdings

Print on one page (2 pages per sheet / front & back)

10

BLPC Processing Guidelines (Procedures)

• Gives details for the Checklist
– Steps 1-2, Steps 3-4, Step 5

• Gives the regular expression strings (copy/paste)
– finding/replacing/deleting
– MARCEditor Tools & MARCEdit Tools

• Always use along with the Checklist
– includes information to process every field, BUT
– not every field needs processing

Do not print out

11

BLPC Step 1: Preparation & Reports

• MARC Validator
– Identify Invalid Records
– Validate Records (copy/paste the results into a text file)

• Material Type Report

• Field Count
– verify vendor count against MARCEditor count (LDR/000)
– count early / count often

• Deduplicate (see Additional Instructions)
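Counting leader lines is one way to repeat the "count early / count often" check outside MARCEdit. A minimal sketch, assuming the file has been saved as MarcEdit mnemonic (.mrk) text with one `=LDR` line per record; the sample records are made up.

```python
# Assumption: MarcEdit mnemonic (.mrk) text, one "=LDR" line per record.


def count_records(mrk_text: str) -> int:
    """Count records by counting leader lines."""
    return sum(1 for line in mrk_text.splitlines() if line.startswith("=LDR"))


def check_count(mrk_text: str, vendor_count: int) -> str:
    """Compare the file's record count with the count the vendor reports."""
    found = count_records(mrk_text)
    if found == vendor_count:
        return f"OK: {found} records"
    return f"MISMATCH: file has {found}, vendor reports {vendor_count}"


# Made-up sample with two records separated by a blank line
sample = """=LDR  00000nam  2200000Ia 4500
=001  ocm00000001
=245  10$aFirst title

=LDR  00000nam  2200000Ia 4500
=001  ocm00000002
=245  10$aSecond title
"""
print(check_count(sample, 2))  # OK: 2 records
```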

12

Reports/MARC Validator: Identify Invalid Records

13

Reports/MARC Validator: Validate Records

14

Reports/Material Type

15

BLPC Step 2: Verify Field Counts

• Reports/Field Count for error checking
– first field listed is 000 (corresponds to =LDR)
– last field listed is “numeric”
– 245 count

• Reports/MARC Validator errors
– open the text file created in Step 1
– look for specific errors in the error file

• Check URL links to make sure they work
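A rough text-based counterpart to Reports/Field Count plus the URL check: tally each tag, then collect the 856 $u values so the links can be tested. This assumes MarcEdit mnemonic (.mrk) text; the sample records and URL are made up. In a clean file the 245 count should equal the LDR count, and a lower 856 count points to records lacking URLs.

```python
import re
from collections import Counter

# Assumption: MarcEdit mnemonic (.mrk) text, one field per line starting "=TAG".


def field_count(mrk_text: str) -> Counter:
    """Count each tag (LDR, 001, 245, ...), similar in spirit to Reports/Field Count."""
    return Counter(line[1:4] for line in mrk_text.splitlines() if line.startswith("="))


def urls_856(mrk_text: str) -> list:
    """Collect every 856 $u value so the links can be checked."""
    found = []
    for line in mrk_text.splitlines():
        if line.startswith("=856"):
            found.extend(re.findall(r"\$u([^$]+)", line))
    return found


sample = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocm00000001",
    "=245  10$aFirst title$h[electronic resource]",
    "=856  40$uhttp://example.com/1",
    "",
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocm00000002",
    "=245  10$aSecond title",
])
counts = field_count(sample)
print(counts["LDR"], counts["245"], counts["856"])  # 2 2 1
print(urls_856(sample))  # ['http://example.com/1']
```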

16

Reports/Field Count (vendor count = 8556)

17

Field Count Error & "bad field tag" (vendor count = 694)

18

Reports/Field Count: Detail (highlight field & right-click)

19

Review Validate Records report (saved as text file in Step 1.B)

20

BLPC: Review for processing (Checklist Step 3 workflow)

• Check field counts
• Mark up notes on the Checklist
– track/count fields that need processing
– track points for fields that need processing
– track points for fields that need manual editing

• Each record to fix means extra points
– rule of thumb: for more than 12 manual edits, treat as a separate post-load maintenance project

21

BLPC Checklist Step 3: Review fields (examples of required processing)

• Examine first record & check field count
– Title control#: 001 (prefer OCLC#)
– if lacking: use info. from 035 or create local 001

• Check field counts / subfield counts
– Title/GMD: 245 $h
– URL: 856 $3 $y $u

• Check Validate Records text file for errors
– “Invalid field format” / “Subfield cannot repeat”

• Check field counts / indicator counts
– Subject: 650 Ind2 = 4/7 or 5/6/8
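For records lacking an 001, the slide suggests using information from the 035 or creating a local 001. The sketch below assumes MarcEdit mnemonic (.mrk) records in which a blank indicator appears as `\`; `local_prefix` and the six-digit sequence format are hypothetical, since local 001 prefix conventions are not given here.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) record; "=LDR" is the first line.


def ensure_001(record: str, local_prefix: str, seq: int) -> str:
    """Add an 001 to a record that lacks one.

    Prefers the digits of an OCLC number in 035 $a "(OCoLC)..."; otherwise
    builds a local control number from a hypothetical prefix + sequence.
    """
    lines = record.split("\n")
    if any(line.startswith("=001  ") for line in lines):
        return record  # already has a control number
    # ".." skips the two indicator positions; \D* skips ocm/ocn-style prefixes
    m = re.search(r"^=035  ..\$a\(OCoLC\)\D*(\d+)", record, re.M)
    control = m.group(1) if m else f"{local_prefix}{seq:06d}"
    lines.insert(1, f"=001  {control}")  # right after the =LDR line
    return "\n".join(lines)


# Made-up record with an OCLC number in 035 but no 001
rec1 = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    r"=035  \\$a(OCoLC)ocm12345678",
    "=245  10$aTitle",
])
print(ensure_001(rec1, "loc", 1))
```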

22

BLPC Checklist Step 4: Review fields (examples of optional processing)

• Check field count & delete if present
– 029 / 583 / 584 / 938

• Check field data & delete
– other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)

• Check field data & ignore/defer
– 300 lacks phrase: (1 electronic resource)
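The optional deletions (029 / 583 / 584 / 938) can be written as a single regular expression over whole field lines. A sketch assuming MarcEdit mnemonic (.mrk) text, where each tag is followed by two spaces; the sample record is made up.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) text, "=TAG" followed by two spaces.
OPTIONAL_TAGS = ("029", "583", "584", "938")


def delete_fields(mrk_text: str, tags=OPTIONAL_TAGS) -> str:
    """Remove every line whose tag is in `tags` (whole field, including newline)."""
    pattern = re.compile(r"^=(?:%s)  .*\n?" % "|".join(tags), re.M)
    return pattern.sub("", mrk_text)


record = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    r"=029  1\$aAU@$b000012345678",
    "=245  10$aTitle",
    r"=938  \\$aVendor record$nX",
]) + "\n"
print(delete_fields(record))
```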

23

BLPC Checklist with mark-ups

24

BLPC Processing workflow (Steps 3-4)

• Review field count
• Review field data
– use Find/Sort window and review first/last field
• Add/Delete/Edit field
• Review field data
– look at field in first record or Find/Sort window
– mistake? typo? use Edit/Special Undo
• Review field count
• Save edited file / Save As new filename

25

MARCEditor / MARCEdit Tools

MARCEditor Tools window
– adding/editing/deleting fields
– adding/editing/deleting subfields

MARCEditor Edit/Find window
– editing/replacing field data
– displays sortable list

MARCEdit Tools wizard
– select & extract records
– extract tab-delimited records for Excel

BLPC Checklist identifies fields to process

26

BLPC Processing: Add std. phrase 506 => Step 3.S

• Check Field Count for presence of 506
• Delete existing 506 field (if present)
• Consult Step 3.S in BLPC Procedures
– determine that AddField tool is needed for processing
– copy std. phrase from Step 3.S notes
– paste into AddField tool window and submit

• Review 506 data in first record
• Check field count
• Save file
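The Step 3.S sequence (delete any existing 506, then add the standard phrase) can be sketched as follows, assuming MarcEdit mnemonic (.mrk) text with a blank line between records. `STD_506` is a hypothetical placeholder because the actual phrase lives in the Step 3.S notes, and inserting the field in tag order is an assumption about placement.

```python
# Hypothetical placeholder text: the real standard phrase is in the
# BLPC Step 3.S notes and is not reproduced here.
STD_506 = r"=506  \\$aPLACEHOLDER access note."


def add_in_tag_order(lines: list, new_line: str) -> list:
    """Insert new_line before the first numeric tag greater than its own."""
    tag = new_line[1:4]
    for i, line in enumerate(lines):
        t = line[1:4]
        if t.isdigit() and t > tag:  # 3-digit tags compare correctly as strings
            return lines[:i] + [new_line] + lines[i:]
    return lines + [new_line]


def add_std_506(mrk_text: str, phrase: str = STD_506) -> str:
    """Assumption: .mrk text with a blank line between records."""
    out = []
    for rec in mrk_text.strip().split("\n\n"):
        # 1. delete any existing 506 so each record ends up with exactly one
        lines = [line for line in rec.split("\n") if not line.startswith("=506  ")]
        # 2. add the standard phrase in tag order
        out.append("\n".join(add_in_tag_order(lines, phrase)))
    return "\n\n".join(out) + "\n"


text = "\n".join([
    "=LDR  x", "=245  10$aFirst", r"=506  \\$aOld note.", r"=650  \0$aHistory.",
    "",
    "=LDR  x", "=245  10$aSecond",
])
print(add_std_506(text))
```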

27

MARCEditor Tools: Add std. phrase 506 => Step 3.S

28

BLPC Processing: Delete specific fields: 650 Ind2 = 5/6/8 (non-LC) => Step 3.V

• Check Field Count for presence of 650 Ind2 = 5/6/8
• Consult Step 3.V in BLPC Procedures
– optional review: FindAll(RegEx) instructions
– determine that Tools/DeleteField tool is needed
– copy RegEx pattern from Step 3.V
– paste into Tools/DeleteField window
– use Regular Expressions radio button option
– submit using Delete button

• Check Field Count & indicator count
• Save file
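A Python counterpart to Tools/DeleteField with the Step 3.V pattern, assuming MarcEdit mnemonic (.mrk) text. The `^` anchor and trailing `\n?` are additions so that the whole field line is removed, and Python's `re` stands in for MARCEdit's own regex engine; the sample record is made up.

```python
import re

# Slide pattern (two spaces after the tag), anchored to the start of a line
# and extended with \n? so the whole field line is removed.
# Assumption: MarcEdit mnemonic (.mrk) text.
NON_LC_650 = re.compile(r"^(=650  )(.[568])(\$a)(.+)\n?", re.M)


def delete_non_lc_650(mrk_text: str) -> tuple:
    """Return (edited text, number of fields deleted)."""
    return NON_LC_650.subn("", mrk_text)


record = "\n".join([
    "=245  10$aTitle",
    r"=650  \0$aHistory.",
    r"=650  \6$aHistoire.",
    r"=650  \8$aSears heading.",
]) + "\n"
edited, n = delete_non_lc_650(record)
print(n)  # 2
```

Comparing `n` with the Field Count figure for 650 Ind2 = 5/6/8 mirrors the "check field count" step.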

29

MARCEditor: Delete specific fields: 650 Ind2 = 5/6/8 (non-LC) => Step 3.V

30

Regular expressions (RegEx)

• Finding/Editing patterns in strings (letters/numbers)
– like learning another language

• Parentheses are used to group data
– forces the computer to "store" data in "chunks"
– data “chunks” are numbered for recall/retrieval/use
– helps the programmer "read" the pattern

• Optional functionality, and not necessary

• Some punctuation is "reserved" (has a special meaning)

• BLPC uses a consistent format for RegEx patterns

31

Reading RegEx Patterns: 650 Ind2 = 5/6/8 (non-LC)

Pattern: (=650  )(.[568])(\$a)(.+)

(=650  ) look for 650 fields followed by two blank spaces

(.[568]) look for any Ind1 & the listed Ind2 numbers

(\$a) look for subfield $a (used as "anchor chunk")

(.+) one or more of any character, to the end of the field

Use Edit/FindAll(RegEx) to verify pattern
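The four chunks of the pattern can also be inspected with Python's `re` module, which numbers groups the same way (left to right by opening parenthesis). The sample 650 lines are made up and assume MarcEdit mnemonic format, where `\` stands for a blank indicator.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) lines; "\" marks a blank indicator.
pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")
line = r"=650  \6$aHistoire$zFrance."

m = pattern.match(line)
# MARCEdit refers to the chunks as $1..$4; Python exposes them as m.group(1)..m.group(4)
for n, chunk in enumerate(m.groups(), start=1):
    print(f"${n}: {chunk}")

# An LC heading (Ind2 = 0) does not match, so FindAll would skip it
print(pattern.match(r"=650  \0$aHistory."))  # None
```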

32

Interpreting RegEx punctuation

Pattern: (=650  )(.[568])(\$a)(.+)

( ) Parentheses for data “chunks”
. Period for any single character (letter, number, space, or punctuation)
[ ] Square brackets for a list using “OR”
\ Backslash before “reserved” punctuation, esp.: $ \ ( ) [ ] . +
+ Plus sign for one or more of the preceding item

“Chunks” are stored as: $1 $2 $3 $4

33

Creating RegEx patterns

• Start with known pattern for non-LC subjects:
(=650  )(.[568])(\$a)(.+)

• FindAll(RegEx) for “local” subjects (Ind2 = 4/7):
(=650  )(.[47])(\$a)(.+)

• FindAll(RegEx) for “local” genres (Ind2 = 4/7):
(=655  )(.[47])(\$a)(.+)

34

Editing with a RegEx string pattern: 650 BISAC subjects => 690

Start with known pattern: (=650  )(.[568])(\$a)(.+)

• Use Edit/Replace(RegEx): change 650 to 690
• Identify “BISAC” subjects: Ind2 = 7 & $2 = bisacsh
• Determine which “chunks” change/stay the same

Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

Replace(RegEx): =690  $2$3$4$5
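The same Find/Replace in Python, assuming MarcEdit mnemonic (.mrk) text. Python's `re` writes group references as `\g<2>` where the slide's replacement uses `$2`; the literal text `=690  ` takes the place of chunk 1. The sample lines are made up.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) text.
FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
# Slide replacement: =690  $2$3$4$5
# Python's re module writes group references as \g<n> instead of $n.
REPLACE = r"=690  \g<2>\g<3>\g<4>\g<5>"


def bisac_to_690(mrk_text: str) -> str:
    """Retag 650 Ind2 = 7 fields ending in $2bisacsh as 690."""
    return FIND.sub(REPLACE, mrk_text)


record = "\n".join([
    r"=650  \0$aHistory.",
    r"=650  \7$aHISTORY / General.$2bisacsh",
])
print(bisac_to_690(record))
```

Because `.` does not match a newline, `(.+)` cannot run past the end of a field line into the next field.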

35

Reading RegEx Patterns: 650 BISAC subjects => 690

Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

(=650  ) look for 650 fields followed by two blank spaces
(.[7]) look for any Ind1 & Ind2 = 7
(\$a) look for subfield $a (optional “anchor” text)
(.+) one or more of any character, up to the next “chunk”
(\$2bisacsh) look for subfield $2 & its data at the end of the field

Can be shortened (which makes the pattern look complicated):
Find(RegEx): (=650)(.+\$2bisacsh)
Replace(RegEx): =690$2
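A quick check on a made-up .mrk line that the long and shortened patterns give the same result, plus the trade-off of the short form: it no longer tests Ind2 = 7, so it also retags any 650 that carries $2bisacsh.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) lines; made-up sample data.
LONG_FIND = r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)"
SHORT_FIND = r"(=650)(.+\$2bisacsh)"

line = r"=650  \7$aHISTORY / General.$2bisacsh"
long_form = re.sub(LONG_FIND, r"=690  \g<2>\g<3>\g<4>\g<5>", line)
short_form = re.sub(SHORT_FIND, r"=690\g<2>", line)
print(long_form == short_form, short_form)

# A 650 with Ind2 = 4 but a $2bisacsh: only the short form retags it
odd = r"=650  \4$aHISTORY / General.$2bisacsh"
print(re.sub(LONG_FIND, r"=690  \g<2>\g<3>\g<4>\g<5>", odd))  # unchanged
print(re.sub(SHORT_FIND, r"=690\g<2>", odd))  # retagged to 690
```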

36

MARCEditor: FindAll(RegEx): testing the pattern for 650 BISAC subjects

37

MARCEditor: Replace(RegEx) 650 BISAC subjects => 690

38

BLPC Step 5: 949 processing (required processing)

Policy: include class# in the Unicorn item record (949)

$a -- pull the call# from the 050 $a
$a -- insert the standard phrase: ' INTERNET'
$v -- pull the 001/OCLC# as a unique no.
$w $h $t $x $z -- add standard holdings data

• See Additional Instructions.
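One possible reading of the Step 5 rules, sketched for MarcEdit mnemonic (.mrk) records: $a is the 050 $a call number followed by ' INTERNET', and $v is the 001. `STD_HOLDINGS` holds hypothetical placeholder values, since the real $w $h $t $x $z data are in the Additional Instructions; the subfield order is also an assumption.

```python
import re

# Hypothetical placeholder values for $w $h $t $x $z; the real standard
# holdings data come from the BLPC Additional Instructions / local policy.
STD_HOLDINGS = "$wLC$hHOLD$tTYPE$xLOC$zNOTE"


def build_949(record: str) -> str:
    """Build a 949 line from a .mrk record (one reading of the Step 5 rules)."""
    call = re.search(r"^=050  ..\$a([^$]+)", record, re.M)
    ctrl = re.search(r"^=001  (.+)$", record, re.M)
    # $a: call# from 050 $a plus ' INTERNET'; just 'INTERNET' when 050 is missing
    a = call.group(1).strip() + " INTERNET" if call else "INTERNET"
    # $v: the 001/OCLC# as a unique number
    v = ctrl.group(1).strip() if ctrl else ""
    return f"=949  \\\\$a{a}$v{v}{STD_HOLDINGS}"


# Made-up sample record
record = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    "=001  ocm12345678",
    r"=050  \4$aPS3552.A1$bB2 2005",
    "=245  10$aTitle",
])
print(build_949(record))
```

Records that fall into the "INTERNET only" branch are the ones slide 40 lists as "949 lacks call#".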

39

Batch-loading

• MARCEdit: files no larger than 10k records
– MARCEdit Tools: MARCSplit

• MARCEditor/File: Compile File into MARC

• Unicorn batch load rpt uses the 001 as match point
– 'o' for OCLC#, 'g' for local vendor key

• Unicorn batch load rpt settings
– create new bibliographic records only

• Date cataloged: back-dated to prev. month
– prevents interference w/ scheduled authority reports
– max. load: two files a day
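MARCSplit does the splitting inside MARCEdit; the sketch below shows the same idea on MarcEdit mnemonic (.mrk) text, assuming a blank line between records. The demo uses a small limit and made-up records so the result is easy to check.

```python
# Assumption: MarcEdit mnemonic (.mrk) text with a blank line between records.


def split_records(mrk_text: str, max_records: int = 10_000) -> list:
    """Split .mrk text into chunks of at most max_records records (cf. MARCSplit)."""
    records = [r for r in mrk_text.strip().split("\n\n") if r.strip()]
    return [
        "\n\n".join(records[i:i + max_records]) + "\n"
        for i in range(0, len(records), max_records)
    ]


# Demo with a small limit: 25 made-up records -> chunks of 10, 10 and 5
text = "\n\n".join(f"=LDR  00000nam  2200000Ia 4500\n=001  rec{i}" for i in range(25))
chunks = split_records(text, max_records=10)
print([c.count("=LDR") for c in chunks])  # [10, 10, 5]
```

Each chunk could then be written to its own file and compiled separately.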

40

Identifying records for Cleanup

Checklist finds problems to correct post-load

• Item maintenance projects
– 949 lacks call#

• Bibliographic record maintenance projects
– 245 lacks $h (if more than 5-12 records)
– URLs lacking

• Record reload/overlay project
– record already in OPAC (P-N duplicates)

41

MARCEdit Tools: Select/Extract selected records

Step 3.F: 245 lacks $h
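A text-based version of the Step 3.F selection (records whose 245 lacks $h), assuming MarcEdit mnemonic (.mrk) text with a blank line between records; the sample records are made up. The selected records could then be saved as a separate file for a post-load maintenance project.

```python
import re

# Assumption: MarcEdit mnemonic (.mrk) text with a blank line between records.


def records_lacking_245h(mrk_text: str) -> list:
    """Return the records whose 245 field has no $h (GMD) subfield."""
    selected = []
    for rec in mrk_text.strip().split("\n\n"):
        f245 = re.search(r"^=245  .*$", rec, re.M)
        if f245 and "$h" not in f245.group(0):
            selected.append(rec)
    return selected


text = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    "=245  10$aFirst title$h[electronic resource]",
    "",
    "=LDR  00000nam  2200000Ia 4500",
    "=245  10$aSecond title",
])
print(len(records_lacking_245h(text)))  # 1
```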

42

MARCEdit Tools: Export Tab Delimited records
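A minimal sketch of a tab-delimited export for Excel, pulling the 001 and 245 $a from MarcEdit mnemonic (.mrk) text with a blank line between records. The choice of columns is an assumption made for illustration, and the sample records are made up.

```python
import csv
import io
import re

# Assumption: MarcEdit mnemonic (.mrk) text; columns (001, 245 $a) chosen for illustration.


def export_tab_delimited(mrk_text: str) -> str:
    """Export 001 and 245 $a of each record as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["001", "245 $a"])
    for rec in mrk_text.strip().split("\n\n"):
        ctrl = re.search(r"^=001  (.*)$", rec, re.M)
        title = re.search(r"^=245  ..\$a([^$]*)", rec, re.M)
        writer.writerow([ctrl.group(1) if ctrl else "", title.group(1) if title else ""])
    return buf.getvalue()


text = "\n".join([
    "=LDR  00000nam  2200000Ia 4500", "=001  ocm00000001", "=245  10$aFirst title",
    "",
    "=LDR  00000nam  2200000Ia 4500", "=001  ocm00000002", "=245  10$aSecond title",
])
print(export_tab_delimited(text))
```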

43

Help!

• MarcEdit Help
http://people.oregonstate.edu/~reeset/marcedit/html/help.html
– click through the Contents menu: Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions

• RegularExpressions.info
http://www.regular-expressions.info/

• MARCEDIT-L list
http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L

• BATCH list
http://listserv.vt.edu/cgi-bin/wa?A0=batch

44

Amelia C. VanGundy, The University of Virginia's College at Wise

John Cook Wyllie Library

[email protected]

http://people.uvawise.edu/acv6d/

Virginia SirsiDynix Library Users Group Meeting, Nov. 14, 2012

45

BLPC Project: Presentation revisions

Originally presented Nov. 14, 2012

• Additional slides:
– BLPC Project web page
– MARCEditor: FindAll(RegEx)
– MARCEdit Tools: Export Tab Delimited records
– BLPC Project: Presentation revisions