
Amelia C. VanGundy The University of Virginia’s …people.uvawise.edu/.../PresentationBLPCProjNotes... · Amelia C. VanGundy The University of Virginia’s College at Wise ... Start




Batch-Load Points Counter (MARCEdit project)

Amelia C. VanGundy

The University of Virginia’s College at Wise

Virginia SirsiDynix Library Users Group Meeting

Nov. 14, 2012

'1) Batch-Load Points Counter -- (MARCEdit project) Title screen & Presentation notes Amelia C. VanGundy -- The University of Virginia’s College at Wise Virginia SirsiDynix Library Users Group Meeting -- Nov. 14, 2012 Welcome & Introduction About: Presenter = Catalog Librarian & ~10 years experience w/MARCEdit UVa-Wise Library = medium-sized academic library, Unicorn GL3 Handouts: BLPC slide show; BLPC Checklist; Virginia COSUGI -- Sept. 19, 2012 handout MARCEdit Workshop / Thena Jones Audience questions: Public/Academic/Other (School/Special)? Catalogers & TS/Systems/Reference/Other = Lucky?

John Cook Wyllie Library http://library.uvawise.edu/

• Ebook titles in OPAC & Ebook packages on web in finding aids

• Rate of e-book acquisition increased

– netLibrary: 3k titles per year

– EBSCOhost Ebook Academic Collection: 65k titles initial load, plus 5-10k additional titles every quarter


'2) John Cook Wyllie Library -- http://library.uvawise.edu/ Library philosophy :: commitment to making ebooks accessible Like other libraries, as ebooks become less expensive :: the library purchases more ebooks Fortunate :: The smaller files from VIVA & netLibrary ebooks packages were good practice (and training/experience) for the larger Ebsco files

Batch Loading Problems

• Existing procedures were difficult to follow

• Procedures were inconsistent

– especially for different vendors

• Didn't take advantage of MARCEdit Tools

• 949 holdings field now includes $a class#

– previously, files loaded with AUTO “call#”


'3) Batch Loading Problems Unfortunate :: As the pace picked up with Ebsco ebooks – the cracks began to show Reviewing all the ad hoc policies & procedures :: everything was a big mess & a big time sink Additionally-- Policy/processing change to the 949 field needed to be documented


Solution? Wish list?

• Determine quality of MARC records

– OCLC files vs. other vendor files

• Determine editing priorities

– required (001/949), recommended, optional

• Learn to construct Regular Expression Strings

– Batch Editing Tools & Find/Replace

• Streamlined format

– needed both an outline & more detailed info

• Make available on-line/web-page

'4) Solution? Wish list? Desired outcomes -- Policies applied all the same way :: Consistent Procedures have all the same workflow :: Consolidated Documentation all in one place :: Contained Emphasis for presentation = record quality / editing priorities / advanced MARCEditor & tools

MARCEdit proficiency

• Beginner

• Advanced Beginner

– Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)

– Can apply Regular Expression Strings

• Intermediate

– Uses MARC Tools wizard (Extract Selected Records, MARCSplit)

– Can construct Regular Expressions

• Expert

'5) MARCEdit proficiency -- So, talking about MARCEdit Proficiency & Skills -- AdvancedBeginner / Intermediate levels MARCEditor Tools windows (using with an open file) MARCEdit Tools wizard (using the directory path & file name) Presenter level = Intermediate & using MARCEdit version 5.7 (but want to process batch files like an AdvancedBeginner – otherwise it is real work) Audience questions: opened a file in MARCEditor successfully? used MARCEditor successfully? used RegEx successfully? (quick, see who they are so you can ask them questions)

Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/


' 6 ) BLPC Project webpage (New Slide) Identified a Problem! Outlined desired outcomes! What comes next?? Hint – think like an academic librarian: A PROJECT!! With Documentation! With a Presentation! --- Presenter's (temporary) professional page: the presentation slide show and notes will be posted here. BLPC webpage incl. four PDF files (also Checklist & Procedures as an MSWord document, concealed under the 'hash' mark)


Batch-Load Points Counter (BLPC) Webpage & Project link

people.uvawise.edu/acv6d/

1. Introduction – project concept & desired outcomes

2. Checklist # – outlines the batch-load procedures & steps

– points counter: “what to do” & “when to stop”

3. Processing Guidelines # – procedures & how-tos & copy/paste info

4. 949 processing

'7) Batch-Load Points Counter (BLPC) -- Webpage & Project link people.uvawise.edu/acv6d/ Presentation focus = Checklist & procedures for using MARCEdit emphasis on – Error reports & recognizing when a file is too horrible to continue processing examples of – Error reports & Regular expressions (string patterns)

BLPC Introduction & Outcomes

• Validation

– determine integrity of the file

• Processing

– determine quality of the records

• Statistics

– track vendor pkgs, record counts, 001 prefixes

• Points

– max. points = 150 (2.5 hours)

• STOP & contact vendor (request corrected file)

'8) BLPC Introduction & Outcomes :: Revision -- change: STOP & contact vendor (request new file) to: STOP & contact vendor (request corrected file) Webpage -- Basic project overview (the executive summary) Integrity of file = prevents load program from crashing Quality of records = prevents glitches in display & indexing (& may prevent batch load program from crashing) Developed the Points Counter to prevent Time Sinks! “Time” points add up – max. time limit for processing = 2.5 hours Response to time limits: Stop & contact vendor = refuse delivery vs. Continue & notify vendor of problems that need correction (optional)

BLPC CheckList w/Time estimates

• Steps 1-2: Preparation & validation

– number of records in file

– integrity of file

– valid URL links

• Steps 3-4: Review & processing

– quality of records

– lists all processing/edits possible

• Step 5: 949 holdings

Print on one sheet (2 pages per side, front & back)

'9) BLPC CheckList w/Time estimates Webpage -- Consolidation/Summary of all processing procedures, with statistics at the top. Four pages total, printed on one sheet of paper. Serves as a memory aid with enough space to: jot down counts/notes; check off tasks; cross off fields that need no further processing. Work quickly through the checklist to identify fields that will cause slow-downs. Each step has a “time” point, which counts the time it takes for reviewing both the Checklist and the Procedures step. Extra points are assigned when manual processing is needed. Count each problem record for the extra points, for ex.: 3 records have multiple 245 fields (Step 2.D): 2 + 2 each = 2 + 6 = 8 points/minutes total to review & process the field. Note: the Checklist review finishes before the Processing procedures begin.
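The points arithmetic above can be sketched in a few lines. This is a hypothetical illustration, not part of the BLPC materials: the function names are invented, and it assumes the rule stated in the notes (2 points per step plus 2 per problem record, 1 point is about 1 minute, STOP at 150 points) and that a step with no problem records adds nothing.

```python
# Hypothetical sketch of the BLPC points arithmetic (not the author's tool).
# Assumptions from the notes: a step needing manual work costs a base of
# 2 points plus 2 points per problem record; 1 point is about 1 minute; the
# file is abandoned (STOP & contact vendor) once the total reaches 150 points.

MAX_POINTS = 150  # about 2.5 hours


def step_points(problem_records: int, base: int = 2, per_record: int = 2) -> int:
    """Points for one checklist step; assumes a step with no problem records adds 0."""
    if problem_records <= 0:
        return 0
    return base + per_record * problem_records


def should_stop(total_points: int, limit: int = MAX_POINTS) -> bool:
    """True once the accumulated points reach the time limit."""
    return total_points >= limit


# Step 2.D example from the notes: 3 records with multiple 245 fields.
print(step_points(3))  # 8  (2 + 2*3)
```

A "+150" item such as Step 1.A can be modeled by adding 150 directly to the running total, which makes `should_stop` true immediately.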

BLPC Processing Guidelines (Procedures)

• Gives details for CheckList: Steps 1-2, Steps 3-4, Step 5

• Gives the regular expression strings (copy/paste)

– Finding/ Replacing/Deleting

– MARCEditor Tools & MARCEdit Tools

• Always use along with Checklist

– includes information to process every field, BUT

– not every field needs processing

• Do not print out

'10) BLPC Processing Guidelines (Procedures) Still reviewing the BLPC project on the webpage. Remember – the Checklist keeps track of which fields need processing and which fields do not need any further attention. The Procedures contain all the nitty-gritty details. Consult them only after you finish the checklist and are ready to start processing. Notes for each step give the MARCEditor Tool required & the RegEx string pattern. Designed for copy/paste processing of search patterns. Designed at the AdvancedBeginner level.

BLPC Step 1: Preparation & Reports

• MARC Validator

– Identify Invalid Records

– Validate Records (copy/paste into text file)

• Material Type Report

• Field Count

– verify vendor count against MARCEditor count (LDR/000)

– count early / count often

• Deduplicate (See Addt’l Instruct.)

'11) BLPC Step 1: Preparation & Reports Ready to begin :: Start with the BLPC Checklist -- handout. MARCEditor Reports tab: Validator reports = Identify Invalid Records = integrity of file = errors that will crash during loading. Validate Records = quality of records = errors that cause indexing/display problems & also may crash during load. Field Counts = review often to make on-the-fly verification of the integrity of the file; if the FieldCount for the 000/LDR changes, question the file integrity. Deduplicate: looks for multiple records for a title in the same file. In the Procedures, the note “See Addtl instruct.” often means there are no procedures developed yet for this step.

Reports/MARC Validator: Identify Invalid Records

'12) Reports/MARC Validator : Identify Invalid Records Example shows incorrect formatting of the first 020, the 020 should have a blank " \ " (a 'back-slash') for both Ind1 & Ind2 Checklist Step 1.A = +150 STOP & Contact vendor – the file will crash when loaded Request corrected file or Rule of thumb (or fingers/hand), correct field manually if no more than 5 to 12 records affected

Reports/MARC Validator: Validate Records

'13) Reports/MARC Validator: Validate Records Often produces a long report with a lot of minor errors -- that don't affect loading/indexing/display The serious errors are hidden in the list. Solution: HighlightAll (ctrl-a) / Copy (ctrl-c) then Paste (ctrl-v) into text file (MSWord) The text copy of the report will be reviewed later for specific errors Example shows record with multiple 245 fields (a MARC no-no) Checklist Step 2.D = Notify vendor Correct fields manually (if no more than 5-12 records affected) Points :: 2 +2 each -- for 1 error = 2 + 2 = 4 points/minutes to process/correct


Reports/Material Type


'14) Reports/Material Type Example shows a file of e-book records that includes cataloging for videos. Checklist Step 1.C = +150 STOP & Contact vendor – the file includes inappropriately selected records. Request new file, or Rule of thumb (or fingers/hand): if fewer than 5 to 12 records affected -- plan to correct as part of Step 3; if more than 5 to 12 records affected -- identify & correct later :: post-load maintenance project. Find: videos, sound recordings, serials, integrating resources (Step 1.C) Audiovisuals (LDR/06 = g or j) -- FindAll(RegEx): (=LDR  )(.{6})([gj])(.+) Serials & integrating resources (LDR/07 = s or i) -- FindAll(RegEx): (=LDR  )(.{7})([is])(.+)
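To see what the Material Type patterns test, here is a sketch that runs them with Python's re module standing in for MARCEdit's FindAll(RegEx). It assumes MARCEdit mnemonic (.mrk) text, where a Leader line is "=LDR" plus two spaces plus the 24 Leader positions; per MARC 21, LDR/06 is the type of record and LDR/07 the bibliographic level (s = serial, i = integrating resource). The function name is hypothetical.

```python
import re

# Sketch: count Leader lines that flag audiovisual or serial records.
# Assumes .mrk text: "=LDR" + two spaces + the 24 Leader positions.
AUDIOVISUAL = re.compile(r"(=LDR  )(.{6})([gj])(.+)")  # LDR/06 = g or j
SERIAL = re.compile(r"(=LDR  )(.{7})([is])(.+)")       # LDR/07 = i or s


def flag_leaders(mrk_text: str) -> dict:
    """Count Leader lines matching each pattern (like FindAll(RegEx))."""
    lines = mrk_text.splitlines()
    return {
        "audiovisual": sum(1 for ln in lines if AUDIOVISUAL.match(ln)),
        "serial": sum(1 for ln in lines if SERIAL.match(ln)),
    }


sample = (
    "=LDR  00000nam a2200000 a 4500\n"  # book: LDR/06 = a, LDR/07 = m
    "=LDR  00000ngm a2200000 a 4500\n"  # video: LDR/06 = g
    "=LDR  00000nas a2200000 a 4500\n"  # serial: LDR/07 = s
)
print(flag_leaders(sample))  # {'audiovisual': 1, 'serial': 1}
```

The `(.{6})` and `(.{7})` chunks simply skip the fixed number of Leader positions, so the bracketed list always tests exactly one position.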

BLPC Step 2: Verify Field Counts

• Reports/FieldCount for error checking

– first field listed is 000 (corresponds to =LDR)

– last field listed is “numeric”

– 245 count

• Reports/MARCValidator errors

– open text file created in Step 1

– look for specific errors in error file

• Check URL links to make sure they work

'15) BLPC Step 2: Verify Field Counts Emphasis for Step 2 is file integrity & record quality. Use the Field Count report as a double check while you make changes to the file. Validator errors -- again, don't check/correct the minor errors. The Checklist includes the most serious errors that need to be corrected: One 245 field must be present / 245: Has been marked as a non-repeating field / 245- $h: Subfield cannot repeat / Only one 1XX tag is allowed / Has been marked as a non-repeating field / Incomplete or Dangling Subfield / Subfield cannot repeat / Invalid field format; invalid characters present. Checking the URL is the only part of Step 1 & Step 2 that is not part of the MARCEditor Reports tab.


Reports/Field Count (vendor count = 8556)

'16) Reports/Field Count Example shows a file from the vendor with 8,556 records MARCEditor Field Counts should be the same Checklist Step 1.D & 2.A / 2.B & 2.D = +150 Verify-- first field count :: LDR/000 field last field counts :: valid MARC tag (this file has: 994 field) Also verify counts for-- 008 field (part of FixedField) 245 $a field/subfield

Field Count Error & "bad field tag" (vendor count = 694)

'17) Field Count Error & "bad field tag" Example shows a file from the vendor with 694 records MARCEditor Field Counts are not the same Verify-- first field count = LDR/000 field shows fewer records last field counts = not numeric :: file has been corrupted (during processing) Usually occurs unexpectedly after complex editing (often in the LDR or 00x fields) Count early / Count often
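The "count early / count often" checks can be approximated outside MARCEdit. The sketch below is a hypothetical stand-in for Reports/Field Count, assuming .mrk text with one field per line in the form "=TAG" plus two spaces: it compares the LDR count with the vendor's record count and lists any tag that is neither LDR nor numeric (the "bad field tag" symptom). Function names are invented.

```python
import re
from collections import Counter

# Hypothetical stand-in for Reports/Field Count on .mrk text.
TAG_LINE = re.compile(r"^=(\S{3})  ")


def field_counts(mrk_text: str) -> Counter:
    """Count field tags, one field per line."""
    counts = Counter()
    for line in mrk_text.splitlines():
        m = TAG_LINE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def check_integrity(mrk_text: str, vendor_count: int) -> list:
    """Return a list of problems; an empty list means the counts look sane."""
    counts = field_counts(mrk_text)
    problems = []
    if counts["LDR"] != vendor_count:
        problems.append(f"LDR count {counts['LDR']} != vendor count {vendor_count}")
    bad = sorted(t for t in counts if t != "LDR" and not t.isdigit())
    if bad:
        problems.append(f"bad field tags: {', '.join(bad)}")
    return problems
```

Running `check_integrity` after each large edit mirrors the habit described in the notes: a changed LDR count or a non-numeric tag means the file was damaged during processing.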

Reports/Field Count: Detail (highlight field & right-click)

'18) Reports/Field Count: Detail Example shows a file from the vendor with 8,556 records. Use the Field Count details to check the counts for Indicators & Subfields. Field Count detail shows the correct count for the 245 $a field. Notice also the mismatch in the count of 245 $h. Checklist Step 3.F includes Addt'l Instructions to identify which title fields lack a $h GMD (often used for programming different OPAC icons for books / audiovisuals / e-books).


Review Validate Records report (saved as text file in Step 1.B)

'19) Review Validate Records report Example shows a text file from Step 1.B The most serious errors are specifically listed in the Checklist under the Step 3 review Use the Edit/Find function in MSWord to find the error messages Checklist Step 3.E looks for multiple $h subfields in the 245 field Serious error: 245 $h Review the Field Count details to see approximately how many extra 245 $h subfields may be present compare the 245 $a count against the 245 $h count Copy/Paste & Find the error message in the text file to identify the titles of the problem records Correct field manually, if no more than 5 to 12 records affected Note: the rpt provides the title Note: a record in the previous file was lacking a 245 $h subfield, this file has too many Accept Minor errors & do not process/correct: for ex. 100 Ind1 error

BLPC: Review for processing - Checklist Step 3 workflow

• Check field counts

• Mark-up notes on the Checklist

– Track/count fields that need processing

• Track points for fields that need processing

• Track points for fields that need manual editing

– Each record to fix means extra points

• Rule of thumb: for more than 12 manual edits

– Treat as separate post-load maintenance project

'20) BLPC: Review for processing - Checklist Step 3 workflow Still working with the Checklist. Still identifying errors -- mostly with the Field Count report. Focus: still keeping track of points -- but now small point counts that accumulate, usually 2 + 2 each :: 6 fields to correct = 2 + 12 = 14 points/minutes. The more points -- the more time needed for corrections/processing. Use the point count to rate the file :: know where the slow-downs are before you start processing. In practice, when I process a file from an unfamiliar vendor, I work through the Checklist and find all the problem areas before I even attempt changes to the file. But if I know the vendor provides quality records, for Step 3 & Step 4 I often work off both the Checklist and the Processing procedures at the same time.

BLPC Checklist Step 3: Review Fields - Examples of required processing

• Examine first record & check field count

– Title control#: 001 (prefer OCLC#); if lacking, use info. from 035 or create local 001

• Check field counts / subfield counts

– Title/GMD: 245 $h

– URL: 856 $3 $y $u

• Check Validate Record text file for errors

– “Invalid field format” / “Subfield cannot repeat”

• Check field counts / indicator counts

– Subject: 650 Ind2 = 4/7 or 5/6/8

'21) BLPC Step 3: Review Fields -- Examples of priority / required processing Previous examples have shown errors in subfield counts. The BLPC reflects UVa-Wise Library cataloging policies and local processing needs. Fields considered Priority: 001 (Title control#), 050 $a (Class#), 245 $h (Title GMD), 856 (URL). Other required processing involves: Validation report error corrections (specific errors listed in Step 3); Adding specific fields (ex. 506); Deleting specific fields (650 non-LC subjects); Verifying/editing specific fields (007, 040).
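For the "001 lacking" case, a minimal sketch might copy the OCLC number out of the first 035 $a that begins with "(OCoLC)". This is an assumption-laden illustration, not the BLPC procedure: it assumes each record is a list of .mrk lines with the Leader first, the function name is hypothetical, and storing only the digits in 001 may not match local prefix practice.

```python
import re

# Hypothetical sketch for "001 lacking": copy the number from the first
# 035 $a that begins "(OCoLC)". Assumes a record is a list of .mrk lines with
# the Leader first; storing only the digits in 001 is an assumption.
OCLC_035 = re.compile(r"^=035  ..\$a\(OCoLC\)\D*(\d+)")


def ensure_001(record_lines):
    if any(line.startswith("=001  ") for line in record_lines):
        return record_lines  # 001 already present: nothing to do
    for line in record_lines:
        found = OCLC_035.match(line)
        if found:
            # control fields come right after the Leader in .mrk text
            return record_lines[:1] + [f"=001  {found.group(1)}"] + record_lines[1:]
    return record_lines  # no OCLC 035: a local 001 has to be created instead


record = ["=LDR  00000nam a2200000 a 4500", r"=035  \\$a(OCoLC)ocm12345", "=245  10$aSample title."]
print(ensure_001(record)[1])  # =001  12345
```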

BLPC Checklist Step 4: Review fields - Examples of optional processing

• Check field count & delete if present

– 029 / 583 / 584 / 938

• Check field data and delete

– Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)

• Check field data & ignore/defer

– 300 lacks phrase: (1 electronic resource)

'22) BLPC Step 4: Review fields -- Examples of optional processing Optional processing includes: Deleting junk fields before loading; Deleting notes & access points that display vendor names not associated with the batch file pkg (in other words, cleaning up Provider Neutral records). Also includes processing that has not been figured out yet (ex. 300 phrase): although the procedure was documented, the point count for manual correction is too high, so it is too costly. Deferred until someone else can come up with a better way.
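Deleting junk fields by tag is simple enough to sketch. This hypothetical function drops every 029/583/584/938 line from .mrk text (one field per line, tag followed by two spaces); in MARCEdit itself the same job is done with the MARCEditor Tools delete-field function.

```python
import re

# Sketch of the optional Step 4 cleanup: drop every 029/583/584/938 field.
# Assumes MARCEdit mnemonic (.mrk) text, one field per line.
JUNK_FIELDS = re.compile(r"^=(029|583|584|938)  ")


def delete_junk_fields(mrk_text: str) -> str:
    kept = [line for line in mrk_text.splitlines() if not JUNK_FIELDS.match(line)]
    return "\n".join(kept) + "\n"
```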


BLPC Checklist with mark-ups


'23) BLPC Checklist with mark-ups Example of the top of the Checklist after making notes & ready to count up the points Estimate 20-35 minutes to finish Checklist Use check marks, empty zeros, x-marks, asterisks Checklist Step 1.C error = videos 5 video records out of 10K -- notified vendor & identified titles for post-load maintenance project the vendor later issued new records & sent an update list for deleted titles This was a large project -- this was the fifth file (MARCEdit Split counts from zero)

BLPC Processing workflow: Step 3 - Step 4

Review Field Count

Review Field data

– Use Find/Sort window and review first/last field

Add/Delete/Edit field

Review Field data

– look at field in first record or Find/Sort window

– Mistake? Typo? – use the Edit/SpecialUndo

Review FieldCount

Save edited file / SaveAs new filename

'24) BLPC Processing workflow The Checklist is finished & marked-up to show which fields need attention. Step 1 - Step 2 should not need to be revisited. Ideally the workflow begins by verifying the field first, then changing the field, and finally verifying that the change was correct. The Processing guide gives detailed instructions and notes to change/update the field: which tool to use; what words or phrase to use (copy/paste). The 'work' part of the workflow is the Add/Delete/Edit field step; everything else is just making sure that the corrections are, indeed, correct. Small changes can be saved; if the file becomes corrupted, it is easy to redo the steps. BIG changes are harder to redo: save the file with a different name before working on the field, and if the file becomes corrupted, discard it & return to a file saved earlier. Prefer to include the step number in the new file name.


MARCEditor / MARCEdit Tools

BLPC Checklist identifies fields to process

MARCEditor Tools window

– adding/editing/deleting fields

– adding/editing/deleting subfields

MARCEditor Edit/Find window

– editing/replacing field data

– displays sortable list

MARCEdit Tools wizard

– for select & extract records

– extract tab-delimited records for Excel

'25) MARCEditor / MARCEdit Tools :: Corrected slide (Formerly: MARCEdit / Tools for processing) Tool groups MarcEditor Tool windows -- appears when the file is open MARCEditor Find/Replace windows -- appears when the file is open MARCEdit Tool wizards -- need to know the file name & a revised file is created available off the Main menu splash screen

BLPC Processing: Add std. Phrase - 506 => Step 3.S

• Check Field Count for presence of 506

• Delete existing 506 field (if present)

• Consult Step 3.S in BLPC Procedures

– Determine that AddField Tool is needed for processing

– Copy Std.phrase from Step 3.S notes

– Paste into AddField Tool window and submit

• Review 506 data in first record

• Check field count

• Save file

'26) BLPC Processing: Add std. Phrase -- 506 => Step 3.S 506 field is added to all e-books that require the students to be on-campus or logged-in Existing 506 fields can be deleted since they would refer to a different library
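The 506 steps (delete any existing 506, add the standard phrase, check counts) can be sketched as below. PHRASE is a placeholder, since the real wording lives in the BLPC Procedures; inserting the new field before the first tag greater than 506 is an assumption meant to mimic field order, and the function name is hypothetical.

```python
# Hypothetical sketch of Tools/AddField for Step 3.S on one record (a list of
# .mrk lines). PHRASE is a placeholder, not the BLPC standard wording.
PHRASE = r"=506  \\$aPLACEHOLDER: standard access phrase from Step 3.S"


def add_506(record_lines, phrase=PHRASE):
    # existing 506 fields refer to a different library, so drop them first
    lines = [ln for ln in record_lines if not ln.startswith("=506  ")]
    for i, ln in enumerate(lines):
        tag = ln[1:4]
        # three-digit tags compare correctly as strings ("520" > "506")
        if tag.isdigit() and tag > "506":
            return lines[:i] + [phrase] + lines[i:]
    return lines + [phrase]
```

After running it over every record, the 506 count should equal the record (LDR) count, which is the same check the slide asks for.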

MARCEditor Tools: Add std. Phrase - 506 => Step 3.S

'27) MARCEditor: Add std. Phrase -- 506 => Step 3.S Revised: added tool highlight Fill in: Field box Data box -- Copy/Paste from Processing Guide After the fields are added, the Tool will display a processing count -- it should match the record count Note: The quick links on the left side

BLPC Processing: Delete specific fields - 650 Ind2 = 5/6/8 (non-LC) => Step 3.V

• Check Field Count for Presence of 650 Ind2=5/6/8

• Consult Step 3.V in BLPC Procedures

– Optional Review – FindAll(RegEx) instructions

– Determine that Tools/DeleteField tool is needed

– Copy RegEx pattern from Step 3.V

– Paste into Tools/DeleteField window

– Use Regular Expressions radio button option

– Submit using Delete button

• Check Field Count & Indicator count

• Save file

'28) BLPC Processing: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V Non-LC subjects with Ind2=5/6/8 are French, German, Sears The 650 Field Count details gives the value & count of each Ind2


MARCEditor: Delete specific fields - 650 Ind2 = 5/6/8 (non-LC) => Step 3.V

'29) MARCEditor: Delete specific fields -- 650 Ind2= 5/6/8 (non-LC) => Step 3.V Revised: added tool highlight Fill in: Field box; Data box -- Copy/Paste from Processing Guide. Note: Make sure the 'Use Regular Expressions' box is checked. After the fields are deleted, the Tool will display a processing count -- it should match the total Ind2 count. Regular Expression patterns are not difficult when you can copy/paste.

Regular expressions (RegEx)

• Finding/Editing patterns in strings (letters/numbers)

– Like learning another language

• Parentheses are used to group data

– Forces the computer to "store" data in "chunks"

– Data “chunks” are numbered for recall/retrieval/use

– Helps the programmer "read" the pattern

• Optional functionality, and not necessary

• Some punctuation is "reserved" (has a special meaning)

• BLPC uses consistent format for RegEx patterns


'30) Regular expressions (RegEx) RegEx is the abbrev. used in the Checklist & Processing Guide. Like any other programming language, it is very picky. The fields in MARCEditor are predictable (very important for computers). What a field looks like: -- Field tags always start with an equal sign and are followed by 2 blank spaces. -- The field indicators are either two numbers or blanks, which are explicitly represented by a back-slash " \ ". -- The data field always explicitly gives the subfield code. The subfield codes are represented by a dollar sign " $ ". OCLC often hides the beginning $a subfield code. Put a back-slash before any 'reserved' punctuation in a MARC field that will be used inside a data 'chunk'. Use the 'reserved' punctuation carefully/correctly; misusing it is the most common reason to get the message '0 records processed'. To be consistent & 'readable', the BLPC uses a lot of parentheses, more than an expert programmer would use. Once you are familiar with a RegEx pattern, you can use it to create new patterns. Yes, you can do it!


Reading RegEx Patterns - 650 Ind2 = 5/6/8 (non-LC)

Pattern: (=650  )(.[568])(\$a)(.+)

(=650  ) look for 650 fields with two blank spaces

(.[568]) look for any Ind1 & listed Ind2 numbers

(\$a) look for subfield $a (used as "anchor chunk")

(.+) any letter/number to the end of the field

Use Edit/FindAll(RegEx) to verify pattern


'31) Reading RegEx Patterns -- 650 Ind2= 5/6/8 (non-LC) This is the RegEx pattern from the previous example. Pattern: (=650  )(.[568])(\$a)(.+) Each 'chunk' in parentheses covers a specific part of the field: tag / indicators / subfields. The 'anchor chunk' helps you find your place in the RegEx pattern. 'Reserved' punctuation is used as a short-hand to summarize parts of the field. If the RegEx pattern is used in the FindAll(RegEx) box & finds the appropriate records, it can be used in any of the MARCEditor tools.
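The same pattern can be exercised outside MARCEdit to see what Tools/DeleteField with 'Use Regular Expressions' would remove. This sketch uses Python's re module (the constructs in the pattern, groups, ".", "[ ]", "\$" and "+", behave the same way in MARCEdit's .NET regex engine), assumes .mrk text with two spaces after the tag, and returns the number of deleted fields so it can be compared with the 650 Ind2 counts. The function name and sample headings are hypothetical.

```python
import re

# Sketch of Tools/DeleteField with "Use Regular Expressions" checked.
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")


def delete_matching(mrk_text, pattern):
    """Drop every field line the pattern matches; return (new_text, deleted)."""
    kept, deleted = [], 0
    for line in mrk_text.splitlines():
        if pattern.match(line):
            deleted += 1
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", deleted


sample = "\n".join([
    r"=650  \0$aHeading A.",  # Ind2 = 0 (LC): kept
    r"=650  \6$aHeading B.",  # Ind2 = 6: deleted
    r"=650  \8$aHeading C.",  # Ind2 = 8: deleted
]) + "\n"
new_text, count = delete_matching(sample, NON_LC_650)
print(count)  # 2
```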

Interpreting RegEx punctuation

Pattern: (=650  )(.[568])(\$a)(.+)

( ) Parentheses for data “chunks”

. Period for any single character (letter, number, space, or the “\” blank)

[ ] Square brackets for a list using “OR”

\ Backslash before “reserved” punctuation

esp.: $ \ ( ) [ ]

+ Plus sign for one or more of the preceding item

“Chunks” are stored as: $1$2$3$4

'32) Interpreting RegEx punctuation Pattern: (=650  )(.[568])(\$a)(.+) Knowing what the data 'chunks' cover -- tag / indicators / subfields -- makes interpreting the RegEx punctuation easier. Again, the RegEx pattern looks for: 650 Ind2= 5/6/8 (non-LC subjects). Note the use of 'reserved' punctuation in the anchor 'chunk' -- a dollar sign '$' -- use the backslash to indicate that the '$' is the MARC subfield symbol. When the computer processes the RegEx pattern, it 'stores' each 'chunk' in a numbered compartment. This example has four 'chunks' / four compartments -- $1$2$3$4
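To make the numbered compartments concrete, this sketch matches one sample field and prints each chunk. Python numbers groups the same way, left to right by opening parenthesis (in its own replacement syntax it writes them \g<1> rather than $1); the sample heading is invented.

```python
import re

# Sketch: what each numbered "chunk" captures for one sample field.
pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")
match = pattern.match(r"=650  \6$aChats.")
for number, chunk in enumerate(match.groups(), start=1):
    print(f"${number}: [{chunk}]")
# $1: [=650  ]
# $2: [\6]
# $3: [$a]
# $4: [Chats.]
```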

Creating RegEx patterns

• Start with known pattern: For non-LC Subjects: (=650  )(.[568])(\$a)(.+)

FindAll(RegEx) for “local” Subjects (Ind2 = 4/7)

(=650  )(.[47])(\$a)(.+)

FindAll(RegEx) for “local” Genres (Ind2 = 4/7)

(=655  )(.[47])(\$a)(.+)

'33) Creating RegEx patterns Consistent RegEx patterns enable small changes to create very different searches. The first example changes the Ind2 selections: Ind2 = 4 for local subjects (source not specified); Ind2 = 7 for a specified subject thesaurus (thesaurus code in $2). The second example keeps the 'local' indicators, but changes the tag to a 'genre' subject -- 655. Again, notice the 'anchor chunk', which shows that the changes occur in the tag / indicator 'chunks'. Workflow: work from a known pattern; create a new RegEx pattern; test the pattern in FindAll(RegEx); use the same pattern in the MARCEditor tools to delete the field.
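Because the BLPC patterns share one shape, a new pattern can be produced by swapping only the tag and the Ind2 list. The helper below is hypothetical and simply fills that template; the sample fields are invented and use the .mrk two-space convention.

```python
import re


# Hypothetical helper: fill the BLPC template (=TAG  )(.[IND2 list])($a)(.+)
def field_pattern(tag: str, ind2_values: str) -> str:
    return rf"(={tag}  )(.[{ind2_values}])(\$a)(.+)"


print(field_pattern("650", "47"))  # (=650  )(.[47])(\$a)(.+)
print(field_pattern("655", "47"))  # (=655  )(.[47])(\$a)(.+)

# Test a generated pattern the way FindAll(RegEx) would, before deleting.
local_genre = re.compile(field_pattern("655", "47"))
print(bool(local_genre.match(r"=655  \7$aElectronic books.$2local")))  # True
```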

Editing with RegEx string pattern - 650 BISAC subjects => 690

• Start with known pattern: (=650  )(.[568])(\$a)(.+)

• Use Edit/Replace(RegEx): Change 650 to 690

• Identify “BISAC” subjects: Ind2=7 & $2 = bisacsh

• Determine which “chunks” change/stay the same

Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

Replace(RegEx): =690  $2$3$4$5 (the replacement is literal text, so the new tag is typed without parentheses)

'34) Editing with RegEx string pattern -- 650 BISAC subjects => 690 Revised: Corrected typo in Indicator 'chunk' Creating a more complex RegEx pattern Temporary solution: retain 'BISAC' subjects, but move them to a local subject -- 690 field :: searchable, but excluded from the in-house Authority checking reports 'BISAC' subjects are specifically coded -- Ind2 & $2 subfield -- making them easy to process Only the first “chunk” is changed. Make sure that you type it correctly -- equal sign, tag#, two blank spaces All the other 'chunks' are retrieved from storage in the original order
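The Find/Replace pair can be tried in Python as a sketch. Two points carry over to MARCEdit's .NET engine: apart from the group references, replacement text is taken literally, so the new tag is written as plain "=690" plus two spaces with no parentheses; and the stored chunks are referenced by number, which Python writes as \g<2>, \g<3>, ... where MARCEdit uses $2, $3, .... The function name and sample fields are hypothetical.

```python
import re

# Sketch of Edit/Replace(RegEx) for the BISAC move, assuming .mrk text.
# MARCEdit's "=690  $2$3$4$5" becomes Python's "=690  \g<2>\g<3>\g<4>\g<5>";
# "=690  " is literal text that takes the place of chunk 1.
FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
REPLACE = r"=690  \g<2>\g<3>\g<4>\g<5>"


def bisac_to_690(mrk_text: str) -> str:
    # "." never matches a newline, so each match stays within one field line
    return FIND.sub(REPLACE, mrk_text)


sample = r"=650  \7$aComputers / Databases.$2bisacsh"
print(bisac_to_690(sample))
# =690  \7$aComputers / Databases.$2bisacsh
```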

Reading RegEx Patterns - 650 BISAC subjects => 690

Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

(=650  ) look for 650 fields with two blank spaces

(.[7]) look for any Ind1 & Ind2 = 7

(\$a) look for subfield $a (optional “anchor” text)

(.+) any characters up to the next “chunk”

(\$2bisacsh) look for subfield & data at end of field

Can be shortened (which makes the pattern look complicated):

Find(RegEx): (=650)(.+\$2bisacsh)

Replace(RegEx): =690$2

'35) Reading RegEx Patterns -- 650 BISAC subjects => 690 Revised: Corrected typo in Indicator 'chunk' Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh) Remember to back-slash the 'reserved' punctuation for the MARC subfield symbol. Since the list of Ind2 values is only one number, you can omit the 'list' -- the square brackets. Simplified: (.7) The BLPC uses consistent patterns to make them easier to create & read. The RegEx pattern can be shortened further, but it is then much harder to read. Experienced programmers prefer the most compact RegEx pattern, for ex. (=650)(.+\$2bisacsh); this pattern also ignores the Ind2 value. For MARCEdit, using more 'chunks' is not a problem for the computer. When you get a very compact pattern, rewrite it to show the tag / indicator groups, the 'anchor chunk', and the search string data.
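A quick way to confirm that the long and short patterns do the same job, and to see where they differ, is to run both on sample fields. This sketch uses Python's re.sub (\g<n> in place of MARCEdit's $n) and invented sample fields.

```python
import re

LONG_FIND = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")
SHORT_FIND = re.compile(r"(=650)(.+\$2bisacsh)")

line = r"=650  \7$aComputers / Databases.$2bisacsh"
long_out = LONG_FIND.sub(r"=690  \g<2>\g<3>\g<4>\g<5>", line)
short_out = SHORT_FIND.sub(r"=690\g<2>", line)  # chunk 2 keeps the two spaces
print(long_out == short_out)  # True

# The short pattern ignores Ind2, so it also catches a field the long one skips.
other = r"=650  \0$aComputers.$2bisacsh"
print(LONG_FIND.search(other) is None, SHORT_FIND.search(other) is not None)  # True True
```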

MARCEditor: FindAll(RegEx) -- Testing the pattern: 650 BISAC subjects


'36) MARCEditor: FindAll(RegEx) -- Testing the pattern: 650 BISAC subjects (New Slide) Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh). Remember to select the 'Use regular expressions' check-box. Be careful copying/pasting; make sure that extra blank spaces do not appear at the beginning or end of the pattern. The FindAll Results window displays the number of fields retrieved. Keep track of the FindAll field count; when deleting fields, the counts should match. Click on the column header 'Found Text' to sort by the field data & to scroll through the data. Fields with odd information are often found at either the beginning or the end of the sorted list.
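The FindAll count and the sorted 'Found Text' column can be imitated in a few lines. This is a rough sketch, not MARCEdit's own code: it assumes Python's "re" matches these patterns the same way, and the three-record .mrk excerpt is hypothetical.

```python
import re

# Same pattern as the slides: 650 fields, Ind2 = 7, ending in $2bisacsh.
PATTERN = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")


def find_all(mrk_text: str) -> list[str]:
    """Collect each matching field, like the 'Found Text' column of FindAll Results."""
    return [m.group(0) for m in PATTERN.finditer(mrk_text)]


# Hypothetical three-record .mrk excerpt; records are separated by blank lines.
mrk_text = "\n".join([
    r"=001  ocm00000001",
    r"=650  \0$aManagement.",
    r"=650  \7$aBusiness & Economics / Management.$2bisacsh",
    "",
    r"=001  ocm00000002",
    r"=650  \7$aHistory / General.$2bisacsh",
    "",
    r"=001  ocm00000003",
    r"=650  \0$aLibraries.",
])

found = find_all(mrk_text)
print(f"{len(found)} fields found")  # the count to write down before editing
for text in sorted(found):           # sorting pushes odd data to the top or bottom
    print(text)
```

Here the count is 2; after a delete or replace, re-running FindAll with the matching pattern should report the same number of affected fields.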

MARCEditor: Replace(RegEx) -- 650 BISAC subjects => 690


'37) MARCEditor: Replace(RegEx) -- 650 BISAC subjects => 690 Revised: Corrected typo in Indicator 'chunk'. Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh). Remember to select the 'Use regular expressions' check-box. Be careful copying/pasting; make sure that extra blank spaces do not appear at the beginning or end of the pattern. Drawback: the field remains in the 'same place' in the record (with the other 650 fields). This is basically the same as the 'Swap Field Utility'. Use Replace(RegEx) if you need to re-order the subfields by changing the order of the 'chunks'. Ex. If, for some non-cataloging reason, the 'bisacsh' subfield needs to be the first subfield, use: Repl(RegEx): =690  $2$5$3$4 (tag)(ind)('bisacsh' subfield)(anchor 'chunk')(anything to end)
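Re-ordering chunks in the replacement string, and the way the edited field stays in place, can be checked with a short sketch. It assumes Python's re.sub stands in for MARCEdit's Replace(RegEx) (\g<5> in Python corresponds to $5 in MARCEdit); the record excerpt is hypothetical.

```python
import re

PATTERN = re.compile(r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)")

# Replacement chunks: new tag, indicators (2), 'bisacsh' subfield (5),
# $a anchor (3), then the rest of the subject text (4).
REORDER = r"=690  \g<2>\g<5>\g<3>\g<4>"

# Hypothetical record excerpt in MARCEditor (.mrk) format.
record = "\n".join([
    r"=245  00$aSample title$h[electronic resource].",
    r"=650  \0$aManagement.",
    r"=650  \7$aBusiness & Economics / Management.$2bisacsh",
    r"=650  \0$aLeadership.",
])

# The edited field keeps its place among the other 650 fields.
print(PATTERN.sub(REORDER, record))
```

The third line becomes "=690  \7$2bisacsh$aBusiness & Economics / Management." while the surrounding 650 fields are untouched.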

BLPC Step 5: 949 processing -- Required processing

Policy: Include Class# in Unicorn Item record

949

$a -- Pull the call# from the 050$a

-- Insert the standard phrase: ' INTERNET'

$v -- Pull the 001/OCLC# as a unique no.

$w $h $t $x $z -- Add standard holdings data

• See additional instructions


'38) BLPC Step 5: 949 processing. The class# in the Item record is used for gathering statistics by call# range and for the call# browse. Processing the 949 holdings is required and time-consuming. (As a later project, I want to be able to use the Tasks function to streamline the process.) The value added by this step makes the high point count worth it.
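The 949 recipe on the slide (class# from 050$a plus ' INTERNET', the 001 as $v, then standard holdings subfields) can be sketched as a small function. This is only an illustration under stated assumptions: the $w $h $t $x $z values are hypothetical placeholders because the slides do not give them, the helper names are invented for this example, and omitting $a when there is no 050$a is an assumption, not the BLPC rule.

```python
import re
from typing import Optional

# Hypothetical placeholders: the slides list $w $h $t $x $z as "standard
# holdings data" but do not give the values, so substitute local ones.
STANDARD_HOLDINGS = "$wVALUE_W$hVALUE_H$tVALUE_T$xVALUE_X$zVALUE_Z"
PHRASE = " INTERNET"  # standard phrase appended to the class number


def first_subfield(field_line: str, code: str) -> Optional[str]:
    """Return the first $<code> value in a .mrk field line, or None."""
    match = re.search(r"\$" + code + r"([^$]*)", field_line)
    return match.group(1) if match else None


def build_949(mrk_record: str) -> str:
    lines = mrk_record.splitlines()
    # $v: the 001 (OCLC#) serves as a unique number for the item.
    control_no = next((line[6:] for line in lines if line.startswith("=001  ")), "")
    # $a: class number pulled from the first 050$a.
    call_no = next(
        (first_subfield(line, "a") for line in lines if line.startswith("=050  ")),
        None,
    )
    # Assumption: with no 050$a the $a is omitted, leaving the record for
    # the post-load "949 lacks call#" cleanup.
    a_part = f"$a{call_no}{PHRASE}" if call_no else ""
    return f"=949  \\\\{a_part}$v{control_no}{STANDARD_HOLDINGS}"


# Hypothetical record excerpt in .mrk format.
record = "\n".join([
    "=001  ocm12345678",
    r"=050  \4$aHD31$b.S26 2012",
    r"=245  00$aSample title$h[electronic resource].",
])
print(build_949(record))
```

For this record the sketch prints "=949  \\$aHD31 INTERNET$vocm12345678" followed by the placeholder holdings subfields.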

Batch-loading

• MARCEdit with files no larger than 10k records

– MARCEdit/Tool MARCSplit

• MARCEditor/File: Compile File into MARC

• Unicorn batch load rpt uses 001 match point

– 'o' for OCLC# & 'g' for local vendor key

• Unicorn batch load rpt settings

– create new bibliographic records only

• Date cataloged -- back dated to prev. month

– prevents interference w/scheduled Authority reports

– max. load two files a day

'39) Batch-loading Revision :: Corrected typo in 'MARCSplit'. MARCEditor can work with large files (max. tested: 67k records), but there is a slow-down when changing record data :: smaller files (10k records) work faster. Date cataloged is backdated to a non-working date. If there is unacceptable data (typos) in the records, they can be purged by cataloged date. The 'Date Created' in the bibl record control tab displays the actual load date. The Checklist includes an area to track the load dates & statistics.
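The 10k-record file limit and the two-loads-a-day rule lend themselves to a quick planning sketch. MARCSplit itself works on the binary .mrc file; this hypothetical helper instead splits MARCEditor (.mrk) text, assuming records are separated by blank lines, and is not how MARCSplit is implemented.

```python
import math


def split_records(mrk_text: str, max_records: int = 10_000) -> list[str]:
    """Split .mrk text into files of at most max_records records each."""
    text = mrk_text.replace("\r\n", "\n").strip()
    records = [r for r in text.split("\n\n") if r.strip()]
    return ["\n\n".join(records[i:i + max_records]) + "\n"
            for i in range(0, len(records), max_records)]


def load_plan(total_records: int, max_records: int = 10_000,
              files_per_day: int = 2) -> tuple[int, int]:
    """Return (number of files, days needed) at a maximum of two loads a day."""
    files = math.ceil(total_records / max_records)
    return files, math.ceil(files / files_per_day)


# Hypothetical 25,000-record file.
sample = "\n\n".join(f"=001  n{i:05d}\n=245  00$aTitle {i}" for i in range(25_000))
print([chunk.count("=001  ") for chunk in split_records(sample)])
print(load_plan(65_000))  # e.g. an initial load of about 65k titles
```

The 25,000-record sample splits into files of 10,000, 10,000 and 5,000 records, and a 65k-record load works out to 7 files over 4 days.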


Identifying records for Cleanup

Checklist finds problems to correct post-load

• Item maintenance projects

– 949 lacks call#

• Bibliographic record maintenance projects

– 245 lacks $h (if more than 5-12 records)

– URLs lacking

• Record reload/overlay project

– Record already in OPAC (P-N duplicates)


'40) Identifying records for Cleanup -- Required post-load maintenance projects. Remember: BLPC is designed to load files as quickly as possible & identify problems to correct later. The BLPC Checklist is aligned with the UVa-Wise Library cataloging priorities; cleanup is limited to critical errors/problems only. Just as most of the processing time is spent on Required processing, the same fields take more time in cleanup: Validation Report errors, URL links, 949 batch-load errors. Provider-neutral records (edited to display data for a specific vendor) may already exist in the catalog. The batch-load records will not load/overlay & will give a 'Flex-key already exists' error message. Procedures need to be developed to resolve multiple P-N records from different vendors. Also need to develop policy/procedure for: 300 fields which lack the phrase '(1 electronic resource)'; RDA 38x fields.
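A pre-check for three of the cleanup problems can be sketched against MARCEditor (.mrk) text. The rules here are assumptions for illustration only: URLs are looked for in 856$u, and a 949 is treated as lacking a call# when its $a is missing or holds only the phrase INTERNET. P-N duplicates cannot be found this way because they depend on what is already in the OPAC.

```python
import re


def problems(mrk_record: str) -> list[str]:
    """Return the cleanup problems found in one .mrk record (assumed rules)."""
    lines = mrk_record.splitlines()

    def fields(tag: str) -> list[str]:
        return [line for line in lines if line.startswith(f"={tag}  ")]

    issues = []
    if not any("$h" in field for field in fields("245")):
        issues.append("245 lacks $h")
    if not any("$u" in field for field in fields("856")):
        issues.append("URLs lacking")
    call_nos = []
    for field in fields("949"):
        match = re.search(r"\$a([^$]*)", field)
        call_nos.append(match.group(1).strip() if match else "")
    # A call# must be present and be more than the standard phrase alone.
    if not any(c and c != "INTERNET" for c in call_nos):
        issues.append("949 lacks call#")
    return issues


# Hypothetical records in .mrk format.
good = "\n".join([
    r"=245  00$aSample title$h[electronic resource].",
    r"=856  40$uhttp://example.com/ebook",
    r"=949  \\$aHD31 INTERNET$vocm00000001",
])
bad = "\n".join([
    r"=245  00$aSample title.",
    r"=949  \\$a INTERNET$vocm00000002",
])
print(problems(good))
print(problems(bad))
```

The first record passes; the second is flagged for all three problems.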

MARCEdit Tools: Select/Extract selected records

Step 3.F: 245 lacks $h


'41) MARCEdit Tools -- Select/Extract records -- 245 lacks $h Revision: added processing order & search results count window. MARCEdit Tool wizard: Tools / Select MARC records / Extract selected records. Checklist Step 3.F was marked-up to note that one record lacked a 245$h. 1) Load the MARC (.mrc) file (can also load MARCEditor (.mrk) files). 2) Be sure to change the display field to 245$h. 3) Import the file (245$h data will fill the display window). Click on the column header 'Display Field' to sort by the field data & to scroll through the data. 4) After the search term is entered, clicking on the 'search' icon displays the count. Note: fields/subfields that lack data have the message 'Display field not found'. 5) Export the selected records to a new MARCEditor (.mrk) file. There is a prompt to delete them from the existing file -- do not delete; keep them in the existing file. The number of records exported/extracted should match the Search count. Review the new file of problem 245$h records in MARCEditor to identify the problem title & correct it manually in the original file. Note: To change the data display window, change the Display field option (choice 2) & import again (choice 3).
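The select/extract steps (choose a display field, search, check the count, export a copy without deleting) can be mirrored in a short sketch. The helper names are hypothetical, and filtering on the 'Display field not found' message to find records lacking 245$h is an assumption about how the search term would be used, not a documented MARCEdit behavior.

```python
import re

NOT_FOUND = "Display field not found"  # message shown when the field/subfield lacks data


def display_value(record: str, tag: str, code: str) -> str:
    """Return the first tag$code value in a .mrk record, or the 'not found' message."""
    for line in record.splitlines():
        if line.startswith(f"={tag}  "):
            match = re.search(r"\$" + code + r"([^$]*)", line)
            if match:
                return match.group(1)
    return NOT_FOUND


def extract(records: list[str], tag: str, code: str, term: str) -> list[str]:
    """Copy matching records; the original list is left untouched (no delete)."""
    return [r for r in records if term in display_value(r, tag, code)]


# Hypothetical records in .mrk format.
records = [
    "=001  ocm00000001\n=245  10$aFirst title$h[electronic resource] /$cA. Author.",
    "=001  ocm00000002\n=245  10$aSecond title /$cB. Author.",
]
selected = extract(records, "245", "h", NOT_FOUND)
print(f"Search count: {len(selected)}")
new_file = "\n\n".join(selected) + "\n"  # contents for a new .mrk file
```

The exported copy should hold exactly as many records as the search count, and the original file still contains every record.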

MARCEdit Tools: Export Tab Delimited records


'42) MARCEdit Tools -- Export Tab Delimited records (New Slide) If there are too many errors to correct manually, select/extract the problem records to a new file & export them to a tab-delimited file that can be opened directly in Excel. 1) Identify the MARC (.mrc) file (can also load MARCEditor (.mrk) files). 2) Name the export file; I prefer to export as a .csv file (comma-separated values). 4) Determine the fields to export and Add them to the list (only the field tags in the list window will export). 5) Export. Next: In Excel, open the .csv file, & view/print it as a spreadsheet. Other helpful MARCEdit Tools (from either the Tools menu or the sidebar): MARCSplit / MARCJoin.
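A comparable .csv export of selected fields can be sketched with Python's standard "csv" module. This is not the MARCEdit export wizard; it is a hypothetical helper that works on .mrk text, writes one row per record and one column per tag, and joins repeated fields with '; ' (an assumption about how repeats should be shown).

```python
import csv
import io


def export_csv(records: list[str], tags: list[str]) -> str:
    """One row per record, one column per tag; repeated fields joined by '; '."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(tags)  # header row
    for record in records:
        lines = record.splitlines()
        writer.writerow([
            # Field data after "=TAG  " (indicators included for data fields).
            "; ".join(line[6:] for line in lines if line.startswith(f"={tag}  "))
            for tag in tags
        ])
    return out.getvalue()


# Hypothetical records in .mrk format.
records = [
    "=001  ocm00000001\n=245  10$aFirst title$h[electronic resource]",
    "=001  ocm00000002\n=245  10$aSecond title, with a comma",
]
print(export_csv(records, ["001", "245"]))
```

To save the result for Excel, write the returned string to a file opened with newline="" so the csv module's line endings are kept as-is; values containing commas are quoted automatically.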


Help!

• MarcEdit Help

http://people.oregonstate.edu/~reeset/marcedit/html/help.html

– Click thru the Contents menu:

Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions.

• RegularExpressions.info

http://www.regular-expressions.info/

• MARCEDIT-L list

http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L

• BATCH list

http://listserv.vt.edu/cgi-bin/wa?A0=batch

'43) Help! Also consult the many YouTube video clips: tutorials by T. Reese & other MARCEdit supporters. MARC record help: Library of Congress MARC standards (http://www.loc.gov/marc/); OCLC Bibliographic Formats and Standards (http://www.oclc.org/bibformats/).

Amelia C. VanGundy

The University of Virginia's College at Wise

John Cook Wyllie Library


http://people.uvawise.edu/acv6d/

Virginia SirsiDynix Library Users Group Meeting, Nov. 14, 2012


'44) Thanks! Working on the BLPC project taught me a lot of new things & reinforced that there is more for me to learn. And I had fun.

BLPC Project -- Presentation revisions

Originally presented Nov. 14, 2012

• Additional Slides:

– BLPC Project web-page

– MARCEditor: FindAll(RegEx)

– MARCEdit Tools: Export Tab Delimited records

– BLPC Project: Presentation revisions


'45) BLPC Project -- Presentation revisions (New Slide) Additional Slides: BLPC Project web-page; MARCEditor: FindAll(RegEx); MARCEdit Tools: Export Tab Delimited records; BLPC Project: Presentation revisions. Slide Revisions: BLPC Introduction & Outcomes; MARCEdit / Tools for processing now: MARCEditor / MARCEdit Tools; MARCEditor Tools: Add std. Phrase (minor); MARCEditor: Delete specific fields (minor); Editing with RegEx string pattern; Reading RegEx Patterns; MARCEditor: Replace(RegEx); Batch-loading (minor); MARCEdit Tools: Select/Extract selected records.