WAS to Archive-It Metadata Migration March 11, 2015


Page 1: WAS to Archive-It Metadata Migration March 11, 2015

WAS to Archive-It Metadata Migration

March 11, 2015

Page 2: WAS to Archive-It Metadata Migration March 11, 2015

WAS -> Archive-It

WAS Project/Archive

• 3 levels of hierarchy
– Project
– Site (can contain 1 or more Seed URLs)
– Seed URL

Archive-It Collection

• 2 levels of hierarchy
– Collection
– Seed URL

Page 3: WAS to Archive-It Metadata Migration March 11, 2015

2 Seed URLs per Site

1 Seed URL per Site

1 Seed URL per Site

1 Seed URL per Site

2 Seed URLs per Site

Page 4: WAS to Archive-It Metadata Migration March 11, 2015

Multiple seeds: the hierarchy flattens out, and each Seed URL gets all of its Site's metadata
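The flattening can be pictured with a minimal sketch. The data layout here (a list of dicts with hypothetical "metadata" and "seed_urls" keys) and the sample values are assumptions made for illustration; they are not the actual WAS or Archive-It export format.

```python
from typing import List


def flatten_sites(sites: List[dict]) -> List[dict]:
    """Return one record per Seed URL, each carrying a copy of its Site's metadata."""
    seeds = []
    for site in sites:
        for url in site["seed_urls"]:
            record = dict(site["metadata"])  # copy, so seeds do not share one dict
            record["url"] = url
            seeds.append(record)
    return seeds


# Hypothetical example: one Site with 2 Seed URLs, one Site with 1 Seed URL.
example_sites = [
    {"metadata": {"title": "Site A", "creator": "Dept. X"},
     "seed_urls": ["http://a.example.org/", "http://www.a.example.org/"]},
    {"metadata": {"title": "Site B", "creator": "Dept. Y"},
     "seed_urls": ["http://b.example.org/"]},
]

flat = flatten_sites(example_sites)
for seed in flat:
    print(seed["url"], "->", seed["title"])
```

Two Sites go in and three seed records come out; both seeds of "Site A" carry the same title and creator.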

Page 5: WAS to Archive-It Metadata Migration March 11, 2015

BEFORE starting, you should…

Delete sites (seeds) that you have never captured, or that you captured but whose captures you have since deleted. They are probably sitting under ‘never captured’ or ‘inactive sites’.

Page 6: WAS to Archive-It Metadata Migration March 11, 2015

How to move

• Move project (collection) by project (collection).

• When you sit down, start and finish the move of a project.

• You don’t have to do all projects/collections in one day

Page 7: WAS to Archive-It Metadata Migration March 11, 2015

Run two reports (Administration > Project Admin)
1. Click “Archive-It Seed Export” > Export Seeds
2. Click “Archive-It Seed Metadata Export” > Export Metadata

Coming Soon in your accounts

Page 8: WAS to Archive-It Metadata Migration March 11, 2015

Export Seeds

Page 9: WAS to Archive-It Metadata Migration March 11, 2015

Seeds export from WAS

• It is in .txt format; open it with Notepad

• Your seeds will be segmented by crawl frequency.

• E.g., “Seeds with custom schedule of 1x per year”

• You will copy and paste URLs from the .txt document and upload them in chunks by frequency
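If you would rather not pick out the chunks by eye, a short script can group the URLs under their frequency headings. This is only a sketch under an assumed layout: it treats any non-blank line that does not start with http:// or https:// as a heading, and every URL line as belonging to the most recent heading. The real export may be laid out differently, so compare the output with the file before pasting anything. The sample text and the function name are hypothetical.

```python
from collections import OrderedDict


def group_seeds_by_frequency(text):
    """Group URL lines under the most recent heading line (assumed layout)."""
    groups = OrderedDict()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith(("http://", "https://")):
            if current is None:
                current = "(no heading)"  # URL seen before any heading
            groups.setdefault(current, []).append(line)
        else:
            current = line  # assumption: anything that is not a URL is a heading
            groups.setdefault(current, [])
    return groups


# Hypothetical sample, not a real WAS export.
sample = """Seeds with custom schedule of 1x per year
http://example.org/
http://example.com/reports/

Seeds with custom schedule of 1x per month
http://example.net/
"""

groups = group_seeds_by_frequency(sample)
for heading, urls in groups.items():
    print(heading)
    print("\n".join(urls))
    print()
```

Each printed block is one chunk to paste into Archive-It with the matching frequency.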

Page 10: WAS to Archive-It Metadata Migration March 11, 2015

Example text file

Page 11: WAS to Archive-It Metadata Migration March 11, 2015

Consult the WAS to Archive-It mapping document to decide on the equivalent frequency

https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2

Page 12: WAS to Archive-It Metadata Migration March 11, 2015

Create a Collection (project) in Archive-It

Page 13: WAS to Archive-It Metadata Migration March 11, 2015

Create Collection (aka Project)

Page 14: WAS to Archive-It Metadata Migration March 11, 2015

Select frequency: for now leave at “One-Time” and click Next

Page 15: WAS to Archive-It Metadata Migration March 11, 2015

Enter Collection-level metadata. This metadata displays in the public site. You can go back and fully enter this later.

Page 16: WAS to Archive-It Metadata Migration March 11, 2015

Topics will appear in the public site (along with any Subjects you have)

Page 17: WAS to Archive-It Metadata Migration March 11, 2015

Example display

Page 18: WAS to Archive-It Metadata Migration March 11, 2015

In order to create a collection, you must upload seeds.

Page 19: WAS to Archive-It Metadata Migration March 11, 2015

If you have Historical seeds, upload those FIRST(!)

• Historical sites/seeds are seeds where the seed URL has changed over the life of the captures.

• They will be at the top of your seeds .txt document

• Do these first: while they are the only seeds in the collection, it is easiest to select them all, do a “bulk edit” and choose “Deactivate”

Page 20: WAS to Archive-It Metadata Migration March 11, 2015

Example seeds list with Historical Seeds

Page 21: WAS to Archive-It Metadata Migration March 11, 2015

Copy and paste seeds from the .txt file into the box. Leave ‘Default’ selected > Next

Page 22: WAS to Archive-It Metadata Migration March 11, 2015

VERY important:
1. Ignore this error for ALL your seed uploads.
2. “URL is correct; use as is” MUST be checked regardless of the error you see. If it is not selected for any seed, go through now and select it for all instances.

Page 23: WAS to Archive-It Metadata Migration March 11, 2015

Another example: click “URL is correct; use as is” for all

Page 24: WAS to Archive-It Metadata Migration March 11, 2015

Collection created

Page 25: WAS to Archive-It Metadata Migration March 11, 2015

Bulk Edit Historical Seeds (where applicable)

Page 26: WAS to Archive-It Metadata Migration March 11, 2015

Under “Seed Management” click “All”

Page 27: WAS to Archive-It Metadata Migration March 11, 2015

Click the top box to select all. Note: ‘select all’ applies only to what is displayed. If there are more than 400 items, the rest are on another page and you will have to repeat the step there.

Page 28: WAS to Archive-It Metadata Migration March 11, 2015

Click “bulk edit”

Page 29: WAS to Archive-It Metadata Migration March 11, 2015

Choose “Deactivate”

Page 30: WAS to Archive-It Metadata Migration March 11, 2015

Go back to bulk edit > Add Metadata

• Suggestion: add a Notes field, if you don’t already have one, where you note that these are historical seeds. You most likely will never want to crawl these again, so you may want to keep track of them.

Page 31: WAS to Archive-It Metadata Migration March 11, 2015
Page 32: WAS to Archive-It Metadata Migration March 11, 2015

Add a custom field

Page 33: WAS to Archive-It Metadata Migration March 11, 2015

Go back to Collection management and repeat for the next frequency in your seed list

Page 34: WAS to Archive-It Metadata Migration March 11, 2015

Back to Seeds .txt file

Leave as ‘One-Time’; they will not crawl until you say crawl now

Page 35: WAS to Archive-It Metadata Migration March 11, 2015

Copy and paste seeds into box. Leave ‘Default’ selected > Next

Page 36: WAS to Archive-It Metadata Migration March 11, 2015

For this case, choose Quarterly

Page 37: WAS to Archive-It Metadata Migration March 11, 2015

Import metadata

Page 38: WAS to Archive-It Metadata Migration March 11, 2015

Click “ALL seeds” > Import metadata

Page 39: WAS to Archive-It Metadata Migration March 11, 2015

Upload the metadata file > Upload File (leave default setting)

Page 40: WAS to Archive-It Metadata Migration March 11, 2015

You could stop here and do the cleanup on a later day

Page 41: WAS to Archive-It Metadata Migration March 11, 2015

Metadata cleanup

• If there is a WAS field that is not in Archive-It, Archive-It creates a custom field on import.

• All fields will display in the public interface by default

• The following fields may be in your upload, but they should ALL be made private: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID

Page 42: WAS to Archive-It Metadata Migration March 11, 2015

How to make fields private in Archive-It:

1. Go to Admin (link in the upper right corner)
2. Account Settings
3. In the text box toward the bottom of the page called 'Private Metadata Fields', enter all these fields: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID
4. NB: Enter each field name on a separate line, in all lower case letters.
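Because the field names are listed with capitals but must be entered in lower case, one per line, a couple of lines of Python can produce text ready to paste into the 'Private Metadata Fields' box. This is a convenience sketch only; the variable names are illustrative.

```python
# Field names exactly as listed in step 3.
fields = ["Note", "Scope", "Robots honored", "Max crawl seconds",
          "Capture frequency", "Seed type", "Site ID"]

# One name per line, all lower case, as step 4 requires.
private_fields_text = "\n".join(name.strip().lower() for name in fields)
print(private_fields_text)
```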

Page 43: WAS to Archive-It Metadata Migration March 11, 2015
Page 44: WAS to Archive-It Metadata Migration March 11, 2015
Page 45: WAS to Archive-It Metadata Migration March 11, 2015

Scope –> Seed Type

• What about Directory only?

• What about Page only?

Page 46: WAS to Archive-It Metadata Migration March 11, 2015

NB: Archive-It offers a lot of additional scoping options for crawls. View the Help Documentation (linked at the top right of the collection page)

Page 47: WAS to Archive-It Metadata Migration March 11, 2015

Directory is not a separate scoping option in Archive-It (it is handled through the slash, /)

NO action needed by you, except to QA

WAS – Directory crawls

• Rosalie.com/presentations
– We will add the ending slash for you if you didn’t

• Rosalie.com/presentations/
– It moves over as is

• Rosalie.com/presentations.html
– It will crawl as host
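For the QA step, the three cases above can be checked with a small helper. It is a sketch of the rule as stated on this slide, not Archive-It's own logic: the function name and labels are hypothetical, it assumes a dot in the last path segment means a file name, and it ignores query strings.

```python
def classify_directory_seed(seed):
    """Label a seed by how its path ends (hypothetical QA helper)."""
    rest = seed.split("://", 1)[-1]        # drop the scheme if there is one
    host, slash, path = rest.partition("/")
    if not slash or path == "":
        return "host"                      # no path at all, e.g. Rosalie.com
    if path.endswith("/"):
        return "directory"                 # moves over as is
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return "file"                      # e.g. presentations.html, crawls as host
    return "needs-slash"                   # ending slash should have been added


for seed in ["Rosalie.com/presentations",
             "Rosalie.com/presentations/",
             "Rosalie.com/presentations.html"]:
    print(seed, "->", classify_directory_seed(seed))
```

After the move, any Directory seed that still comes back as "needs-slash" is worth a second look.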

Page 48: WAS to Archive-It Metadata Migration March 11, 2015

What about ‘page only’ crawls?

• For ‘Page only’ you will have to manually go back and change crawl scope (seed type)

• You can find these by opening the metadata export. It is in .ods format, which you can open in Google Docs, in most versions of Excel, or in OpenOffice (a free download).

• Do NOT edit the .ods file before doing the metadata upload; make a copy and work in that.

• Then sort the “scope” column to find the relevant URLs

How to change it:

• Page: click on Settings > Crawl one page only (can also be bulk edited)
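As an alternative to sorting by hand, the page-only seeds can be listed with a short script. This sketch makes several assumptions: that you have saved your copy of the export as CSV, that the columns are named "url" and "scope", and that page-only rows contain the word "page" in the scope column. Check the header row and values in your own file and adjust the names to match.

```python
import csv
import io


def page_only_urls(csv_file, scope_column="scope", url_column="url"):
    """Return the URLs whose scope value mentions 'page' (assumed column names)."""
    reader = csv.DictReader(csv_file)
    return [row[url_column] for row in reader
            if "page" in (row.get(scope_column) or "").lower()]


# Hypothetical sample standing in for a CSV copy of the metadata export.
sample = io.StringIO(
    "url,scope\n"
    "http://example.org/,Host\n"
    "http://example.org/news.html,Page only\n"
    "http://example.com/docs/,Directory\n"
)

page_only = page_only_urls(sample)
print(page_only)
```

With a real file, pass an open file object instead of the sample, for example `open("metadata-copy.csv", newline="")`, where the file name is whatever you called your copy.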

Page 49: WAS to Archive-It Metadata Migration March 11, 2015

Change Frequency under Settings > Seed Type

Page 50: WAS to Archive-It Metadata Migration March 11, 2015

When will my crawls start? When you start them.

Page 51: WAS to Archive-It Metadata Migration March 11, 2015

When do I shut off WAS crawls?

• FIRST set up your crawls in Archive-It

• Make sure daily crawls are running

• Then you can stop your WAS crawls

Page 52: WAS to Archive-It Metadata Migration March 11, 2015

VERY important: Do NOT make any edits to WAS data, crawls, ANYTHING once you have moved a project to Archive-It!

Page 53: WAS to Archive-It Metadata Migration March 11, 2015

Batch shut off crawling in WAS

Page 54: WAS to Archive-It Metadata Migration March 11, 2015

Sites > Manage Sites > “all” > “select all” > “Reschedule Selected”

Page 55: WAS to Archive-It Metadata Migration March 11, 2015

Select “off” and click “Reschedule”

Page 56: WAS to Archive-It Metadata Migration March 11, 2015

Send CDL your info

Page 57: WAS to Archive-It Metadata Migration March 11, 2015

After you have created all your collections,

1. Send Rosalie this info for each collection
a) CollectionId
b) AccountId

AND

2. Add Rosalie as a user to your account (for now)

Page 58: WAS to Archive-It Metadata Migration March 11, 2015

CollectionId and AccountId in URL

Page 59: WAS to Archive-It Metadata Migration March 11, 2015

Where’s my data?

• Archive-It will work with CDL staff to move over your data.

• Timeline: May/June 2015

Page 60: WAS to Archive-It Metadata Migration March 11, 2015

Resources

WAS – Archive-It Migration wiki: https://wiki.library.ucsf.edu/display/UCLCKG/WAS+-%3E+Archive-it+Migration

Mapping of terms and metadata: WAS - Archive-It: https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2