Digital Library: The HKU Libraries’ experiences

Kam-ming Ku

HKUL

[email protected]


The presentation is about:

How can we deliver the right information to the right person at the right time, anywhere?

1. HKUL resources/projects
2. Going to do…
3. Challenges
4. Overcoming the challenges
5. Discussion

1. HKUL resources/projects

1.1 Staffing

1.2 Networking

1.3 Hardware

1.4 Software

1.5 DL initiatives

1.1 Systems Staff

• Systems Librarian

• 2 Computer Officers

• Assistant Librarian

• Assistant Computer Officer

• Senior Library Assistant

• 5.5 Technicians

1.2 Networking

• From 10 to 100 to 1000 Mbps, then wireless, then Bluetooth??

• Gigabit Ethernet backbone, with Fast Ethernet running to users; about 1,000 network points.

• ACENet connection (Access Everywhere Network; a plug-in network for roaming users): ~450 fixed points and 18 wireless access points.

1.2 Networking (cont.)

• Libraries on campus are connected to the campus backbone by Gigabit Ethernet or Fast Ethernet links.

• The 2 remote sites, the Dental and Medical Libraries, are each connected to the Main Campus by a 10 Mbps link.

• Gigabit firewall (Cisco PIX Firewall)

• Packeteer network shaper

1.3 Hardware

• Compaq AlphaServer GS60E (for the library catalogue)

• SUN Enterprise 4000, 10000

• 3 Linux, 5 Windows and 3 Novell servers

1.3 Hardware (cont.)

• 10 CD-ROM towers
  – 4 towers for staff
  – 2 towers in the Medical Library
  – 4 towers for the network

• 3 WinFrame servers & 1 thin-client server
  – 1 network CD-ROM MetaFrame server
  – 1 standalone CD-ROM MetaFrame server
  – 1 network CD-ROM WinFrame server
  – 1 Dell server for 6 thin clients

1.3 Hardware (cont.)

                 PC     Mac
Office/Staff     289    6
Counter          35
Student          342    7

                 Printer   Scanner
Office/Staff     107       17
Student          27        12

1.4 Software

• SUN Solaris 8, DEC UNIX, Windows 2000/NT, Novell NetWare, Linux

• III Innopac library management system

• Oracle 9i database, 9iAS (Web) and Context (full-text indexing/searching)

• ERL server for SilverPlatter databases

• WinFrame server for legacy and network CD-ROM databases

• Apache Web servers

1.4 Software (cont.)

• TRS 4.0 server

• CJN server hosting 6,000+ Chinese full-text journals

• Proxy server, Samba server

• Pcounter server

• Tamino XML server

• VOD server (IBM VideoCharger)

• EZproxy server

1.4 Software (cont.)

• ILLiad server (inter-library loan)

• Taiwan Newspaper database

• Chinese database server: Sibucongkan (四部叢刊); Sikuquanshu (四庫全書); eKangxi dictionary (康熙字典)

1.5 HKUL DL initiatives

[Figure: Imaging database]

• 1.5.1 Digitization projects

e.g. ExamBase
  – First in-house developed database
  – Imaging database of past exam papers
  – Released in 1996
  – Uses DMS, client-server model
  – To be moved to a web-based platform soon
  – TIFF only (converted on the fly to GIF/JPG), no PDF!!!

1. Hardware: high-speed flatbed scanner (36 ppm)

2. Software: Kofax Capture 3.0, sophisticated software covering scanning, OCR and verification.

3. Logistics
  a. Scanning
  b. Automatic indexing
  c. Verification and manual inputting
  d. Data publishing: publish data to the Oracle database

a. Scanning
Papers are scanned in batch mode (~200 pages per batch). A separation sheet is used to separate different documents. (The separation sheet is printed with a barcoded index, e.g. department and course code, plus text in a fixed-size font. The separation sheets can be re-used.)

b. Automatic indexing
The barcoded indexes and the text printed on each separation sheet are recognized automatically.

c. Verification and manual inputting
There is no need to verify the barcoded indexes, as their recognition accuracy is > 99.999%. In-doubt OCRed text is marked in red, so it is easy to verify. Other indexes (e.g. exam date) are input manually.
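Steps a to c can be sketched as follows: cut a scanned batch into documents at each separation sheet, take the document's index from the sheet's barcode, and list the low-confidence OCR words an operator would check. This is not Kofax Capture's API; the `Page` record, the `DEPARTMENT|COURSE_CODE` barcode payload and the 0.9 confidence threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Page:
    """One scanned page as a capture step might report it (hypothetical structure)."""
    number: int
    barcode: Optional[str] = None  # decoded barcode; set only on separation sheets
    words: List[Tuple[str, float]] = field(default_factory=list)  # (word, OCR confidence 0..1)


@dataclass
class Document:
    index: Dict[str, str]
    pages: List[Page]


def parse_index(barcode: str) -> Dict[str, str]:
    # Assumed payload layout "DEPARTMENT|COURSE_CODE"; the real barcode format is not given.
    department, course_code = barcode.split("|", 1)
    return {"department": department, "course_code": course_code}


def split_batch(pages: List[Page]) -> List[Document]:
    """Cut a scanned batch into documents at each separation sheet."""
    docs: List[Document] = []
    for page in pages:
        if page.barcode is not None:
            # A separation sheet starts a new document and supplies its index;
            # the sheet's own image is not added to any document.
            docs.append(Document(index=parse_index(page.barcode), pages=[]))
        elif docs:
            docs[-1].pages.append(page)
        else:
            raise ValueError(f"page {page.number} comes before the first separation sheet")
    return docs


def in_doubt(page: Page, threshold: float = 0.9) -> List[str]:
    """Words below the confidence threshold: the ones an operator would see in red."""
    return [word for word, confidence in page.words if confidence < threshold]


# Usage with a made-up batch of five pages:
batch = [
    Page(1, barcode="CSIS|CS1101"),
    Page(2, words=[("Question", 0.98), ("1", 0.99)]),
    Page(3, words=[("Answer", 0.62)]),
    Page(4, barcode="MATH|MA2001"),
    Page(5),
]
docs = split_batch(batch)
```

Here the batch yields two documents, pages 2 and 3 under `CSIS|CS1101` and page 5 under `MATH|MA2001`, and only "Answer" on page 3 is flagged for checking.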

1.5 HKUL DL initiatives (cont.)

• e.g. Newspaper clippings
  – Full-text imaging database
  – Outsourced: scanning/indexing/OCR
  – Oracle Context cartridge as the full-text search engine (no Chinese support!)
  – Decision: keep using it, or buy third-party full-text software??
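Chinese support is hard for a word-based full-text engine because Chinese text has no spaces between words. One common workaround, not necessarily what Oracle Context or any third-party product does, is to index overlapping two-character tokens (bigrams). A minimal in-memory sketch with made-up sample titles:

```python
from collections import defaultdict


def bigrams(text):
    """Overlapping two-character tokens, e.g. "香港大學" -> 香港, 港大, 大學."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return set(chars)  # a single character is its own token
    return {chars[i] + chars[i + 1] for i in range(len(chars) - 1)}


class BigramIndex:
    def __init__(self):
        self.docs = {}                    # doc id -> full text
        self.postings = defaultdict(set)  # bigram -> ids of docs containing it

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token in bigrams(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        tokens = bigrams(query)
        if not tokens:
            return set()
        # Candidates must contain every bigram of the query ...
        candidates = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # ... and a substring check drops docs where those bigrams are not adjacent.
        return {d for d in candidates if query in self.docs[d]}


# Usage with made-up titles:
idx = BigramIndex()
idx.add(1, "香港大學圖書館")
idx.add(2, "香港中文大學")
```

A search for 大學 matches both titles, while 香港大學 matches only the first, because the second lacks the bigram 港大. Single-character queries would need a separate single-character index, which this sketch omits.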

1.5 HKUL DL initiatives (cont.)

• 1.5.2 Value-added bibliographic databases
  – Subsets of the library catalogue
  – e.g. TOC, Thesis Online, AV materials
  – Debate: a single-point source or a number of subsets??

1.5 HKUL DL initiatives (cont.)

e.g. Table of Contents
• To automate the inputting of TOCs into bibliographic records

1. Hardware: overhead book scanner (~4 sec per image)

2. Software: Kofax Capture 3.0, sophisticated software covering scanning, OCR and verification.

3. Techniques
  a. Scanning
  b. Chinese OCR
  c. Proofreading
  d. Data publishing: publish data to the catalogue

a. Scanning
A book scanner is used to scan the book’s TOC. Benefits:
  – no need to turn the book over for scanning
  – both facing pages can be scanned at one time
  – faster scanning

b. Chinese OCR
A plug-in module was written to interface Kofax Capture with the Chinese OCR engine (TH-OCR 7.5).

c. Proofreading
MS Word (Chinese) is used for proofreading. A macro was written to ease the step of assigning MARC subfields.

d. Publish data to the catalogue
Done at night in batch mode. A Tcl/Tk Expect script automates the upload process.
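The subfield-assignment part of the proofreading step (c) can be illustrated by converting proofread TOC lines into a contents field. This sketch assumes the TOC goes into the MARC 21 505 (Formatted Contents Note) field in its enhanced form (indicators 0 and 0), with $g for the chapter number, $t for the title and $r for the statement of responsibility, and that proofread lines look like `1. Title / Author`. The slides do not say which field or line layout HKUL actually uses, and the real work was done by a Word macro, not Python.

```python
import re

# Assumed proofread line layout: "<number>. <title>", optionally followed by " / <author>".
LINE = re.compile(r"^\s*(?P<num>\d+\.)\s+(?P<title>.+?)(?:\s+/\s+(?P<author>.+))?\s*$")


def toc_to_505(lines):
    """Build an enhanced 505 field ($g number, $t title, $r responsibility) from TOC lines."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines left over from OCR
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognised TOC line: {line!r}")
        entry = f"$g {m['num']} $t {m['title']}"
        if m["author"]:
            entry += f" / $r {m['author']}"
        entries.append(entry)
    # Indicators: 0 = complete contents, 0 = enhanced; entries are separated by " -- ".
    return "505 00 " + " -- ".join(entries) + "."


# Usage with made-up TOC lines:
field_505 = toc_to_505(["1. Introduction / A. Chan", "2. Methods", ""])
```

Raising an error on an unrecognised line, rather than guessing, keeps malformed entries out of the nightly upload batch.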

1.5 HKUL DL initiatives (cont.)

• 1.5.3 Subject-based e-resources
  – Redesigned tag 996
  – A range of useful information about e-resources
  – Grouping of materials by subject to fulfil users’ needs
  – Eases extension to further DL projects (e.g. a portal)
  – See the HKUL homepage (databases, EJ, e-books & e-news)

• 1.5.4 Internet resources

• 1.5.5 Electronic delivery (ILLiad)
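Grouping e-resources by subject from a redesigned local tag could work roughly as follows. The layout of HKUL's tag 996 is not described in the slides, so this sketch simply assumes each record carries a list of subject headings under the key `"996"` plus a resource type (database, EJ, e-book, e-news); the records are made up.

```python
from collections import defaultdict


def subject_pages(records):
    """Build {subject: {resource type: [titles]}} from (hypothetical) tag-996 subject headings."""
    pages = defaultdict(lambda: defaultdict(list))
    for record in records:
        # A resource may carry several subjects and so appear on several subject pages;
        # a record without tag 996 appears on none.
        for subject in record.get("996", []):
            pages[subject][record["type"]].append(record["title"])
    # Convert to plain dicts with sorted keys and titles for stable display.
    return {
        subject: {rtype: sorted(titles) for rtype, titles in sorted(by_type.items())}
        for subject, by_type in sorted(pages.items())
    }


# Usage with made-up records:
records = [
    {"title": "Dental Database X", "type": "Database", "996": ["Dentistry"]},
    {"title": "Journal of Clinical Examples", "type": "EJ", "996": ["Medicine"]},
    {"title": "Applied Anatomy Journal", "type": "EJ", "996": ["Dentistry", "Medicine"]},
    {"title": "Untagged E-book", "type": "E-book"},
]
pages = subject_pages(records)
```

Because the subject pages are generated from the records, the same data could later feed other views such as a portal.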

1.5 HKUL DL initiatives (cont.)

• 1.5.6 Virtual services
  – E-forms (e.g. BRO)
  – Online reference

• 1.5.7 Automation
  – Increases efficiency, e.g. amending thousands of records in batch
  – Electronic submission
  – Staff intranet
  – Innoface
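A batch amendment is usually one rule applied to every matching record. As a hypothetical example, suppose a vendor moves its platform and every URL in the 856 $u subfield must be re-pointed. Records are simplified to dicts here, and the hosts are placeholders; in practice the change would go through the library system's own batch tools, which this sketch does not model.

```python
def amend_urls(records, old_prefix, new_prefix):
    """Re-point 856 $u URLs starting with old_prefix; return how many records changed."""
    changed = 0
    for record in records:
        urls = record.get("856u", [])
        updated = [new_prefix + url[len(old_prefix):] if url.startswith(old_prefix) else url
                   for url in urls]
        if updated != urls:
            record["856u"] = updated
            changed += 1
    return changed


# Usage with made-up records and hypothetical hosts:
records = [
    {"id": "b1", "856u": ["http://old.example.com/journal/1"]},
    {"id": "b2", "856u": ["http://other.example.org/x"]},
    {"id": "b3"},
]
count = amend_urls(records, "http://old.example.com/", "https://new.example.com/")
```

Returning the count lets staff compare it with the number of records they expected to change before the batch is committed.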

1.5 HKUL DL initiatives (cont.)

• 1.5.8 Collaboration
  – Union catalogue with Jinan University

• 1.5.9 Authentication: proxy, EZproxy, IP control

• 1.5.10 Others…: access to legacy CD-ROM databases
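IP control and proxy access fit together: users on a campus address reach a licensed resource directly, while off-campus users are routed through the proxy. The sketch below uses EZproxy's starting-point URL form (`/login?url=` followed by the target URL). The proxy host and the campus address ranges are placeholders (documentation addresses), not HKU's.

```python
import ipaddress

# Placeholder campus ranges; the real ranges are not given in the slides.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
PROXY_LOGIN = "https://ezproxy.example.edu/login?url="  # hypothetical proxy host


def on_campus(client_ip):
    """IP control: is the client address inside one of the campus networks?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def resource_link(client_ip, target_url):
    """On-campus users go direct; everyone else goes through the proxy, which authenticates them."""
    if on_campus(client_ip):
        return target_url
    return PROXY_LOGIN + target_url


# Usage with placeholder addresses and a placeholder resource:
direct = resource_link("192.0.2.15", "https://example.com/db")
proxied = resource_link("203.0.113.7", "https://example.com/db")
```

Keeping the address check in one place means that when campus ranges change, only `CAMPUS_NETWORKS` needs updating, not every link on the web pages.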

2. Going to do…

1. Storage Area Network (SAN)
2. Abundance of servers
3. One-stop search
4. Alert service
5. Wireless applications

2.1 SAN

Problem a: Storage
  – Large data size of our hosted databases
  – High monthly data growth rate
  – Databases are hosted on different hosts/OSes

2.1 SAN (cont.)

Problem b: Backup
  – A backup drive for every machine
  – A backup software licence for every machine
  – A lot of backup tapes to handle

2.1 SAN (cont.)

Solution: SAN
  – Put all data storage into a single large, expandable storage device.
  – The storage device is connected to the hosts by high-speed Fibre Channel links; a Fibre Channel loop connects each host to ensure high availability.
  – Backup can be done on a single device.

2.2 Abundance of servers

Problem:
  – Hard to monitor the status and activities of each server
  – Time wasted tuning the performance of each server

2.2 Abundance of servers (cont.)

Solution: server consolidation
  – Buy several powerful servers instead of many cheap mid-range servers
  – Keep the number of servers to a minimum
  – Save space and UPS power ratings, i.e. cost saving
  – Save the manpower needed to administer servers and maintain their performance, i.e. cost saving

2.3 One-stop search

Problem: before searching, one needs to know which database suits one’s needs.

Aim: search multiple databases simultaneously
  – e.g. OAI (http://www.openarchives.org/)
  – e.g. CDL SearchLight (http://www.cdlib.org/cgi-bin/searchlight)
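The Open Archives Initiative's protocol, OAI-PMH, supports one-stop search indirectly: a library harvests metadata from many repositories into one local index, which users then search once. The sketch below builds `ListRecords` requests (including the resumption-token form for paging) and extracts identifiers and Dublin Core titles from a response. It covers only the OAI route, not CDL SearchLight; the repository URL and the sample response are made up, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    # A follow-up (paging) request carries only the verb and the resumption token.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None  # an empty token means the list is complete


# Made-up response from a hypothetical repository:
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A harvester would keep requesting `list_records_url(base, resumption_token=token)` until no token comes back, loading each batch of records into the local index.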

2.4 Alert service

To alert users to new information: SDI (Selective Dissemination of Information)
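SDI matches each newly added item against stored user interest profiles and sends each user only the matches. A minimal sketch, assuming profiles are lists of subject headings and new records carry their subject headings; the users, titles and headings are made up, and delivery (e.g. by email) is left out.

```python
def sdi_alerts(profiles, new_records):
    """For each user, the titles of new records sharing at least one subject with their profile."""
    alerts = {}
    for user, terms in profiles.items():
        wanted = {t.lower() for t in terms}  # case-insensitive matching
        hits = [r["title"] for r in new_records
                if wanted & {s.lower() for s in r["subjects"]}]
        if hits:  # users with no matches get no alert
            alerts[user] = hits
    return alerts


# Usage with made-up profiles and records:
profiles = {
    "alice": ["Digital libraries"],
    "bob": ["dentistry"],
    "carol": ["Astronomy"],
}
new_records = [
    {"title": "Building digital libraries", "subjects": ["Digital libraries", "Metadata"]},
    {"title": "Oral health today", "subjects": ["Dentistry"]},
]
alerts = sdi_alerts(profiles, new_records)
```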

2.5 Wireless applications

A study of mobile phone and PDA applications in the Library

3. Challenges

• Changes

• New technologies

• Competitors

• What are the (future) standards?

• Contents

• Digital vs printed

• Information overload

• Lifelong education

3.1 The causes of change

• Development of I.T.
  – Networks, telecommunications, digitization, storage formats, access models, …

• Economy
  – Online, e-commerce, smart cards, …

• Learning environment
  – Lifelong learning

• Mode of communication
  – Email, ICQ

3.2 New technologies

• Changing … so fast

• Acronyms
  – Help: http://www.webopedia.com

• Who knows what the future will be?
  – Reluctance to change

• Don’t be afraid to dig in
  – See: Editor’s notes, Computers in Libraries, vol. 22, no. 8, p. 6

3.3 Competitors

• Who?
  – See: OCLC white paper on the Information Habits of College Students (http://www2.oclc.org/oclc/pdf/printondemand/informationhabits.pdf)
  – 79% use a search engine for all or most searches!!

[Figure: Technology Adoption Life Cycle: Innovators, Early Adopters, Early Majority, Late Majority, Laggards. Source: Crossing the Chasm, Geoffrey Moore]

Crystal ball??
• Number of visits
• Usage of physical materials
• Training for users & real-time support
• Demand for subject knowledge
• Competitors
• Fast services & high productivity
• Information provider and producer
• Cost-effectiveness
• Library workflow moves to an e-business model
• Partnership
• Providing services that generate income

4. Overcoming the challenges

• What business are we in?

• What are our major strengths & weaknesses?

• Who are our competitors?

• Who are our customers? What are their needs?

• What factors are affecting the Library?

• Do we have the skills?

4. Overcoming the challenges: how?

• Training, to keep abreast of new technologies
• Human resources: partners
• Value-added services
• User-oriented mindset
• Automation
• Improve the social image of librarians
• Co-operation
• Talk with people in different areas to understand the technology
• Research

4. Overcoming the challenges (cont.)

• Skills?
  – Librarianship & IT knowledge
  – Teamwork, commitment
  – Thinking methodology: creativity, use of knowledge
  – Outlook on the world
  – Interpersonal skills
  – Health!!

Principles for building a DL
• Expect change
• Know your content
• Involve the right people
• Design usable systems
• Ensure open access
• Beware of data rights
• Automate whenever possible
• Adopt and adhere to standards
• Ensure quality
• Be concerned about persistence

McCray, A. & Gallagher, M. (2001). Principles for Digital Library Development. Communications of the ACM, 44(5), pp. 49-54.

THE END

THANK YOU!