Digital Libraries, Nick Narcise, April 4th 2006



What is a Digital Library?


Definition from Wikipedia

A digital library is a library in which a significant proportion of the resources are available in machine-readable format (as opposed to print or microform), accessible by means of computers.

The digital content may be locally held or accessed remotely via computer networks.

D-Lib Magazine

What Do You Do with a Million Books?

Gregory Crane

Tufts University

D-Lib Magazine

March 2006

Volume 12 Number 3

ISSN 1082-9873

http://www.dlib.org/dlib/march06/crane/03crane.html

Main Focus

The ability to extract from the stored record of humanity useful information in an actionable format for any given human being of any culture at any time and in any place

Reduce the tangle of text mining, analysis, and searching technologies

Converting analog sources to text

Translating one language to another

Transforming raw text into data

How is a Library digitized?

The process of digitizing a library began with the catalog, moved to periodical indexes and abstracting services, then to periodicals and large reference works, and finally to book publishing.

Some of the largest and most successful digital libraries are Project Gutenberg, ibiblio, and the Internet Archive.

Optical Character Recognition

From Wikipedia, the free encyclopedia

Optical character recognition, usually abbreviated to OCR, involves computer software designed to translate images of typewritten text (usually captured by a scanner) into machine-editable text, or to translate pictures of characters into a standard encoding scheme that represents them, such as ASCII or Unicode.

Problems with OCR

May have errors

Useless as a knowledge base

Human beings are still much better at reading and interpreting the contents of page images than machines.
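One common way to put a number on "may have errors" is to compare OCR output against a hand-checked transcription and count the character edits needed to reconcile them. The sketch below is illustrative only: the function names are made up for this example, and the misread word is a hypothetical OCR result (the letter m read as "rn"), not output from any real OCR engine.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def character_error_rate(truth: str, ocr_output: str) -> float:
    """Edits needed per character of the hand-checked transcription."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr_output) / len(truth)


# Hypothetical example: the scanner output reads "m" as "rn".
truth = "summo"
ocr_output = "surnmo"
rate = character_error_rate(truth, ocr_output)
```

A rate near zero means the OCR text is close to the transcription; it says nothing about whether a machine can interpret the text, which is the second problem listed above.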

Text, Information, Knowledge and the Evolving Record of Humanity

Gregory Crane and Alison Jones

Tufts University

D-Lib Magazine

March 2006

Volume 12 Number 3

ISSN 1082-9873

http://www.dlib.org/dlib/march06/jones/03jones.html

C. Montgomery Burns: "I'd like to send this letter to the Prussian consulate in Siam by aeromail. Am I too late for the 4:30 autogyro?"

Clerk: "Uhhh, I better look in the manual ..."

Burns: "The ignorance! ..."

Clerk: "This book must be out of date – I don't see 'Prussia,' 'Siam' or 'autogyro.'"

From "Mother Simpson," The Simpsons Television Show, Episode 3F06

Digital Reference Materials

Thesaurus of Geographic Names (TGN)

Includes names and other information about places such as cities, counties, and nations, and their associated physical features such as mountains, coasts, and rivers. Other information related to history, population, culture, art, and architecture is included.

TGN can associate the obsolete name Siam with the nation of Thailand (tgn,1000142) – but also with towns named Siam in Iowa (tgn,2035651), Tennessee (tgn,2101519), and Ohio (tgn,2662003). Prussia appears but as a general region (tgn,7016786), with no indication when or if it was a sovereign nation.
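The ambiguity described above can be sketched as a lookup that returns every candidate place for a name rather than a single answer. The identifiers are the ones quoted on this slide; the dictionary layout and the function name are assumptions made for illustration and are not the TGN's actual data format or API.

```python
# Identifiers copied from the slide; the structure itself is hypothetical.
GAZETTEER = {
    "Siam": [
        ("tgn,1000142", "Thailand (nation, obsolete name)"),
        ("tgn,2035651", "Siam, Iowa"),
        ("tgn,2101519", "Siam, Tennessee"),
        ("tgn,2662003", "Siam, Ohio"),
    ],
    "Prussia": [
        ("tgn,7016786", "Prussia (general region)"),
    ],
}


def candidates(name):
    """Return every (identifier, description) pair recorded for a place name."""
    return GAZETTEER.get(name, [])


for identifier, description in candidates("Siam"):
    print(identifier, description)
```

Four candidates for "Siam" and no dates for "Prussia" show why a name list alone cannot tell the clerk which place Mr. Burns means or whether it still exists.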

Alexandria Digital Library (ADL) represents a sophisticated framework with which to create such resources: places can be associated with temporal information about their foundation (e.g., Washington, DC, founded on 16 July 1790).
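A minimal sketch of what attaching temporal information to a place might look like follows. The `Place` class, its `dissolved` field, and the `existed_on` method are assumptions for illustration; they are not the ADL's actual schema. Only the Washington, DC founding date comes from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Place:
    """A place name with optional start and end dates (hypothetical schema)."""
    name: str
    founded: Optional[date] = None
    dissolved: Optional[date] = None  # assumed field, not mentioned on the slide

    def existed_on(self, day: date) -> bool:
        """True if the given day falls inside the place's known time span."""
        if self.founded is not None and day < self.founded:
            return False
        if self.dissolved is not None and day > self.dissolved:
            return False
        return True


washington = Place("Washington, DC", founded=date(1790, 7, 16))
print(washington.existed_on(date(1789, 1, 1)))  # before the founding date
print(washington.existed_on(date(1800, 1, 1)))  # after the founding date
```

With dates attached, a reference work can answer the question a plain name list cannot: whether a place existed at the time a document was written.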

Consider the sentence:

"The current price of tea in China is 35 cents per pound."

The idea is that a digital library that could plot the prices of various commodities in different markets over time, plot the lifetimes of individuals, or extract and classify many events would be very useful.
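To make "transform raw text into data" concrete, here is a rough sketch that pulls a structured record out of the example sentence with a regular expression. The pattern, field names, and function name are assumptions chosen for this one sentence shape; real information extraction would need far more robust methods.

```python
import re
from typing import Optional

# Hypothetical pattern covering only sentences shaped like the slide's example.
PRICE_PATTERN = re.compile(
    r"price of (?P<commodity>\w+) in (?P<place>[A-Z]\w*) "
    r"is (?P<amount>\d+(?:\.\d+)?) (?P<currency>\w+) per (?P<unit>\w+)"
)


def extract_price(sentence: str) -> Optional[dict]:
    """Return a record of commodity, place, amount, currency, and unit, or None."""
    match = PRICE_PATTERN.search(sentence)
    if match is None:
        return None
    record = match.groupdict()
    record["amount"] = float(record["amount"])  # store the price as a number
    return record


record = extract_price("The current price of tea in China is 35 cents per pound.")
print(record)
```

Note that the sentence says "current" without giving a date, so plotting this price over time would also require knowing when the source document was written.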

Digital Reference Materials

Carefully transcribed primary sources

<l n="22">Forte fuit iuxta tumulus, quo cornea summo</l>

Gazetteers and semi-structured text sources

<div2 type=entry><head>AARONSBURG</head><p>P v., Hains t., Centre co., Pa. It is at the eastern extremity of Penn's valley, near Penn's creek, 32 m. Bellefonte, 89 N.W. Harrisburg. 181 W. It contains a lutheran church, two stores, and 450 inhab

Citation-based authority lists

<div1 type="entry" id="abdera">
<head>Abdera</head>
<div2 type="subentry" id="abdera-1">
<head>Abdera, city of Thrace</head>
<div3 type="index">
<list type="index">
<item><bibl n="Paus. 6.5.4">Paus. 6.5.4</bibl>, <bibl n="Paus. 6.14.12">Paus. 6.14.12</bibl></item>
<item>a town of Thrace on the Nestus: <bibl n="Hdt. 1.168">Hdt. 1.168</bibl>, <bibl n="Hdt. 6.46">Hdt. 6.46</bibl>, <bibl n="Hdt. 7.109">Hdt. 7.109</bibl>, <bibl n="Hdt. 7.120">Hdt. 7.120</bibl>, <bibl n="Hdt. 7.126">Hdt. 7.126</bibl></item>
<item>founded at grave of Abderus: <bibl n="Apollod. 2.5.7">Apollod. 2.5.7</bibl></item>
<item>Xerxes' first halt in his flight: <bibl n="Hdt. 8.120">Hdt. 8.120</bibl></item>
</list>
</div3>
</div2>
</div1>
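Because an authority list like the Abdera entry is structured markup, a program can read it directly. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of that entry and groups the cited passages by the topic text that precedes them. The function name and the choice to key on the topic text are assumptions for illustration, not the way any particular digital library processes these files.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Abdera entry from the slide.
ENTRY = """<div1 type="entry" id="abdera"><head>Abdera</head>
<div2 type="subentry" id="abdera-1"><head>Abdera, city of Thrace</head>
<div3 type="index"><list type="index">
<item>a town of Thrace on the Nestus: <bibl n="Hdt. 1.168">Hdt. 1.168</bibl>, <bibl n="Hdt. 6.46">Hdt. 6.46</bibl></item>
<item>founded at grave of Abderus: <bibl n="Apollod. 2.5.7">Apollod. 2.5.7</bibl></item>
<item>Xerxes' first halt in his flight: <bibl n="Hdt. 8.120">Hdt. 8.120</bibl></item>
</list></div3></div2></div1>"""


def citations_by_topic(xml_text):
    """Map each index item's topic text to the list of passages it cites."""
    root = ET.fromstring(xml_text)
    result = {}
    for item in root.iter("item"):
        # item.text is the text before the first <bibl>; it may be missing.
        topic = (item.text or "").strip().rstrip(":")
        result[topic] = [bibl.get("n") for bibl in item.findall("bibl")]
    return result


index = citations_by_topic(ENTRY)
for topic, passages in index.items():
    print(topic, "->", passages)
```

Each `n` attribute is a canonical citation, so the same mapping could be used to link every mention of Abdera back to the passages that discuss it.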

Digital Reference Materials

Machine-readable dictionaries

<entryFree id="n3709" key="a)krwth/rion" type="main"><orth extent="full" lang="greek">a)krwth/rion</orth>, <gen lang="greek">to/</gen>, (<etym lang="greek">a)/kros</etym>)<sense id="n3709.0" n="A" level="1"><tr>topmost</tr> or <tr>prominent part</tr>, <foreign lang="greek">a). tou= ou)/reos</foreign> mountain <tr>peak</tr>, <bibl n="Perseus:abo:tlg,0016,001:7:217"><author>Hdt.</author><biblScope>7.217</biblScope></bibl>

General Encyclopedias

A Research Library Based on the Historical Collections of the Internet Archive

William Y. Arms, Selcuk Aya, Pavel Dmitriev

Computer Science Department, Cornell University

Blazej Kot

Information Science, Cornell University

Ruth Mitchell, Lucia Walle

Cornell Theory Center, Cornell University

D-Lib Magazine

February 2006

Volume 12 Number 2

ISSN 1082-9873

http://www.dlib.org/dlib/february06/arms/02arms.html

Main Idea of Article

Academic researchers have to comb through collections of libraries, museums, and archives to analyze and synthesize the information buried within them.

A Web Library for Social Science Research

The idea is to replace much of the tedious manual effort with computer programs that act as the researchers' agents.

The challenge was to organize the materials and provide powerful, intuitive tools that make a huge collection of semi-structured data accessible to researchers, without demanding high levels of computing expertise.

Questions?

Thank You