CLiMB: Computational Linguistics for Metadata Building Center for Research on Information Access Columbia University Libraries


CLiMB: Computational Linguistics for Metadata Building

Center for Research on Information Access

Columbia University Libraries


CLiMB - Columbia University

CLiMB: Interdisciplinary Research Project at Columbia University

Funded by Mellon Foundation 2002-2004

• Center for Research on Information Access (CRIA)

• Libraries

• Computer Science Department


Problems in Image Access

Cataloging digital images

Traditional approach:

• manual expertise

• labor intensive

• expensive

Can automated techniques help?


Can we harvest image descriptors?


CLiMB Technical Contribution

CLiMB will identify and extract

• proper nouns

• terms and phrases

from text related to an image:

September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes. —Edward R. Bosley. Greene & Greene. London : Phaidon, 2000. p. 127
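As a rough illustration of what "identify and extract proper nouns" can mean in practice, the sketch below pulls runs of capitalized words out of a sentence. It is a simple heuristic under stated assumptions, not the CLiMB toolset: the function name, the skip list, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical skip list: capitalized words that usually just start a sentence.
SKIP_INITIAL = {"The", "It", "A", "An"}

# One or more capitalized words, optionally joined by "&" (e.g. "Greene & Greene").
CAPITALIZED_RUN = re.compile(r"[A-Z][a-z]+(?:\s+(?:&\s+)?[A-Z][a-z]+)*")


def proper_noun_candidates(text):
    """Return capitalized word runs as rough proper-noun candidates, in order."""
    found = []
    for match in CAPITALIZED_RUN.finditer(text):
        phrase = match.group(0)
        if phrase in SKIP_INITIAL:
            continue  # likely a sentence-initial function word, not a name
        if phrase not in found:
            found.append(phrase)
    return found


# Made-up sample sentence, not taken from the source text.
sample = "The exterior of the house in Pasadena was designed by Greene & Greene."
print(proper_noun_candidates(sample))
```

A heuristic like this misses initials such as "R." and cannot find lowercase terms such as "split-redwood shakes"; extracting those would need part-of-speech tagging or phrase chunking.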


Overall Goals

• Research: Development of richer retrieval through increased numbers of descriptors

• Practice: Development of suite of CLiMB tools

• Resources: Vocabulary list which can be used by other visual resource professionals

The essence of CLiMB:

• Use scholars themselves as “catalogers” by utilizing scholarly publications

• Enhance existing descriptive metadata


CLiMB Project Teams

Coordinating

Collections (Curatorial)

Technical

External Advisory


CLiMB Committees

Coordinating

Judith Klavans

Stephen Davis

Angela Giral

Patricia Renfro

Bob Wolven

Curatorial

Judith Klavans

Stephen Davis

Angela Giral

Amy Heinrich

David Magier

Bob Scott

Bob Wolven

Roberta Blitz

Technical

Stephen Davis

Judith Klavans

Vera Horvath

David Elson

Roberta Blitz


Squeezing Metadata out of Scholarly Texts

• Image collection

• Associated text

• Target object identification (TOI)

• CLiMB suite of tools

• Evaluation


[Diagram: CLiMB workflow, with columns for Phase, Inputs, CLiMB Processes, and User Evaluation]

Phase I: process texts

Phase II: select metadata from texts

Phase III: use CLiMB metadata in image search platform

Inputs: Image Collections; Source TEXT; TOIs; AAT / BBIs / etc.; Other Texts

CLiMB Processes: Generate TEI Markup; Run CLiMB Suite of Tools; Result: Enriched XML; Select words & phrases to include in Core Descriptive Records

Records and platforms shown: Test Records; Core Descriptive Records; CLiMB Enriched Descriptive Records; Image Search Platform; Image Search Platform with CLiMB Metadata

User Evaluation: Art Librarians; Subject Specialists; Catalogers; Search & Retrieval Experts; end-users
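The workflow above ends with selected words and phrases being added to a core descriptive record and written out as enriched XML. A minimal sketch of that last step follows, using Python's standard `xml.etree.ElementTree`. The element names (`record`, `climbTerms`, `term`), the function name, and the sample terms are assumptions made for illustration; the slides do not specify the actual CLiMB schema.

```python
import xml.etree.ElementTree as ET


def enrich_record(core_fields, selected_terms):
    """Serialize a core record plus selected terms as one XML string.

    core_fields: dict mapping (hypothetical) element names to text values.
    selected_terms: words and phrases chosen from the associated text.
    """
    record = ET.Element("record")
    for name, value in core_fields.items():
        ET.SubElement(record, name).text = value

    # Keep the added terms in their own container so they stay
    # distinguishable from the original cataloging.
    container = ET.SubElement(record, "climbTerms")
    for term in selected_terms:
        ET.SubElement(container, "term").text = term

    return ET.tostring(record, encoding="unicode")


core = {"title": "Chuang gong chuang mu", "published": "[193-]"}
print(enrich_record(core, ["wood-engraving", "Folk art"]))
```

Keeping the extracted terms in a separate container is one possible design choice: it lets a search platform weight or hide machine-suggested descriptors independently of the cataloger's fields.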



CLiMB Collections

• Greene & Greene Architectural Drawings, Avery Architectural and Fine Arts Library

• Chinese Paper Gods, C.V. Starr East Asian Library

• Photographs from the Archives, American Institute of Indian Studies


Greene & Greene Architectural Records and Papers Collection

Drawings and Archives

Avery Architectural and Fine Arts Library

Columbia University Libraries


Charles Sumner Greene (1868-1957)

Henry Mather Greene (1870-1954)


NYDA.1960.001.00023

All Saints Episcopal Church (Pasadena, Calif.). Alterations

1902-1903


Greene & Greene Catalog Record

Author: Greene & Greene.

Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.] Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.

Published: [1917]

Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)

Location: Columbia University, Avery Architectural Drawings

Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.

Subjects: Houses; Alterations; Architecture--Designs and plans--United States.; Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)

Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.

Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] -- floor plan, part plan of basement.


Greene & Greene Bibliography

• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.

• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974]

• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.

• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.

• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.

• Strand, Janann. A Greene & Greene guide. [Pasadena, Calif. : G. Dahlstrom, 1974]


C.V. Starr East Asian Library, Columbia University

Chinese Paper Gods

Anne S. Goodrich Collection


Pan-hu chih-shen

God of tigers


Chinese Paper Gods Catalog Record

Title: Chuang gong chuang mu [graphic].

Published: [193-]

Physical Details: 1 print : wood-engraving, color ; 34 x 30 cm.

In: Anne S. Goodrich Collection.

Location: Columbia University, C.V. Starr East Asian Library (CJK) EAX GAC 1 no. 16

Subjects: Gods, Chinese, in art. Folk art--China.

Genre Or Form: Woodcuts--Chinese.

Notes: Date according to time period Anne S. Goodrich collected prints in Beijing.

Record ID: NYCP02-F20


Chinese Paper Gods Bibliography

• Day, Clarence Burton. Chinese peasant cults : being a study of Chinese paper gods. Taipei : Ch'eng Wen Pub. Co., 1974.

• Goodrich, Anne Swann. Peking paper gods : a look at home worship. Nettetal : Steyler Verlag, 1991.

• Laing, Ellen Johnston. Art and aesthetics in Chinese popular prints: selections from the Muban Foundation collection. Ann Arbor, MI : Center for Chinese Studies, University of Michigan, c2002


Chinese gods: selection from LC Authority File

HEADING: Nezha (Chinese deity)

Used For/See From:

Daluoxian (Chinese deity)
Jinhuan Yuanshuai (Chinese deity)
Jinkang Yuanshuai (Chinese deity)
Li Nezha (Chinese deity)
Luoche Taizi (Chinese deity)
Ne Zha (Chinese deity)
Nezhataizi (Chinese deity)
No-cha (Chinese deity)
Nuozha (Chinese deity)
Tailuoxian (Chinese deity)
Taizi Yuanshuai (Chinese deity)
Taiziyeh (Chinese deity)
Yühuang Taizi (Chinese deity)
Zhongtan Yuanshuai (Chinese deity)

Search Also Under: Gods, Chinese
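An authority record like the one above can drive variant lookup: every "Used For" form points back to the single preferred heading. The sketch below shows one way this might be indexed. The function names are hypothetical, and only a few of the variants from the record are included for brevity.

```python
# A few variant forms copied from the authority record above,
# with the "(Chinese deity)" qualifier dropped from the variants.
AUTHORITY = {
    "Nezha (Chinese deity)": ["Li Nezha", "Ne Zha", "No-cha", "Nuozha", "Nezhataizi"],
}


def build_variant_index(authority):
    """Map each preferred name and variant (case-folded) to its heading."""
    index = {}
    for heading, variants in authority.items():
        preferred = heading.split(" (")[0]  # "Nezha (Chinese deity)" -> "Nezha"
        for name in [preferred, *variants]:
            index[name.casefold()] = heading
    return index


def resolve(name, index):
    """Return the authorized heading for a name, or None if it is unknown."""
    return index.get(name.casefold())


index = build_variant_index(AUTHORITY)
print(resolve("No-cha", index))
print(resolve("Guanyin", index))
```

An exact-match index like this only finds variants already listed in the authority file; transliterations that are not listed would still need fuzzy matching or manual review.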


Three Testbed Collections

• Greene & Greene

– detailed records
– more difficult to associate text with image

• Chinese Paper Gods

– strong associations
– problems with transliteration and variants

• South Asian Temples

– large set of digital images
– diacritics and variants
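Diacritics are a common reason that a name in a text fails to match the same name in a record. One widely used workaround, sketched below with Python's standard `unicodedata` module, is to compare names after stripping combining marks. The function name is hypothetical, and this is an illustration of the general technique, not a description of how CLiMB handles the problem.

```python
import unicodedata


def fold_diacritics(text):
    """Return text with combining marks removed and case folded, for matching."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# "Yühuang Taizi" appears among the variant forms in the authority file slide.
print(fold_diacritics("Yühuang Taizi") == fold_diacritics("Yuhuang Taizi"))
```

Folding should be used for comparison only, with the original spelling kept for display, since stripping marks can merge names that are distinct in the source language.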


CLiMB Collections: Future

• Additional collection of digital images

• Close association between image and text

• Regularized metadata

Suggestions:

• Catalogue raisonné

• Museum collection catalog

• Exhibition catalog



Target Object Identification (TOI)

• Define based on institutional needs

• Varies from collection to collection

– Greene & Greene: Project
– Chinese Paper Gods: Deity
– South Asian Temples: Location & Temple

• Compile authority list


Project Name Matching

• Locate project names in Greene & Greene

• Challenge: finding variant name forms

– Robert R. Blacker house (TOI)
– Blacker estate
– The house

• Possible techniques to improve matching

– Developing a semi-automatic technique
– Use existing information to label text
– An iterative platform for manual intervention
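The variant forms listed above suggest a semi-automatic split: mentions that share a distinctive word with the TOI name can be matched automatically, while generic references such as "The house" are queued for a person to resolve from context. The sketch below illustrates that idea only; the function names, the generic-noun list, and the three result labels are assumptions, not the technique the project adopted.

```python
import re

# Hypothetical list of generic building words that cannot identify a project alone.
GENERIC_NOUNS = {"house", "estate", "residence"}


def words(text):
    """Return the set of case-folded alphabetic words in text."""
    return {w.casefold() for w in re.findall(r"[A-Za-z]+", text)}


def distinctive_tokens(toi_name):
    """Words of a TOI name that are neither single initials nor generic nouns."""
    return {w for w in words(toi_name) if len(w) > 1 and w not in GENERIC_NOUNS}


def classify_mention(mention, toi_name):
    """Return 'match', 'review', or 'no match' for a candidate mention."""
    mention_words = words(mention)
    if mention_words & distinctive_tokens(toi_name):
        return "match"   # shares a distinctive word such as "blacker"
    if mention_words & GENERIC_NOUNS:
        return "review"  # generic reference; needs manual intervention
    return "no match"


toi = "Robert R. Blacker house"
for mention in ["Blacker estate", "The house", "the mountains"]:
    print(mention, "->", classify_mention(mention, toi))
```

Treating every non-generic word as distinctive is deliberately crude: a first name like "Robert" would match unrelated people, which is one reason an iterative review step remains necessary.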


Next Steps – CLiMB Evaluation

Current Developments

• Meeting with experts – October 17th

• Survey with experienced image searchers

Long Term Goal

• Test CLiMB tools and data in an image search platform


CLiMB: Computational Linguistics for Metadata Building

• Image collection

• Associated text

• Target object identification (TOI)

• CLiMB suite of tools

• Evaluation


Thank you!

Any questions?

www.columbia.edu/cu/cria/climb