IS 257 - Fall 2002 2002.11.07- SLIDE 1 Database Applications -- The UC Berkeley Environmental Digital Library University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management


IS 257 - Fall 2002 2002.11.07- SLIDE 1

Database Applications -- The UC Berkeley Environmental Digital Library

University of California, Berkeley

School of Information Management and Systems

SIMS 257: Database Management


Lecture Outline

• Review
  – Database Administration

• Database Applications
  – Berkeley’s Environmental Digital Library


Final Project Requirements

• See WWW site:
  – http://sims.berkeley.edu/courses/is257/f02/index.html

• Report on personal/group database including:
  – Database description and purpose
  – Data Dictionary
  – Relationships Diagram
  – Sample queries and results (Web or Access tools)
  – Sample forms (Web or Access tools)
  – Sample reports (Web or Access tools)
  – Application Screens (Web or Access tools)


Terms and Concepts (trad)

• Data Administration
  – Responsibility for the overall management of data resources within an organization

• Database Administration
  – Responsibility for physical database design and technical issues in database management

• These roles are often combined or overlapping in some organizations


Database System Life Cycle

[Figure: life cycle diagram linking six stages: Database Planning, Database Analysis, Database Design, Database Implementation, Operation & Maintenance, Growth & Change]

Note: this is a different version of this life cycle than discussed previously


Database Planning: DA & DBA functions

• Develop corporate database strategy (DA)

• Develop enterprise model (DA)

• Develop cost/benefit models (DA)

• Design database environment (DA)

• Develop data administration plan (DA)


Database Analysis: DA & DBA functions

• Define and model data requirements (DA)

• Define and model business rules (DA)

• Define operational requirements (DA)

• Maintain corporate Data Dictionary (DA)


Database Design: DA & DBA functions

• Perform logical database design (DA)

• Design external models (subschemas) (DBA)

• Design internal model (Physical design) (DBA)

• Design integrity controls (DBA)


Database Implementation: DA & DBA functions

• Specify database access policies (DA & DBA)
• Establish Security controls (DBA)
• Supervise Database loading (DBA)
• Specify test procedures (DBA)
• Develop application programming standards (DBA)
• Establish procedures for backup and recovery (DBA)
• Conduct User training (DA & DBA)


Operation and Maintenance: DA & DBA functions

• Monitor database performance (DBA)

• Tune and reorganize databases (DBA)

• Enforce standards and procedures (DBA)

• Support users (DA & DBA)


Growth & Change: DA & DBA functions

• Implement change control procedures (DA & DBA)

• Plan for growth and change (DA & DBA)

• Evaluate new technology (DA & DBA)


Functions in Database Administration

• Planning and Design (we have already looked at these processes in detail)

• Data Integrity

• Backup and Recovery

• Security Management


Data Integrity

• Intrarecord integrity (enforcing constraints on contents of fields, etc.)

• Referential Integrity (enforcing the validity of references between records in the database)

• Concurrency control (ensuring the validity of database updates in a shared multiuser environment)
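The first two kinds of integrity can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical two-table schema (agency, report) that is not part of the lecture material: a CHECK constraint stands in for intrarecord integrity and a foreign key for referential integrity. Concurrency control is not shown here.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE agency (
        agency_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE report (
        report_id INTEGER PRIMARY KEY,
        agency_id INTEGER NOT NULL REFERENCES agency(agency_id),  -- referential integrity
        pages     INTEGER NOT NULL CHECK (pages > 0)              -- intrarecord integrity
    );
    INSERT INTO agency VALUES (1, 'DWR');
    INSERT INTO report VALUES (10, 1, 250);
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO report VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok_row = try_insert((11, 1, 40))     # valid row
bad_field = try_insert((12, 1, 0))   # pages = 0 violates the CHECK constraint
bad_ref = try_insert((13, 99, 12))   # agency 99 does not exist
report_count = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```

Declaring the rules in the schema means every application that touches the tables is held to the same constraints, rather than each program re-implementing the checks.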


Database Security

• Views or restricted subschemas
• Authorization rules to identify users and the actions they can perform
• User-defined procedures (and rule systems) to define additional constraints or limitations in using the database
• Encryption to encode sensitive data
• Authentication schemes to positively identify a person attempting to gain access to the database
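A view used as a restricted subschema might look like the following sketch. It assumes Python's built-in sqlite3 module and a hypothetical employee table with invented sample rows. SQLite has no GRANT or REVOKE, so the authorization step (granting users access to the view but not to the base table) is only described in a comment, not executed.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ada', 'Library', 5200);
    INSERT INTO employee VALUES (2, 'Ben', 'Systems', 6100);

    -- The view exposes a subset of columns and hides salary.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

# In a multiuser DBMS, ordinary users would be authorized to read the view
# but not the underlying employee table.
cur = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
```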


Database Backup and Recovery

• Backup

• Journaling (audit trail)

• Checkpoint facility

• Recovery manager
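The backup item alone can be sketched as follows, assuming Python's built-in sqlite3 module (Connection.backup, available since Python 3.7) and a hypothetical one-row table. Journaling, checkpoints, and a recovery manager are separate facilities and are not shown.

```python
import sqlite3

# Hypothetical table and row, used only for illustration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE dam (dam_id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO dam VALUES (1, 'Example Dam')")
source.commit()

# Backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss in the live database after the backup was taken.
source.execute("DELETE FROM dam")
source.commit()
lost_count = source.execute("SELECT COUNT(*) FROM dam").fetchone()[0]

# Recovery: restore the backup copy into a fresh database.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
restored = recovered.execute("SELECT name FROM dam").fetchall()
```

A backup by itself restores only the state at the time it was taken; changes made afterwards are what the journal (audit trail) is for.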


Disaster Recovery Planning

[Figure: disaster recovery planning diagram with components labeled Risk Analysis, Recovery Strategies, Budget & Implement, Procedures Development, Testing and Training, Plan Maintenance]

From Toigo “Disaster Recovery Planning”


Threats to Assets and Functions

• Water

• Fire

• Power Failure

• Mechanical breakdown or software failure

• Accidental or deliberate destruction of hardware or software
  – By hackers, disgruntled employees, industrial saboteurs, terrorists, or others


Threats

• Between 1967 and 1978 fire and water damage accounted for 62% of all data processing disasters in the U.S.

• The water damage was sometimes caused by fighting fires

• More recently, improvements in fire suppression (e.g., Halon) for DP centers have meant that water is the primary danger to DP centers


Kinds of Records

• Class I: VITAL
  – Essential, irreplaceable or necessary to recovery

• Class II: IMPORTANT
  – Essential or important, but reproducible with difficulty or at extra expense

• Class III: USEFUL
  – Records whose loss would be inconvenient, but which are replaceable

• Class IV: NONESSENTIAL
  – Records which upon examination are found to be no longer necessary


Offsite Storage of Data

• Early offsite storage facilities were often intended to survive atomic explosions

• PRISM International directory

• Mirror sites (Hot sites)
  – E.g. Cantor-Fitzgerald


Today

• Object Relational Database Applications
  – The Berkeley Digital Library Project
    • Slides from RRL and Robert Wilensky, EECS
  – Use of DBMS in DL project


Final Presentations and Reports

• Specifications for final report are on the Web Site under assignments

• Presentations (1 on Nov. 28, Others on Nov 30, Dec 5th and 7th (Full))


Today

• Object Relational Applications

• The UCB Digital Library


Overview

• What is a Digital Library?

• Overview of Ongoing Research on Information Access in Digital Libraries


Digital Libraries Are Like Traditional Libraries...

• Involve large repositories of information (storage, preservation, and access)

• Provide information organization and retrieval facilities (categorization, indexing)

• Provide access for communities of users (communities may be as large as the general public or as small as the employees of a particular organization)


[Figure: Traditional Library System, showing Originators, Libraries, and Users]


But Digital Libraries Are Different From Libraries...

• Not a physical location with local copies; objects held closer to originators

• Decoupling of storage, organization, access

• Enhanced Authoring (origination, annotation, support for work groups)

• Subscription, pay-per-view supported in addition to “free” browsing.

• Integration into user tasks.


[Figure: A Digital Library Infrastructure Model, showing Originators, Repositories, Index Services, Users, and the Network]


UC Berkeley Digital Library Project

• Focus: Work-centered digital information services

• Testbed: Digital Library for the California Environment

• Research: Technical agenda supporting user-oriented access to large distributed collections of diverse data types.

• Part of the NSF/NASA/DARPA Digital Library Initiative (Phases 1 and 2)


UCB Digital Library Project: Research Organizations

• UC Berkeley EECS, SIMS, CED, IS&T
• UCOP/CDL
• Xerox PARC’s Document Image Decoding group and Work Practices group
• Hewlett-Packard
• NEC
• SUN Microsystems
• IBM Almaden
• Microsoft
• Ricoh California Research
• Philips Research


Testbed: An Environmental Digital Library

• Collection: Diverse material relevant to California’s key habitats.

• Users: A consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries.

• Potential: Impact on state-wide environmental system (CERES)


The Environmental Library - Users/Contributors

• California Resources Agency, California Environmental Resources Evaluation System (CERES)

• California Department of Water Resources

• The California Department of Fish & Game

• SANDAG

• UC Water Resources Center Archives

• New Partners: CDL and SDSC


The Environmental Library - Contents

• Environmental technical reports, bulletins, etc.
• County general plans
• Aerial and ground photography
• USGS topographic maps
• Land use and other special purpose maps
• Sensor data
• “Derived” information
• Collection data bases for the classification and distribution of the California biota (e.g., SMASCH)
• Supporting 3-D, economic, traffic, etc. models
• Videos collected by the California Resources Agency


The Environmental Library - Contents

• As of late 2002, the collection represents over one terabyte of data, including over 183,000 digital images, about 300,000 pages of environmental documents, and over 2 million records in geographical and botanical databases.


Botanical Data:

• The CalFlora Database contains taxonomical and distribution information for more than 8000 native California plants. The Occurrence Database includes over 600,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to the CalPhotos collection of California plants, and are also linked to external collections of data, maps, and photos.


Geographical Data:

• Much of the geographical data in the collection has been used to develop our web-based GIS Viewer. The Street Finder uses 500,000 TIGER records of S.F. Bay Area streets along with the 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1395 dams under state jurisdiction. An additional 11 GB of geographical data represents maps and imagery that have been processed for inclusion as layers in our GIS Viewer. This includes Digital Ortho Quads and DRG maps for the S.F. Bay Area.


Documents:

• Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.


Documents - cont.

• The collection also includes about 20Mb of full-text (HTML) documents from the World Conservation Digital Library. In addition to providing online access to important environmental documents, the document collection is the testbed for our Multivalent Document research.


Testbed Success Stories

• LUPIN: CERES’ Land Use Planning Information Network
  – California County General Plans and other environmental documents.
  – Enter at Resources Agency Server, documents stored at and retrieved from UCB DLIB server.

• California flood relief efforts
  – High demand for some data sets only available on our server (created by document recognition).

• CalFlora: Creation and interoperation of repositories pertaining to plant biology.

• Cloning of services at Cal State Library, FBI


Research Highlights

• Documents
  – Multivalent Document prototype
    • Page images, structured documents, GIS data, photographs

• Intelligent Access to Content
  – Document recognition
  – Vision-based Image Retrieval: stuff, thing, scene retrieval
  – Natural Language Processing: categorizing the web, Cheshire II, TileBar Interfaces


Multivalent Documents

• MVD Model
  – radically distributed, open, extensible
  – “behaviors” and “layers”
    • behaviors conform to a protocol suite
    • inter-operation via “IDEG”

• Applied to “enlivening legacy documents”
  – various nice behaviors, e.g., lenses


Document Presentation

• Problem: Digital libraries must deliver digital documents -- but in what form?

• Different forms have advantages for particular purposes
  – Retrieval
  – Reuse
  – Content Analysis
  – Storage and archiving

• Combining forms (Multivalent documents)


Spectrum of Digital Document Representations

Adapted from Fox, E.A., et al. “Users, User Interfaces and Objects: Envision, an Electronic Library”, JASIS 44(8), 1993


Document Representation: Multivalent Documents

• Primary user interface/document model for UCB Digital Library (Wilensky & Phelps)

• Goal: An approach to new document representations and their authoring.

• Supports active, distributed, composable transformations of multimedia documents.

• Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.


Multivalent Documents

[Figure: multivalent document diagram with labels Cheshire Layer, OCR Layer, OCR Mapping Layer, GIS Layer, Table Layer, Scanned Page Image, and Network Protocols & Resources; the sample page is titled “History of The Classical World”]

Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate).
(Webster’s 7th Collegiate Dictionary)


MVD Third Party Work

• Japanese support by NEC; application to office document management

• Printing, support for other OCR formats, by HP

• Chinese character and multilingual lens by UCB Instructional Support staff (Owen McGrath)

• Automatic enlivening of documents via Transcend proxy.


MVD Forthcoming

• Support for XML + style sheets
• More robust parsing
• Saving where you want
• Media adaptors for
  – Continuous media
  – Near image formats, word proc. formats
• Improve authoring tools
• Interoperation with paper
• Application versus applet?
• Release to community, get feedback, iterate.


GIS in the MVD Framework

• Layers are georeferenced data sets.
• Behaviors are
  – display semi-transparently
  – pan
  – zoom
  – issue query
  – display context
  – “spatial hyperlinks”
  – annotations
• Written in Java (to be merged with MVD-1 code line?)


GIS Viewer: Recent Developments

• Annotation and saving
  – points, rectangles (w. labels and links), vectors
  – saving of annotations as separate layer

• Integration with address, street finding, gazetteer services

• Application to image viewing: tilePix

• Castanet client


GIS Viewer Example

http://elib.cs.berkeley.edu/annotations/gis/buildings.html


Geographic Information: Plans and Ideas

• More annotations, flexible saving

• Support for large vector data sets

• Interoperability
  – On-the-fly
    • conversion of formats
    • generation of “catalogs”
  – Via OGDI/GLTP
  – Experimenting with various CERES servers


Documents: Information from scanned documents

• Built document recognizers for some important documents, e.g. “Bulletin 17”, “TR-9”.

• Recognized document structure, with an order of magnitude better OCR.

• Automatically generated a 1395-item relational database of dams.

• Enabled access via forms, map interfaces.

• Enabled interoperation with image DB.
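To make the forms and map access paths concrete, here is a minimal sketch assuming Python's built-in sqlite3 module. The dam table, its columns, and the three sample rows are hypothetical placeholders, not the project's actual schema or data. A form-style lookup filters on an attribute, while a map-style lookup filters on a latitude/longitude bounding box.

```python
import sqlite3

# Hypothetical schema and placeholder rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dam (
        dam_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        county TEXT NOT NULL,
        lat    REAL NOT NULL,
        lon    REAL NOT NULL
    );
    INSERT INTO dam VALUES (1, 'Dam A', 'County X', 38.0, -122.0);
    INSERT INTO dam VALUES (2, 'Dam B', 'County X', 39.5, -121.0);
    INSERT INTO dam VALUES (3, 'Dam C', 'County Y', 34.0, -118.0);
""")


def dams_in_county(county):
    """Form-style access: look up dams by an attribute value."""
    rows = conn.execute(
        "SELECT name FROM dam WHERE county = ? ORDER BY name", (county,)
    )
    return [r[0] for r in rows]


def dams_in_box(south, west, north, east):
    """Map-style access: look up dams inside a lat/lon bounding box."""
    rows = conn.execute(
        "SELECT name FROM dam "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? ORDER BY name",
        (south, north, west, east),
    )
    return [r[0] for r in rows]


by_form = dams_in_county("County X")
by_map = dams_in_box(37.0, -123.0, 40.0, -120.0)
```

Both access paths read the same relational table; only the query predicate changes.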


Document Recognition: Ongoing Work

• Document recognizers: for about a dozen document types

• Development and integration of mathematical OCR and recognition.

• Eventually produce document recognizer generator, i.e., make it easier to write recognizers.


Vision-Based Image Retrieval

• Stuff-based queries: “blobs”
  – Basic blobs: colors, sizes, variable number
    • demonstrated utility for interesting queries
  – “Blob world”: Above plus texture, applied to
    • retrieving similar images
    • successful learning scene classifier

• Thing-finding: Successfully deployed detectors adding body plans (adding shape, geometry and kinematic constraints)


Image Retrieval Research

• Finding “Stuff” vs “Things”

• BlobWorld

• Other Vision Research


(Old “stuff”-based image retrieval: Query)


(Old “stuff”-based image retrieval: Result)


Blobworld: use regions for retrieval

• We want to find general objects
• Represent images based on coherent regions


(“Thing”-based image retrieval using “body plans”: Result)


Natural Language Processing: Automatic Topic Assignment

• Developed automatic categorization/disambiguation method to the point where topic assignment (but not disambiguation) appears feasible.

• Ran controlled experiment:
  – Took Yahoo as ground truth.
  – Chose 9 overlapping categories; took 1000 web pages from Yahoo as input.
  – Result: 84% precision; 48% recall (using top 5 of 1073 categories)
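For readers unfamiliar with the two measures, the sketch below shows how precision and recall are computed when each page is assigned a set of topics and compared against ground-truth topics. The slides do not say how the figures were averaged; pooling the counts over all pages is an assumption made here, and the two sample pages are invented.

```python
def pooled_precision_recall(pages):
    """Compute precision and recall pooled over all pages.

    pages is a list of (assigned_topics, true_topics) pairs.
    Precision = correct assignments / all assignments made.
    Recall    = correct assignments / all true topics.
    """
    hits = sum(len(set(a) & set(t)) for a, t in pages)
    n_assigned = sum(len(set(a)) for a, t in pages)
    n_true = sum(len(set(t)) for a, t in pages)
    precision = hits / n_assigned if n_assigned else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


# Invented example: two pages with assigned topics vs. ground-truth topics.
sample_pages = [
    (["music", "film"], ["music"]),
    (["environment"], ["environment", "science", "law"]),
]
precision, recall = pooled_precision_recall(sample_pages)
```

In this example 2 of the 3 assignments are correct, and 2 of the 4 true topics are found. High precision with lower recall, as in the experiment above, means most assigned topics were correct but many true topics were missed.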


(Isaac’s Automatically Generated Ontology)
IAGO (0.1)! = Yahoo - labor + NLP

• We categorized (part of) the Web:
  – 1073 categories; 8000 web pages
  – ~80% precision for good categories
    • E.g., “motion pictures”, “the environment”, “music”
• IAGO 1.0 in the works:
  – Eliminate pages with little text.
  – Eliminate proper nouns.
  – Retrained with MS Encarta - Improved performance dramatically (perhaps enough to disambiguate the web)!
  – Need to compute word sense priors using the web.
  – [Recode implementation to keep up with web crawler.]


Further Information

• Berkeley DL web site: http://elib.cs.berkeley.edu