Subject Access Enhancement Via FocusOn Search and CategoryMap: An Integrated Approach for Discovery of University Resources and Library on the Web. 2009 Serials Solutions Workshop, March 11, 2009, Seattle, WA. Updated March 10, 2011. By Amanda Xu, St. John’s University Library, Jamaica, New York.

Subject Access Enhancement: FocusOn Search and CategoryMap: An Integrated Approach for Discovery of University Resources and Library on the Web


DESCRIPTION

Subject access refers to finding and locating the ‘aboutness’ of a named entity (person, family, corporate body) or of a concept, object, event, or place. Subject access enhancement refers to providing integrated subject access to structured, semi-structured, and unstructured data. This presentation compares known- and unknown-term search in Google, the library OPAC and website, and the university website; introduces, through examples, various subject access enhancement techniques applied to a library OPAC that supports unknown-term search; and points out the challenges of providing integrated subject access across all resources of an enterprise: the university website, library OPAC, library website, and other data service points. FocusOn Search and CategoryMap are considered essential components for enhancing subject access to such data. The presentation also suggests how the two new utilities could be implemented as plug-ins to the existing cataloging environment, allowing catalogers to 1) configure web services capable of consuming metadata in formats other than MARC, and 2) create and maintain categories conforming to the enterprise service bus at the local library level, home-institution level, consortium level, bibliographic utility level, and other data service levels.



Acknowledgement: Prof. Isael Moskowitz from NYU; Andrew Sankowsi, Cynthia Chambers, and Theresa Maylone from St. John's University Libraries

Overview

Subject Access Enhancement Introduction FocusOnSearch & CategoryMap

Business Scenario for FocusOn Search and CategoryMap

System Front End for FocusOn Search and CategoryMap

System Backend for FocusOn Search and CategoryMap (Taxonomy Management Module)

FocusOn Search and CategoryMap in Distributed Network/Web (Logical Network Diagram)

DFD (Data Flow Diagram) Context Level for FocusOnSearch

ER Diagram Adapted from RDA (Resource Description and Access)

System Flow Chart for FocusOn Search and CategoryMap

System Flow Chart for the Data Movement of all Vocabularies

Suggestion for Future

References

Definition: Subject access refers to finding and locating the aboutness of a named entity (person, family, corporate body) or of a concept, object, event, or place.

To make this happen on the document-processing side, we perform subject analysis and related processing as follows:

Classify the document;

Sometimes, also categorize the document;

Describe the aboutness of the document, e.g. identify the named entity, concept, object, event, and place;

Tag the named entity, concept, object, event, and place in a controlled manner (e.g. provide authority control for named entities and subjects, using thesauri and subject heading lists such as AAT, LCSH, and MeSH);

Index the tags, in the hope that the query the searcher enters into the system matches a term in the index;

Usually, store the tags/metadata in relational database systems and the associated documents in flat-file systems. This combination is what people call semi-structured data;

Unstructured data refers to documents such as .doc files, .txt files, .xls files, email, and telephone transcripts;

Structured data refers to data in relational database systems, object-oriented database systems, and other structured systems;

20% of the world's data is in structured systems, and 80% is in unstructured and semi-structured systems.

Subject access enhancement refers to providing integrated subject access to structured, semi-structured, and unstructured data.

FocusOnSearch and CategoryMap are considered essential components for enhancing subject access to such data.
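The tagging and indexing steps in these notes can be sketched as a small inverted index. This is only an illustration under stated assumptions: the record ids, the sample terms, and the function names `build_subject_index` and `lookup` are made up for the example and are not part of FocusOn Search or CategoryMap.

```python
from collections import defaultdict


def build_subject_index(records):
    """Map each lowercased subject term to the set of record ids tagged with it."""
    index = defaultdict(set)
    for rec_id, subjects in records.items():
        for term in subjects:
            index[term.lower()].add(rec_id)
    return index


def lookup(index, query):
    """Return the sorted record ids whose tags match the query exactly (case-insensitive)."""
    return sorted(index.get(query.strip().lower(), set()))


# Hypothetical records: id -> controlled subject terms assigned by a cataloger.
records = {
    "doc1": ["Algebra", "Data processing"],
    "doc2": ["Combinatorics"],
}
index = build_subject_index(records)
```

A query such as "maths" returns nothing here because it matches no indexed term, which is the gap the notes describe when the searcher's wording differs from the controlled vocabulary.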


Subject Access Enhancement FocusOnSearch and CategoryMap (1 of 20)

DATA - Structured (20%), Semi Structured & Unstructured (80%)

IDC: percentage of Web searches that are aboutness searches: topic search (45%), and scientific and technical information search (35%)

Query limited to Boolean, Relevance ranking, Phrase, Link Analysis on Refined Indexes by Keywords, Media, and File Types on Web

Unknown Named Entities and Topical Search often Discovered by Accident on Web

Result List Rendered often Makes no Sense for Aboutness Search on Web, let alone supporting business intelligence

Cumbersome Info Sharing Processes for Enterprise Wide Information Discovery

Google search example: the result list does not differentiate books written by Henry George himself from books or topics about Henry George.

An OPAC search comes in handy here, because we mark up both books written by Henry George and books or topics about Henry George in the bib records.
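The by/about distinction in these notes can be sketched in Python. In MARC, creators are recorded in 1XX/7XX fields and subjects in 6XX fields; the simplified dictionaries below stand in for bib records, and the keys `creators` and `subjects`, the function name, and the second sample title are assumptions made for the illustration.

```python
def split_by_and_about(records, name):
    """Separate titles created BY a name from titles ABOUT that name."""
    by, about = [], []
    for rec in records:
        if name in rec.get("creators", []):
            by.append(rec["title"])
        if name in rec.get("subjects", []):
            about.append(rec["title"])
    return by, about


# Simplified stand-ins for bib records; the second title is hypothetical.
records = [
    {"title": "Progress and Poverty",
     "creators": ["George, Henry"], "subjects": ["Economics"]},
    {"title": "A study of Henry George (hypothetical)",
     "creators": ["Doe, Jane"], "subjects": ["George, Henry"]},
]

by, about = split_by_and_about(records, "George, Henry")
```

A keyword-only search engine sees the same string in both records; the split is possible only because the name was tagged in different roles.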


Subject Access Enhancement FocusOnSearch and CategoryMap (2 of 20)

Google Query: Algebra Data Processing Periodical Computer Algebra - ACM

Google works for known-item search, e.g. ACM Communications in Computer Algebra by title.

Google does not work for unknown-item search, e.g. Algebra-Data Processing-Periodical: the title above cannot be found on the first page of results.

Subject Access Enhancement FocusOnSearch and CategoryMap (3 of 20)

OPAC: Subject Keyword AND w/ Relevance Ranking SKEY(^*) in Simple Query Mode


OPAC supports both known and unknown item search;

Example of unknown item search in OPAC WebVoyage by subject keyword AND with relevance ranking in simple search mode.


Subject Access Enhancement FocusOnSearch and CategoryMap (4 of 20)

OPAC: Advanced Query Mode: Subject Keyword Boolean AND

Unknown-item search in OPAC WebVoyage by Boolean AND on subject keywords in advanced search mode.
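A Boolean AND over subject keywords can be sketched as a set-containment test. This is a simplified illustration, not how WebVoyage implements its index: the function name is an assumption, the first subject heading follows the example on the slides, and the second heading is an assumption made for contrast.

```python
import re


def subject_keyword_and(records, query):
    """Return titles whose subject headings contain every keyword in the query."""
    wanted = set(re.findall(r"\w+", query.lower()))
    hits = []
    for title, subjects in records.items():
        words = set(re.findall(r"\w+", " ".join(subjects).lower()))
        if wanted <= words:  # Boolean AND: all query words must be present
            hits.append(title)
    return sorted(hits)


records = {
    "ACM Communications in Computer Algebra":
        ["Algebra -- Data processing -- Periodicals"],
    # Assumed heading, for illustration only.
    "Annals of Combinatorics":
        ["Combinatorial analysis -- Periodicals"],
}
```

The searcher does not need to know the title: `subject_keyword_and(records, "algebra data processing periodicals")` finds the journal from its subject terms alone.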


Subject Access Enhancement FocusOnSearch and CategoryMap (5 of 20)

OPAC Rendering: Brief Display Record Display

Rendered result of an unknown-item search by subject keyword in OPAC WebVoyage.

Subject Access Enhancement FocusOnSearch and CategoryMap (6 of 20)

OPAC: Subject Browse (SUBJ):

Algebra Data processing Periodicals


Unknown item search in OPAC WebVoyage by subject browse.


Subject Access Enhancement FocusOnSearch and CategoryMap (7 of 20)

LC Classification

OPAC: LC Classification QA 150-272 - Algebra

QA 155.7.E4 - Algebra Electronic Data Processing

Use the LC classification range QA 150-272 to group items whose aboutness is Algebra; QA 155.7.E4 is Algebra: Electronic Data Processing.
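Grouping by a classification range can be sketched as a parse-and-compare step. This is a minimal sketch that reads only the class letters and the leading class number of a call number; the function names are assumptions, and decimal extensions of the upper bound (e.g. QA 272.5) are treated as inside the range, which is also an assumption.

```python
import re


def parse_lc(call_number):
    """Return (class letters, class number) from the start of an LC call number."""
    match = re.match(r"\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)", call_number.upper())
    if not match:
        raise ValueError(f"not an LC call number: {call_number!r}")
    return match.group(1), float(match.group(2))


def in_lc_range(call_number, letters, low, high):
    """True if the call number falls in the range letters low-high (inclusive)."""
    cls, num = parse_lc(call_number)
    # high + 1 keeps decimal extensions of the upper bound inside the range.
    return cls == letters and low <= num < high + 1
```

For example, `in_lc_range("QA 155.7.E4", "QA", 150, 272)` places the Algebra: Electronic Data Processing number inside the Algebra range.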

Unknown-item search in OPAC WebVoyage by browsing the LC classification in a bib record is still a challenge. Why?

Subject Access Enhancement FocusOnSearch and CategoryMap (8 of 20)

OPAC: Call No. Browse CALL

Browse: QA155.7 collocating print collections on the topic

Only the print collection can be collocated using call number browse.

E-journal collections cannot be browsed this way, even though the LC classification exists in the 050 field of the bib record. In Voyager, the call number index and browse are built from MARC holdings field 852$h rather than from the classification number in the bib record.
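One conceivable way around this limitation is to fall back to the bib-level classification when a holdings record carries no call number. The sketch below is an assumption about how such a fallback could look, not a description of Voyager behavior: records are plain dictionaries, the keys `852h` and `050a` simply label the MARC subfields they stand for, and both sample titles are hypothetical.

```python
def browse_call_number(bib, holdings):
    """Prefer the holdings call number (852 $h); fall back to the bib class (050 $a)."""
    for holding in holdings:
        if holding.get("852h"):
            return holding["852h"]
    return bib.get("050a")


def collocate(items):
    """Build one browse list for print and electronic items, sorted by call number."""
    shelf = [(browse_call_number(bib, holdings), bib["title"])
             for bib, holdings in items]
    # Plain string sort for brevity; a real shelf list needs LC-aware sorting.
    return sorted(entry for entry in shelf if entry[0])


items = [
    ({"title": "Print monograph (hypothetical)", "050a": "QA155.7.E4"},
     [{"852h": "QA155.7.E4 B66"}]),
    ({"title": "E-journal (hypothetical)", "050a": "QA155.7.E4"},
     [{}]),  # electronic holding without a call number
]
```

With the fallback, both the print item and the e-journal appear under QA155.7.E4 in a single browse list.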

Collocating print, electronic, and other types of collections under LC classification is still a challenge.

Serials Solutions supplies subject categories for its full-text e-journal A-Z list using LC classification. A workaround can be made for the Serials Solutions MARC title list by using the same subject category scheme as the one used for the full-text e-journal A-Z list. The notes on slide 21 indicate the implications for ILS operations and other processes.

Subject Access Enhancement FocusOnSearch and CategoryMap (9 of 20)

Full text E-J Portal on Library Web: Known Item Search by Title, ISSN only


Full-text e-j portal on the library Web supports only known-item search by title and ISSN.


Subject Access Enhancement FocusOnSearch and CategoryMap (10 of 20)

Full text E-J Portal on Library Web: Unknown Item Browse by Subject Mathematics: Algebra

The full-text e-journal portal on the library Web supports unknown-item browse by subject. Two titles have been highlighted: 1) ACM Communications in Computer Algebra; and 2) Annals of Combinatorics.

Subject Access Enhancement FocusOnSearch and CategoryMap (11 of 20)

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc.

The next few slides are examples of page renderings from the University website for queries entered in a single search box: Algebra Electronic Data Processing, Combinatorics, Henry George, Charles Wankel, etc.

A search for the term Algebra Electronic Data Processing retrieves school and bulletin info on the university website.

Subject Access Enhancement FocusOnSearch and CategoryMap (12 of 20)

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc.

A search for the term Combinatorics retrieves related academic events, faculty info, and school bulletins from 2004.

For page rendering on the University website, should we group the result list as breadcrumbs or as a folder structure for named entity, concept, object, event, place, timeline, etc.?

What about lifecycle maintenance of the web page content?

Subject Access Enhancement FocusOnSearch and CategoryMap (13 of 20)

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc.

A search for the term Henry George retrieves related academic events and a library resource info page on the university website.

Should we group the aboutness of Henry George into a sense-making page regardless of where each page is located?

Subject Access Enhancement FocusOnSearch and CategoryMap (14 of 20)

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc.

Identify works BY or ABOUT Charles Wankel on the University website by linking the University website to OCLC WorldCat Identities services for named entity resolution?

More examples of the Charles Wankel page lookup from OCLC WorldCat Identities services appear in the next few slides.

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc. (15 of 20)

Linking between University Website and OCLC WorldCat Identities Services for Named Entities Resolution

Screen 1 of 4: Overview, and Work Activity Period on OCLC WorldCat Identities

More examples on the result list: it supports horizontal and vertical scans, and featured recognition of the named entity Wankel, Charles on OCLC WorldCat Identities.

Link between University Website and OCLC WorldCat Identities Services for Named Entities Resolution?

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc. (16 of 20)

Linking between University Website and OCLC WorldCat Identities Services for Named Entities Resolution

Screen 2 of 4: Works Created by Charles Wankel on OCLC WorldCat Identities

Link between University Website and OCLC WorldCat Identities Services for Named Entities Resolution?

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc. (17 of 20)

Linking between University Website and OCLC WorldCat Identities Services for Named Entities Resolution

Screen 3 of 4: Audience Level and Works Related to Dr. Charles Wankel on OCLC WorldCat Identities

Link between University Website and OCLC WorldCat Identities Services for Named Entities Resolution?

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc. (18 of 20)

Linking between University Website and OCLC WorldCat Identities Services for Named Entities Resolution

Screen 4 of 4: Concept Terms Related to Dr. Charles Wankel

Link between University Website and OCLC WorldCat Identities Services for Named Entities Resolution?

21

Subject categorization using LC Classification:

Serials Solutions offers subject browse for the full-text e-journal A-Z title list, categorized using LC classification;

Should the same subject categorization be applied to the MARC title list?

If so, library operations within the ACQ, CIRC, CAT, Reserve, OPAC, and Collection Development modules, etc. can all be categorized, tracked, and reported under the LC classification scheme for subject categorization, in addition to supporting integrated discovery of university resources and the library on the Web;

If so, library guides at the title level on a particular topic can also be created via WebVoyage with a single hot link;

If so, a conformed dimension for the enterprise bus architecture can finally be obtained via the LC classification scheme.

The implications for Voyager would be as follows if the page on Dr. Charles Wankel were categorized using the LC classification scheme:

ACQ: fund code, e.g. 271-7652-302 Queens Books-Management;

Cataloging Client: use the MARC 698 field in the bib record for category terms obtained from the Serials Solutions full-text e-journal list by subject. Item statistical category codes 901 and 902 for management in Tobin College of Business, and item 900 for management;

OPAC: browse and search MARC 698 in the bib record and statistical category 900 in the item, e.g. management is the conformed dimension in the EBA (Enterprise Bus Architecture);

CIRC: patron group type matches item 901 and 902;

Collection Management Module: how many faculty at Tobin College are interested in Business Management?

Faculty Teaching and Student Learning: assessment by patron group type, item 901 and 902, and associated activities?

Business Office: accounting and budgeting by subject category?

Other processes?

Query Submitted to the Search Box on University Website Retrieved Info on People, Events, Curriculum, etc.: Algebra Electronic Data Processing, Combinatorics, Henry George, Wankel, Charles, etc. (19 of 20)

Subject Access Enhancement FocusOnSearch and CategoryMap (20 of 20)

Browse both print and electronic collections on Algebra - Electronic Data Processing and Mathematics by LC classification scheme with a single click

Enable a single measurement point to benchmark processes on university resources and library

Integrate one or more category maps by classifying university resources and library consistently

Enable trend analysis for collection development needs on Combinatorics or Henry George?

Enable repackaging and unbundling of resources by fine-grained topics

Answer questions like: Whom will the collection serve, e.g. which school program, instructor, courses, etc.?

How well does the collection meet the need of faculty and at what cost?

Business Scenario for FocusOn Search and CategoryMap (1 of 3)

The diagram depicts a snapshot of the information infrastructure for University resources, especially with regard to faculty, and for the Libraries;

FocusOn Search and CategoryMap sit on top of the information discovery layer, building bridges among, e.g., faculty, university resources, and libraries;

They enable us to understand who the users are and what processes are involved in information creation and consumption, especially with regard to faculty;

Are more category types needed to mark up faculty activities, university resources, and the library?

What is considered input, and what is considered output? What are the processes that generate the output? How does information flow between processes?

This diagram details the facts to collect and mark up at the contextual level.

Business Scenario for FocusOn Search and CategoryMap (2 of 3)

This diagram indicates the flows of the systems. We aggregate content through the aggregation of technologies and distribute the content to users.

Librarians deploy systems, such as Collection Development, Cataloging, and LibGuide, capable of selecting, organizing, accessing, guiding, enhancing, and distributing content to the user through technologies. Yet there are still complaints.

Where is the user's behavior context? We index tons of information and present it to the user without any filtering, e.g. by who the users are and what they are looking for.

Business Scenario for FocusOn Search and CategoryMap (3 of 3)

On the document side, if we have a CategoryMap, it will:

Look up and consume vocabulary services provided by LC, NLM, OCLC, and Getty in manual and batch modes;

Process vocabulary and enable the choice of the appropriate form of a named entity with reference to terms clustered by applications, tagged by end users, and structured in classification schemes;

Distribute content to end users through the analysis of existing collections, activities, and users;

Classify the user's behavior context better?

System Front End for FocusOn Search and CategoryMap (1 of 6)

There are objects to be embedded within the front end of FocusOn Search and CategoryMap. The objects selected for insertion in a word document are:

St. John's Logo: Login/Create My Account; User preferences; Simple search and advanced search modes; Suggest; Reset; Email; Print; AskUs; Exit.

The Preview button is expected to show the full text of selected search results when limiting to online only, etc.

Refine search results by subject, and then limit the subject to concept only.

On clicking Browse CategoryMap, the relationships among highlighted subject terms about the person can be explored from OCLC Named Entities for the person.

St. John's FocusOn Search as a Google Gadget.

Text This: the button to send a few final result sets to a mobile phone, mocked up from North Carolina State University's Quick Search: http://www.lib.ncsu.edu/catalog/

The Save button means Save to Bag for further processing.

After Save to Bag, users have the choice of saving the items into My Library. The options Add Note, Edit Labels, Write Review, and Remove will appear in the brief item listing display. Two books in the user's library are selected for such a display, extracted and mocked up from Google Books.

The label Add Note applies to the entire banner of the first book in the brief display.

The label Write Review applies to the entire banner of the second book in the brief display.

User-created labels will be indexed by CategoryMap.

Two breadcrumb trails for folder navigation are designed to integrate FocusOn Search with the existing websites of the University and Libraries.

The top one sits right above the user actions for Print, Attach/RSS, Libraries, Text This, Reformat, and Gadget. It indicates the user's path, e.g. Home > Academics & School > Libraries > Resources > Focus On > About Henry George. Clicking on the trail leads users back to the next higher level of the folder-structured trail.

The second trail, at the bottom of the page, indicates available services provided by the University, including feedback, privacy, safety, sitemap, and copyright information. Clicking on the trail leads users into the services provided by the University and the Libraries.

Systems Front End for FocusOn Search and CategoryMap - Record Validation Configuration for Atom Data Feed Consumption in Voyager Cataloging Client (2 of 6)

Build CategoryMap into the session configuration of the existing cataloging client, whether it is browser-based or window-based, for a single user. Content and record structure are validated within CategoryMap. The example shows how record structures such as Atom and Dublin Core can be accommodated and validated in such an environment, including heading types, e.g. category.
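Record-structure validation for an Atom entry can be sketched with Python's standard XML parser. The Atom format requires `id`, `title`, and `updated` on every entry; treating a `category` term as mandatory is an assumption made here to mirror the category heading type in the notes. The function name, the sample entry, and its category term are hypothetical, not the configuration shown on the slide.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}
REQUIRED = ["atom:id", "atom:title", "atom:updated"]  # required by the Atom format


def validate_entry(xml_text):
    """Return a list of structural problems found in one Atom entry."""
    entry = ET.fromstring(xml_text)
    problems = []
    if entry.tag != f"{{{ATOM}}}entry":
        problems.append("root element is not atom:entry")
    for path in REQUIRED:
        element = entry.find(path, NS)
        if element is None or not (element.text or "").strip():
            problems.append(f"missing {path}")
    # Assumption: local policy requires at least one category term.
    if not any(c.get("term") for c in entry.findall("atom:category", NS)):
        problems.append("missing atom:category term")
    return problems


SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <id>urn:example:1</id>
  <title>ACM Communications in Computer Algebra</title>
  <updated>2011-03-10T00:00:00Z</updated>
  <dc:subject>Algebra</dc:subject>
  <category term="Mathematics: Algebra"/>
</entry>"""
```

An empty list from `validate_entry(SAMPLE)` means the entry passed every check; each string in a non-empty list names one missing piece.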


Systems Front End for FocusOn Search and CategoryMap - Network Connection Configuration to Various DBs for Data Feed Consumption from Bibliographic Utilities, NAFs, to Content Management Systems, etc. in Voyager Cataloging Client (3 of 6)

Client-configurable CategoryMap connection options for consuming data services from a list of databases, e.g. WorldCat, LC authority files, NLM MeSH, Getty AAT, NLC authority file, dictionaries, and commonly used reference tools.

Systems Front End for FocusOn Search and CategoryMap General Import Profile Configuration for Atom Data Feed Consumption in Voyager Cataloging Client (4 of 6)

Build CategoryMap into the session configuration for the general holdings library, including the choice of call number hierarchies, import and duplicate profiles, etc.

Systems Front End for FocusOn Search and CategoryMap Template Configuration for Constant Holdings Data in Voyager Cataloging Client (5 of 6)

Build CategoryMap into the session configuration for the format-specific holdings library if MARC format is chosen.

Systems Front End for FocusOn Search and CategoryMap Template Configuration for Constant Item-level Data and Category Term (6 of 6)

Build CategoryMap into the session configuration for the format-specific item in the cataloging client, where the item-level category is displayed as a category code, e.g. 900.

Build CategoryMap into the session configuration for the format-specific item in the circulation client, where the item-level category is displayed as a category name description, e.g. Management - Tobin.

System Backend for FocusOn Search and CategoryMap (Taxonomy Management Module) (1 of 2)

FocusOn Search application packages entail a stack of services:

Centralized catalog

Handle media types in the catalog

Named entities: person, family, and corporate body are linked and mashed up to obtain the aboutness and of-ness of a person, locally and remotely, via publicly available APIs on top of HTTP and/or ESBs within the private cloud computing network;

Other entities, e.g. concept, object, event, and geographic name

Search facility: suggests spelling corrections based on patterns, rules, keywords, phonetics, synonyms, dictionary, and controlled vocabulary within one dialogue box in a single interface. It will also suggest categories that would facilitate discovery based on statistical analysis of queries, documents, user profiles and activities, usage, and vocabulary services consumed from other vocabulary service providers

Google Map API for geographic name

Centralized catalog:

1. Is part of the common service of the discovery layer, sitting on top of existing university information resources and libraries on the Web, the ILS (Integrated Library System), university resource planning systems (enterprise legacy systems), teaching and learning systems, and discipline-specific research repositories at the institutional and regional levels, once the systems are implemented at full scale;

2. Provides interfaces for human-machine and machine-machine communication, interaction, collaboration, problem solving, and decision support;

3. Provides an inventory of structured data (XML, RSS, Atom) and unstructured data (email, web pages, .doc, .pdf, .xls) via a set of metadata records. A metadata record conforming to institutional and industry standards describes the of-ness and aboutness of an information object and provides links to the object.

Media Type:

All media types in the catalog will be given descriptive metadata for media type identification, discovery, search and retrieval, and linkage.

1. Like the rest of the collections in the catalog, they are classified for role-based access, arranged alphabetically for browsing, categorized for discovery, filtered, processed through ETL, and indexed for search and retrieval, recommended for reputation, top-ranked for analysis and other processes in the pipeline, and linked for obtaining the media object locally or mashing up with external applications remotely via publicly available APIs on top of HTTP and the enterprise service bus within the private cloud computing environment.

2. The administrative and structural metadata for the maintenance and manipulation of each media type (e.g. reformatting images, videos, and audio) as a media object is beyond the scope of this project at the moment.

Named Entities:

The named entity for a person, family, or corporate body is considered an information object that comes with the following attributes when appropriate:

Zip code, address, country;

Area code, phone number, device profiles, etc.;

Web page and email in the form of a URI;

Language;

Timeline that is specific to a named entity. For a person, the timeline refers to dates associated with the person's birth, death, and period of activity in the Gregorian calendar;

Category appropriate to the level of granularity of the information object, e.g. skills and specialty for a person, and correlated with:

subject terms clustered by an application;

controlled vocabulary such as LCSH and MeSH provided by a lookup;

user-tagged terms;

classification scheme such as LC classification and Dewey;

Association related to the aboutness of a named entity. For a person, the associated attributes include, but are not limited to, title, gender, affiliation, field of activity, occupation, and biographical information. At runtime, a search on the named entity of a person retrieves all resources, works, expressions, manifestations, and items about the named entity and displays them along with the bio info of the person;

Association related to the of-ness of a named entity. At runtime, a search on the named entity of a person retrieves all works, expressions, manifestations, and items created by the named entity and displays them based on the content model for rendering;

Relationships between named entities for persons, families, and corporate bodies are tagged, mapped, grouped, and visualized according to user-tagged terms, association rules, classification, and user profiles specified in a web form during initial registration. A user can also modify such relationships manually. The backend systems will recommend additional relationships by running a recommendation engine on behalf of the user;

Top-ranked for other processes in the pipeline, e.g. supporting collection development decisions, user and collection performance analysis, and query expansion;

Like media types, the specific named entity, e.g. a person, will be linked and mashed up to obtain the aboutness and of-ness of the person, locally and remotely, via publicly available APIs on top of HTTP and ESBs within the private cloud computing network;

Privacy, copyright, and information security, including opt-in and opt-out options for the named entities to be exposed and shared across the enterprise;

The output of the focused page can also be rendered for import and export, RSS, preview, citation list generation, sharing, printing, email, and texting in user-defined formats and devices.
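The attribute list above can be sketched as a data structure. This is one possible shape under stated assumptions, not the presentation's data model: the class name, the field names, the `opted_in` flag standing in for the opt-in/opt-out option, and the `focused_page` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NamedEntity:
    preferred_name: str
    entity_type: str                                 # "person", "family", or "corporate body"
    uris: list = field(default_factory=list)         # web page and email as URIs
    language: str = ""
    timeline: dict = field(default_factory=dict)     # e.g. birth, death, period of activity
    categories: dict = field(default_factory=dict)   # vocabulary source -> list of terms
    works_by: list = field(default_factory=list)     # of-ness: created by the entity
    works_about: list = field(default_factory=list)  # aboutness: about the entity
    opted_in: bool = False                           # may the entity be shared enterprise-wide?


def focused_page(entity):
    """Render only what the entity has agreed to expose."""
    if not entity.opted_in:
        return {"name": entity.preferred_name}
    return {
        "name": entity.preferred_name,
        "categories": entity.categories,
        "by": entity.works_by,
        "about": entity.works_about,
    }


george = NamedEntity(
    preferred_name="George, Henry",
    entity_type="person",
    timeline={"birth": 1839, "death": 1897},
    categories={"LCSH": ["Economics"]},
    works_by=["Progress and Poverty"],
    opted_in=True,
)
```

Keeping `works_by` and `works_about` as separate fields is what lets a focused page show the of-ness and the aboutness of the same entity side by side.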

Other entities, such as concept terms, object names, event names, and geographic names, will carry system functionality and capability similar to those of the named entities for persons, families, and corporate bodies. At runtime, given a concept term, for instance, the works, expressions, manifestations, and items related to the concept term will be retrieved and displayed regardless of structure, media type, format, repository, etc., according to the classification of the documents, controlled vocabulary, role-based access, and content models for rendering.

At runtime, the relationships between a concept term, for instance, and its broader terms, narrower terms, used-for terms, etc. can be exposed and consumed by other applications, which might take them as input for choosing and validating the form of a name or subject and for assigning classification and subject terms to resources, in addition to developing and maintaining the vocabulary for categories.
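Term relationships of this kind can be sketched as a small lookup table that other applications query. The two entries, the relationship keys, and the function names below are illustrative assumptions, not data taken from LCSH, MeSH, or AAT.

```python
# Illustrative entries only; not copied from any published vocabulary.
THESAURUS = {
    "Mathematics": {"broader": [], "narrower": ["Algebra"], "used_for": ["Math"]},
    "Algebra": {"broader": ["Mathematics"], "narrower": [], "used_for": []},
}


def preferred_form(term, thesaurus):
    """Validate a term: return its preferred heading, or None if it is unknown."""
    for heading, relations in thesaurus.items():
        if term == heading or term in relations["used_for"]:
            return heading
    return None


def expand(term, thesaurus):
    """Expand a query term to its preferred heading plus narrower terms."""
    heading = preferred_form(term, thesaurus)
    if heading is None:
        return []
    return [heading] + thesaurus[heading]["narrower"]
```

A consuming application could call `preferred_form` to validate the form of a subject and `expand` to broaden retrieval to narrower terms.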

The search facility in FocusOn Search will suggest spelling corrections based on patterns, rules, keywords, phonetics, synonyms, dictionary, and controlled vocabulary within one dialogue box in a single interface. It will also suggest categories that would facilitate discovery, based on statistical analysis of queries, documents, user profiles and activities, usage, and vocabulary services consumed from other vocabulary service providers.
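One narrow slice of this facility, suggesting corrections from a controlled vocabulary by string similarity, can be sketched with Python's standard `difflib` module. It covers only the pattern-matching part of the description (no phonetics, synonyms, or usage statistics), and the vocabulary list and function name are assumptions for the example.

```python
import difflib

# Hypothetical controlled vocabulary for the example.
VOCABULARY = ["Algebra", "Combinatorics", "Data processing", "Management"]


def suggest(query, vocabulary=VOCABULARY, limit=3):
    """Suggest vocabulary terms that closely resemble a possibly misspelled query."""
    lowered = {term.lower(): term for term in vocabulary}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=0.6
    )
    return [lowered[match] for match in matches]
```

The closest match is listed first, so a search box could offer `suggest("combinatrics")[0]` as a "did you mean" prompt.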

For geographic names, zip code and area code processing will be part of the application where applicable. Ideally, Google Map API lookup should be supported as well.

System Backend for FocusOn Search and CategoryMap (Taxonomy Management Module) (2 of 2)

Link user services, collection management, circulation, acquisitions, cataloging, and other processes across the units of Library and University Resources

Maintain taxonomy in conformance to institution and industry standards

The CategoryMap will manage category terms which can be in a form of concept, object, event and place, harmonized from subject terms:

Clustered by an application;

Looked up through controlled vocabulary such as LCSH, MESH, and AAT;

Tagged by user-defined terms;

Structured by LC and Dewey classification;

Referenced directly from fund expenditure structure in acquisitions;

Analyzed based on usage statistics reports aggregated from circulation, content suppliers, etc., and no. of documents/objects likely carrying the category term;

Managed in a knowledge base for vocabulary filtering, mapping, ETL, etc., and in a data warehouse for data mining;

The search facility will also handle query processing in relational database management systems and ontological database management systems;

Relationships between concepts, objects, events, and geographic names are constructed according to controlled vocabularies developed by LC, NLM, and Getty.
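The harmonization of category terms from several sources can be sketched as a merge that records where each term came from. The source names, sample terms, and the function name `harmonize` are assumptions for the illustration; real harmonization would also need the mapping and review steps described in the notes.

```python
from collections import defaultdict


def harmonize(sources):
    """Merge terms from several sources into one map: normalized term -> sources."""
    merged = defaultdict(set)
    for source, terms in sources.items():
        for term in terms:
            merged[term.strip().lower()].add(source)
    return {term: sorted(names) for term, names in sorted(merged.items())}


# Hypothetical inputs, one list of terms per source.
sources = {
    "controlled_vocabulary": ["Algebra", "Management"],
    "user_tags": ["algebra", "henry george"],
    "acquisitions_funds": ["Management"],
}
```

Terms backed by more than one source, such as "management" here, are natural candidates for shared categories, while single-source terms may need review first.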


Fine-grained taxonomy management is important not only for subject searches but also for mission-critical operations at the University and Libraries. For the Libraries, for example, it is important to make informed decisions about what we are doing and how well we are doing it, through baselining and reporting on user services, collection management, circulation, acquisitions, cataloging, etc. The CategoryMap application, along with its program, will link these processes across the units of the Libraries and the University.

Therefore, it is our job to maintain such a taxonomy for the reuse and sharing of enterprise-wide information resources among ERP systems, the ILS, institutional repositories, etc., in conformance with institutional and industry standards. The CategoryMap will serve as the backbone of an enterprise's common data services, in addition to time of day and location.

The CategoryMap will manage category terms which can be in a form of concept, object, event and place, harmonized from subject terms:

Clustered by an application;

Looked up through controlled vocabulary such as LCSH, MESH, and AAT;

Tagged by user-defined terms;

Structured by LC and Dewey classification;

Referenced directly from fund expenditure structure in acquisitions;

Analyzed based on usage statistics reports aggregated from circulation, content suppliers, etc., and no. of documents/objects likely carrying the category term;

Managed in a knowledge base for vocabulary filtering, mapping, ETL, etc., and in a data warehouse for data mining;

The search facility will also handle query processing in relational database management systems and ontological database management systems;

Relationships between concepts, objects, events, and geographic names are constructed according to controlled vocabularies developed by LC, NLM, and Getty.
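The harmonization of subject terms into category terms described above might be sketched as follows. This is a minimal illustration only: the `CategoryTerm` class, the `harmonize` function, the normalization rule, and the sample terms are assumptions made for the example, not part of the CategoryMap design.

```python
from dataclasses import dataclass, field

# The four forms a category term can take, per the list above.
KINDS = {"concept", "object", "event", "place"}


@dataclass
class CategoryTerm:
    label: str                      # display label (first label seen wins)
    kind: str                       # one of KINDS
    sources: set = field(default_factory=set)  # where the term came from

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")


def normalize(label):
    """Hypothetical matching rule: case-insensitive, collapsed whitespace."""
    return " ".join(label.lower().split())


def harmonize(subject_terms):
    """Merge (label, kind, source) triples that share a normalized label and kind."""
    merged = {}
    for label, kind, source in subject_terms:
        key = (normalize(label), kind)
        term = merged.setdefault(key, CategoryTerm(label, kind))
        term.sources.add(source)
    return list(merged.values())


# Sample input: a controlled-vocabulary term and two user-defined tags.
terms = harmonize([
    ("Single tax", "concept", "LCSH"),
    ("single  tax", "concept", "user-tag"),
    ("Seattle", "place", "user-tag"),
])
```

Here the two spellings of "Single tax" collapse into one category term that records both of its sources, while the place term stays separate.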

All named entities such as personal name (PN), family name (FN), corporate name (CN), concept term (CT), object name (ON), event name (EN), geographic name (GN), and timeline (TN) in a meta-data record will have their own authority records stored and maintained centrally in a logical/physical name resolver facility distributed globally by authorized vocabulary service providers such as LC, OCLC, British Library, and National Library of Canada on the Web.

Named headings in the authority records at the name resolver facility such as OCLC WorldCat are:

Constructed in conformance to tagging standards and rules;

Contributed by a community of users who have defined their roles and responsibilities in service contribution and consumption, registered and exposed their services with major vocabulary service providers;

Validated by templates, encoding levels, schemas, name authority files, controlled vocabularies, reference tools, and business rules;

Governed for the enforcement of policies, service level agreements (SLAs), operational level agreements (OLAs), service reconciliation, service lifecycle management, compliance, SSO (Single Sign On), etc.;

Monitored, measured and reported for information quality, fiduciary, and security.

The CategoryMap application will perform dynamic lookup or batch processing for named entities and subjects in a name resolver facility via Web-services for service consumption. User-tagged terms in such a manner will be reviewed, card-sorted, and integrated into a master list of commonly used vocabulary before they are contributed to the vocabulary service providers when appropriate.

The application will map a user-tagged term for the object to its variant name, preferred form of name, and default form of name, as appropriate to the user's choice, according to statistical processing, tag-based ranking algorithms, and other methods.
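A minimal sketch of that mapping, assuming a small in-memory authority table in place of a real name resolver service. The variant forms listed, the function names, and the rule "most frequently used tag becomes the default form" are illustrative assumptions, not the application's actual algorithm.

```python
from collections import Counter

# Hypothetical authority data: preferred heading -> variant forms.
# The variants below are made up for illustration.
AUTHORITY = {
    "George, Henry, 1839-1897": ["Henry George", "George, Henry"],
}


def build_variant_index(authority):
    """Index every known form (preferred or variant) by its case-folded text."""
    index = {}
    for preferred, variants in authority.items():
        for form in [preferred, *variants]:
            index[form.casefold()] = preferred
    return index


def resolve(tag, index):
    """Return the preferred heading for a user tag, or None if unknown."""
    return index.get(tag.strip().casefold())


def default_forms(user_tags, index):
    """For each resolved heading, pick the form users typed most often."""
    counts = {}
    for tag in user_tags:
        preferred = resolve(tag, index)
        if preferred is None:
            continue  # unresolved tags would go to manual review
        counts.setdefault(preferred, Counter())[tag.strip()] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


index = build_variant_index(AUTHORITY)
forms = default_forms(
    ["Henry George", "Henry George", "George, Henry", "unknown tag"], index
)
```

A production version would call the name resolver facility via Web services rather than a local dictionary.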

See references for information criteria defined by COBIT Conceptual Framework, and ISACA Model Curriculum.


FocusOn Search and CategoryMap in Distributed Network/Web (Logical Network Diagram)

There are two tiers: 1) the Cloud tier, for user processes on the Internet (OS for Browser); 2) the Vocabulary tier, for document processes on the intranet (OS for Windows).

Synchronize the desktop application from both tiers.


Data Flow Diagram Context Level for FocusOn Search


ER Diagram Adapted from RDA (Resource Description and Access) - ER Diagram View of Title and FRAD Named Entity in Authority Control by IMT, 2008 (1 of 11)

Reference: ER Diagram for RDA Taxonomy: High-Level Relationship Among Entities by IMT (Information Management Team)

1. Uncontrolled access point, explanatory heading, community generated tags, etc. excluded from the diagram


ER Diagram Adapted from RDA (Resource Description and Access) Instance View of Mocked-up Named Entity for Personal Name in MARC Format in LC NAF (2 of 11)


Example of the named entity - Person: George, Henry, 1839-1897 using LC Authority File


ER Diagram Adapted from RDA (Resource Description and Access) Instance View of Personal Name as Subject Access Point in LC Catalog (3 of 11)

Example of books about Henry George marked up in MARC 600 field. The personal heading has been established in LC Authority File.
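To make the 600 markup concrete, here is a small sketch that splits a display-form MARC field into its tag, indicators, and subfields. The field string is an illustrative mock-up built from the heading named above, using the `|` subfield delimiter seen in the slides. It is not copied from an LC record, and the parsing functions are assumptions for the example.

```python
def parse_marc_field(line):
    """Split a display-form MARC field into (tag, indicators, subfields)."""
    head, _, body = line.partition("|")
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else ""
    subfields = []
    for chunk in body.split("|"):
        chunk = chunk.strip()
        if chunk:
            # First character is the subfield code, the rest is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields


def heading(subfields):
    """Join subfield values into one access point, dropping the final period."""
    return " ".join(value for _, value in subfields).rstrip(".")


# Mock-up of a personal name subject access point (600 field).
tag, indicators, subfields = parse_marc_field(
    "600 10 |a George, Henry, |d 1839-1897."
)
access_point = heading(subfields)
```

The resulting access point string can then be matched against the established heading in the authority file.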

ER Diagram Adapted from RDA (Resource Description and Access) - ER Diagram View of WEMI, Named Entities, and Subjects by IMT, 2008 (4 of 11)

This ER Diagram indicates entity relationship among named entities and subjects (e.g. concept, object, event, place).

Reference: ER Diagram for RDA Taxonomy: High-Level Relationship Among Entities by IMT (Information Management Team) (4 of 8)

ER Diagram Adapted from RDA (Resource Description and Access) - Schema View of RDA Record, FRBR WEMI, RDA Entities by IMT, 2009 (5 of 11)

ER Diagram Adapted from RDA (Resource Description and Access) Instance View of Related Topical Headings in LC Subject Authority File (6 of 11)

The LC subject authority file indicates the relationships between topical headings such as "Single tax" and "Land, Nationalization of".

ER Diagram Adapted from RDA (Resource Description and Access) Instance View of Mocked-up & Related Topical Headings in MARC Format in LC Subject Authority File (7 of 11)


Here is how it is marked up in the authority file.


ER Diagram Adapted from RDA (Resource Description and Access) ER Diagram View of Person as Named Entity by IMT, 2009 (8 of 11)

Reference: ER Diagram for RDA Taxonomy: High-Level Relationship Among Entities by IMT (Information Management Team) (7 of 8)


ER Diagram Adapted from RDA (Resource Description and Access) Schema View of Person as Named Entity by IMT, 2009 (9 of 11)


ER Diagram Adapted from RDA (Resource Description and Access) - Instance View of Personal Name as Refined Subject Access Point in LC Catalog (10 of 11)


Refine search by subject

Here is the search refined by subject.

ER Diagram Adapted from RDA (Resource Description and Access) - Instance View of Personal Name as Refined Subject Access Point in LC Catalog (11 of 11)


System Flow Chart for the Discovery Layer of FocusOn Search and CategoryMap (1 of 2)


Info Sharing Processes for Enterprise Wide Information Discovery


System Flow Chart for FocusOn Search and CategoryMap (2 of 2)

The CategoryMap has to leverage a vocabulary framework such as Topic Map as a formal taxonomy building block, which sits on top of commonly used thesauri such as LC's LCSH, NLM's MeSH, and Getty's AAT. In addition, it presents the topic map and other vocabulary processing features for FocusOn Search in the discovery layer. On the one hand, we will leverage existing vocabulary frameworks such as OCLC WorldCat Identities by developing service consumption applications; on the other hand, we have to actively collaborate with others in developing the common vocabulary infrastructure for the Web.

System Flow Chart for the Data Movement of all Vocabularies

1. Info Sharing Processes for Enterprise Wide Information Discovery

2. Maintain, Trace, Track, Analyze, Report


Suggestions for Future

Expand Content Selection to Unstructured Data on the Web

Leverage Named Entities Resolution Services Provided by OCLC WorldCat

Build Data Filters for Media and File Types

Build a Plug-in Reformat Utility

Build a Plug-in Meta-data Conversion Utility

Evaluate Change Management strategies, packages and techniques

Continue to collect sample unstructured source data from the St. John's University Web site, such as the Tobin College of Business faculty page for Dr. Charles Wankel, and integrate the page using the CategoryMap application, which is going to be integrated into the Discovery Layer for FocusOn Search.

Continue to collect sample unstructured source data composed by a group of librarians as the Libraries' guides to events of current and future interest and published on the St. John's University Web site, such as the Topic Guide titled Focus on Henry George.

Continue to collect sample unstructured source data from OCLC WorldCat Identities services for named entity resolution, using the LCCN as the identifier to locate the personal name page for Dr. Charles Wankel.

Continue to collect sample structured data to be syndicated from Google Books by FocusOn Search, using Henry George's Dreamer or Realist as a use case for developing the detailed display of a selected item in the front end of FocusOn Search.

To syndicate data feeds from the University resources and Libraries on the Web, the Attach button would allow the system to obtain HTML pages and their associated files (e.g. PDF, Excel, Word) from sites recommended by the discovery layer of FocusOn Search. The file filtering layer prebuilt within FocusOn Search will automatically convert the native pages into format-independent files, ready to be reviewed, extracted, transformed, and loaded (ETL), and integrated with the repositories of FocusOn Search.
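As a rough sketch of what such a file filter could do for an HTML page, the example below strips markup and keeps the visible text plus the links to associated files. It uses only the Python standard library. The class and function names and the sample page are assumptions for illustration, and filters for PDF, Excel, or Word files are out of scope here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and hyperlink targets from an HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def to_plain_record(html):
    """Convert a native HTML page into a format-independent record."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.chunks), "links": parser.links}


# Made-up sample page standing in for a library topic guide.
page = (
    "<html><head><style>p{}</style></head><body>"
    "<p>Focus on <b>Henry George</b></p>"
    '<a href="guide.pdf">Guide</a></body></html>'
)
record = to_plain_record(page)
```

The `links` list is where associated files would be picked up for their own format-specific filters.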

A plug-in meta-data conversion utility will capture the attached metadata and convert it for loading into a centralized meta-data repository for the entire discovery layer, ready to be reused by other applications.
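One way such a conversion utility might work is a simple crosswalk from source field tags to common repository keys. The sketch below maps a few MARC tags to generic keys. The `CROSSWALK` table, the target key names, and the sample record are assumptions chosen for illustration, and unmapped fields are kept for review rather than dropped.

```python
# Hypothetical crosswalk: MARC tag -> key in the central repository.
CROSSWALK = {
    "100": "creator",
    "245": "title",
    "600": "subject",
    "650": "subject",
}


def convert(fields):
    """Convert (tag, value) pairs into a common record.

    Returns the converted record and the list of fields with no mapping.
    """
    record = {}
    unmapped = []
    for tag, value in fields:
        key = CROSSWALK.get(tag)
        if key is None:
            unmapped.append((tag, value))
        else:
            record.setdefault(key, []).append(value)
    return record, unmapped


# Made-up sample input; only the 600 heading comes from the slides.
record, unmapped = convert([
    ("245", "Example title"),
    ("600", "George, Henry, 1839-1897"),
    ("650", "Single tax"),
    ("999", "local note"),
])
```

Metadata in formats other than MARC would need its own crosswalk table feeding the same common keys.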

The RSS button is going to store dynamic content on the Web. Special change management strategies, packages, and techniques have been evaluated for SOA services, e.g. Rational Asset Manager.

Reformat is an export facility that presents users with output options for further processing, e.g. RefWorks.

All cataloged resources are expected to have a zip code lookup function, to be interfaced with Google Maps, and to be localized in the way OCLC Open WorldCat (http://www.worldcat.org/) behaves. Such visualization features are expected to be implemented after final refinement.

Two sample result sets indicate that the discovery layer of FocusOn Search will send open API requests to a list of service providers, dynamically determining the appropriate copy to present if there are multiple choices and the appropriate format template to use for rendering, based on the following criteria:

a) preferences predefined by the users,

b) pre-processed open URL links according to known contracts, service level agreements, and trust management,

c) patterns, heuristic rules, statistical analysis, and data mining of resources, users, activities, etc. in the data warehouse and the knowledge base of the discovery layer.
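The three criteria could be combined into a single ranking, as in the sketch below. The priority order (user preference first, then contract status, then a usage-derived score), the `Copy` fields, and the sample providers are assumptions made for illustration. The slides do not state how the criteria are weighted.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    provider: str       # service provider offering this copy
    fmt: str            # delivery format, e.g. "pdf" or "html"
    licensed: bool      # criterion b: covered by a known contract or SLA
    usage_score: float  # criterion c: score derived from usage statistics


def choose_copy(copies, preferred_providers):
    """Pick the copy to present when several providers offer the same item."""

    def sort_key(copy):
        # Criterion a: earlier position in the user's preference list wins.
        try:
            rank = preferred_providers.index(copy.provider)
        except ValueError:
            rank = len(preferred_providers)
        # Licensed copies sort before unlicensed; higher scores sort first.
        return (rank, not copy.licensed, -copy.usage_score)

    return min(copies, key=sort_key) if copies else None


# Made-up candidates for one item.
candidates = [
    Copy("ProviderA", "pdf", True, 0.2),
    Copy("ProviderB", "html", True, 0.9),
    Copy("ProviderC", "pdf", False, 0.99),
]
by_preference = choose_copy(candidates, ["ProviderC"])
by_defaults = choose_copy(candidates, [])
```

With an explicit user preference the unlicensed copy from ProviderC is chosen; without one, the licensed copy with the highest usage score wins.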

References

Duggan, J., & Stang, D. B. (2008). Magic quadrant for software change and configuration management for distributed platforms, 2008. Gartner RAS Core Research Note, G00153962, 1-10.

Hoffer, J. A., George, J. F., & Valacich, J. S. (2008). Modern systems analysis and design (5th ed., pp. 130-159). New Jersey: Pearson Prentice Hall.

IMT (Information Management Team). ER Diagram for RDA Taxonomy: High-Level Relationship Among Entities. Available: http://www.rdaonline.org/ERDiagramRDA_24June2008.pdf

Inmon, W. H. (2008). Architecting for Business Intelligence and Data Warehousing: Integrating the Structured and Unstructured Data World. Data Warehouse Seminar 08, sponsored by Data Management Forum, Dec. 8, 2008.

Xu, Amanda (2000). Beyond Seamless Access: Meta-data in the Age of Content Integration. Presented, with a discussion forum, at the Spring Program, Information Technology Interest Group of ACRL, New England Chapter, Univ. of Connecticut, May 26, 2000.

Xu, Amanda (2007). Mending the Gap Between the Library's Electronic and Print Collections on Library's Web Site Using Semantic Web. Presented at the ExLibris Voyager End User Group Meeting, Chicago, Ill., April 19-20, 2007.

Joint Steering Committee for Development of RDA (2008). RDA Element Analysis. 26 Oct. 2008. Available: http://www.collectionscanada.gc.ca/jsc/docs/5rda-elementanalysisrev2.pdf

Logical Network Diagram labels: Dedicated Private Cloud Computing; uPortals; libPortals; VocabularyPortals; Meta-data Portals; FocusOn Search; CategoryMap; Service Management; Policy Management; Governance Automation; Security; Service Registry/Repository; Compliance Validation; Meta-data Federation; Unstructured Text; Unstructured Media; User Profiles; Rendering Models; Discipline Preferences; Directory Services/SSOS; Service Virtualization; Trust and Mediation Management; Change Management & Service Life Cycle Management; Checkpoint (Certified, CMMI Level 5, COBIT & ISACA Framework); Service Consumer; Contract Provisioning; Service Providers.

System Flow Chart labels:

Main flow: Start; Search FocusOn; Suggest; Inventory; Display Result List; Return too many hits? (Yes: Select Search Refinement Options; No: Select Item for brief display).

Search refinement options: Refine Search By Sources; Refine Search By Subject; Refine Search By Media; Refine Search by Category.

Subject refinement options: Refine Subject Search By Topic; By Object; By Geographic Name; By Person; By Event/Activity; by Time Period.

Lookups: Lookup NAF (Persons); Lookup Controlled Vocabulary (Concepts, Places, Events, Objects); Lookup Postal Office Code (Zip codes); Periods.

Sources: LC Authorities; WorldCat Identities; NLM; British Library (U.K.); NLC (Canada); Getty; UMLS; Common Ref ToolKit; Postal Office; Topic Map; Biography; News/TV Network; Internal Storage.

Checks and aggregation: Check User Profiles; Check Rendering Models; Check User-Defined Category; Aggregated FocusOn topics by CategoryMap; Aggregated by FocusOn disciplines.

Basic Flow Chart Legend: Process Symbol Level 0 Red; Level 1 Purple; Level 2 Blue; Level 3 Pink.