42
Managing Information Resources

Managing Information Resources - Simon Fraser University · e.g. structured data in DB Information ... Software Engineering; 79. 10021; Jack. ... Structured data: Unstructured data:

Embed Size (px)

Citation preview

Managing Information Resources

1

3

2 Managing Information

Managing Contents

Managing Data

Concepts & Definitions

DataFacts devoid of meaning or intente.g. structured data in DB

InformationData that has meaning (data in context)e.g. course selection info in a student management system, documents, voice, video...Knowledge: information with direction or intent

ContentTerm for the Web agee.g. text, graphics, animation, maps, photos, film clips etc.

Information Resource Management Responsibilities

Corporate databasesDistributedVarious data modelsData warehouse

InformationDocumentsWeb contents

Knowledge managementExplicit knowledge (know-what)Tacit knowledge (know-how)

IS has been continually managing new forms of information resources

Managing Data

DBMS

The three-level database modelLevel 1: the conceptual level

Containing the various "user views" of the corporate data that each application program uses

Level 2: the logical levelLogical views of an organizations data as under the control of the DBAs

Level 3: the physical levelSpecifying the way the data is physically stored

Level 2 absorbs changes made at level 3

Stuent ID Student name Course Score10021 Jack Software Engineering 7910021 Jack Data structure 7610022 James Software Engineering 8510022 James Data structure 88

StuID StuName Age10021 Jack 2110022 James 20

CourseID CourseName Capacity Room373 Software engineering 30 AQ5018275 Data structure 40 AQ3023

StuID CourseID Score10021 373 7910022 373 8510021 275 7610022 275 88

Level 1

Level 2Table Student

Table Course

Table CourseSelect

Level 3

Four Data Models

Hierarchical modestructures data so that each element is subordinate to another in a strict hierarchical manner (Parent & child)

Network modelAllows each data item to have more than one parent, Relationships stated by pointers stored with the data

Relational modelObject model

Storing and managing data as objectsA competitive candidate for storing XML data

XML

XML (eXtensible Markup Language) is a self-describing markup language for applying structure to data

Not limited to predefined tagsHuman readableMachine readable

PortabilityJava: portable programsXML: portable data

XML---Semi-Structured Data

TEXT

Structured(relational)

Data

XMLLessStructure

MoreStructure

Structured data:

Unstructured data:

XML Data Model & Native Storage

“two...”

imdb

show

title review“Fugitive, The”

review

suntimes

reviewer rating

nyt

“Roger Ebert” “gives”

@year“1993”

… …

Native XML DatabaseDefines a (logical) model for an XML document and stores and retrieves documents according to that model.Has an XML document as its fundamental unit of (logical) storage

Getting Corporate Data into Shape (1)

The Problem: Management can not get consistent view across the enterprise

1960s-1970: application developed in separation "information islands"Different units in an organization developed their used their own database and their own applications

Inconsistent data definitionsDuplicate data

Getting Corporate Data into Shape (2)

The Cause: an application-driven approachGetting applications running as quickly as possible

The Solution: a data-driven approachData of interest data source applicationsUsually evolves from the application-driven chaos

Getting Corporate Data into Shape (3)

Managing data as a corporate resource is more than installing a DBMS

DBA: administering databases and software that manages themData administrator: managing enterprise-wide data resources

Clean up the data definitionsControl shared dataManage data distribution, and Maintain data quality

Getting Corporate Data into Shape (4)

ERPs aim to integrate all data and processes of an organization into a unified system

Automate and integrate the majority of business processesShare common data and practices across the entire enterpriseProduce, access and manage information in a real-time environmentConfigure application to meet business needs

Key: a unified databaseProvide management a corporate-wide view of operations

1

3

2 Managing Information

Managing Contents

Managing Data

Four Types of Information (1)

Two structures of information

Record-based: facts about entitiesDocument-based: dealing with concepts

Housed in documents, messages, video, audio clips...

Four Types of Information (2)

Two sources of information: internal and external

Internal record-based information: traditional focus of ISExternal record-based information: public DBInternal and external document-based information have received little attention from IS until recently

However, it is estimated that 90% of an organization's information is in documents rather than structured databases (Sprague, 1995).

Technologies for Managing Information

The two different structures of information are managed in different ways

Record-basedData warehouse

Document-basedDocument management systemsWeb content management

What is Data Warehouse?

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatilecollection of data in support of management’s decision-making process.”—W. H. Inmon

Data Warehouse—Subject-Oriented

Organized around major subjects, such as customer, product, salesFocusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processingProvide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse—Integrated

Constructed by integrating multiple, heterogeneous data sources

Relational databases, flat files, on-line transaction records

Data cleaning and data integration techniques are applied.

Naming conventions, encoding structures, attribute measures, etc. among different data sourcesWhen data is moved to the warehouse, it is converted.

Data Warehouse—Time Variant

The time horizon for the data warehouse is significantly longer than that of operational systems

Operational database: current value dataData warehouse data: provide information from a historical perspective (e.g., past 5-10 years)

Every key structure in the data warehouseContains an element of time, explicitly or implicitly

Data Warehouse—Nonvolatile

A physically separate store of data transformed from the operational environmentOperational update of data does not occur in the data warehouse environment

Does not require transaction processing, recovery, and concurrency control mechanismsRequires only two operations in data accessing:

Initial loading of data and access of data

Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration: A query driven approach

Build wrappers/mediators on top of heterogeneous databases When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer setComplex information filtering, compete for resources

Data warehouse: update-driven, high performanceInformation from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)Major task of traditional relational DBMSDay-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP (on-line analytical processing)Major task of data warehouse systemData analysis and decision making

OLTP vs. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date

detailed, flat relational isolated

historical, summarized, multidimensional integrated, consolidated

usage repetitive ad-hoc access read/write

index/hash on prim. key lots of scans

unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response

Typical OLAP Operations

Roll up (drill-up): summarize dataBy climbing up hierarchy or by dimension reduction

Drill down (roll down): reverse of roll-upFrom higher level summary to lower level summary or detailed data, or introducing new dimensions

Slice and dice: project and select Pivot (rotate):

Reorient the cube, visualization, 3D to series of 2D planes

Data Warehouse: A Multi-Tiered Architecture

28

DataWarehouse

ExtractTransformLoadRefresh

OLAP Engine

AnalysisQueryReportsData mining

Monitor&

IntegratorMetadata

Data Sources Front-End Tools

Serve

Data Marts

Operational DBs

Othersources

Data Storage

OLAP Server

Document Management

Estimated that 90% of an organization’s information is in documents rather than structured databases

Types of DocumentsContracts and AgreementsReportsManuals and HandbooksCorrespondenceMemosDrawings and Blueprints…

Fundamental Roles of Documents

4 Fundamental roles of documentsAs a product, or support for a productAs a fundamental mechanism for communicationamong people and groups within an organization and between organizations.As the primary vehicle for business processesAs an important part of organizational memory

Electronic Document

An electronic document has the following characteristics

holds information of multiple media: text, graphics, audio, videocontains multiple structures: headers, footers, TOC, sections, paragraphs, tablesdynamic: can be updated on the flymay depend on other documents

Limitations of RDBMS

Limitations of RDBMS for document management

Based on E-R data modelsSuitable for structured dataTraditional business applications, decision support systems, reporting toolsNo inherent support to manage electronic documents

Electronic Document Management System (1)

An EDMS is a computer system used to track and store electronic documents and/or images of paper documents.

Allows users to create a document or capture a hard copy in electronic formCommonly provided capabilities

StorageVersioningMetadataSecurityIndexingRetrieval

Electronic Document Management System (2)

Records created & received

electronicallyRecords created & received in

hard copy

Records are filed & managed for access & maintenance electronically

Electronic Document Management System (3)

An EDMS usually provides a single view of multiple databasesAn EDMS may include:

Scanners and Optical Character Recognition (OCR) for document capturePrinters for creating hard copiesStorage devices such as redundant array of independent disks systems and computer server Server programs for managing the databases that contain the documents.

1

3

2 Managing Information

Managing Contents

Managing Data

Content Management (1)

Content is a core management discipline underlying online business

Without production-level Web content, management processes, and technologies, large-scale e-business is not possibleThe adoption of the XML

The language for manipulating the content to work with transaction applications

Content Management (2)

Traditional “home-grown“ content management

The Webmaster was the publishing bottle neck

3 phases of content management life cycle

Input-process-output

Content Management (3)

Content creation and acquisitionFocus on creating content qualityDistribute content creation and maintenance to business departments with centralized coordination and control

Content administration and safeguardingEmphasis on efficiencyUse tools for content administration and work flow control

Content Management (4)

Content deployment and presentationEmphasis on effectiveness

i.e. Presenting the content so that it attracts visitors, allows them to navigate the site easily, and leads them to the desired actions

Features to attract and keep visitorsPersonalization: allowing visitors to customize how they view the pageLocalization: tailoring a site to a culture, market or localeMultichannel distribution: appropriate display for various devices

Content Management Systems (1)

A Content Management System (CMS) is software that makes it easier to create, edit and publish content on a web site.

Back-end to help create, edit and manage contentFront-end to deliver content dynamically to various endpointsWork flow control in moving and adding contents

Content Management Systems (2)

Content Delivery Application

Content Management Application

Content Delivery

Assembled, tagged & formatted assets

“Front-end" functions for delivering and displaying content

“Back-end” functions for creating, editing, producing, and administering a site and its content

•Databases•DB Schemas•XML, HTML•Web Services

•Portals•Web apps•PeopleSoft•MBM

•Docs, ppts•Brochures•Photos•Logos•Contracts•Syllabus•Schedule•C:\St

ruct

ured

Con

tent

Uns

truc

ture

d C

onte

nt

Content RepositoriesIndividual Contributors

Workflow