Data Management, Information Management and Knowledge Management for Network Centric Operations
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
October 2005
204/21/23 01:27
Data, Information and Knowledge Management: Definitions
• Data Management:
- Data administration
- Database management
• Information Management:
- Extracting information from the data
- Visualizing the data
• Knowledge Management:
- Acquiring knowledge
- Collaboration and sharing
- Managing the processes
- Disseminating the knowledge
- Taking action
What is data management?
• One proposal: Data Management = Database System Management + Data Administration
• Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development
Data Administration
• Identifying the data
- Data may be in files, paper, databases, etc.
• Analyzing the data
- Is the data of good quality?
- Is the data complete?
• Data standardization
- Should one standardize all the data elements and metadata?
- Repositories for handling semantic heterogeneity?
• Data security
- How should data be secured?
• Data modeling
- Structure the data, model the data and the processes
Data Administration (Continued)
• Data quality provides some measure for determining the accuracy of the data
- Is the data current? Can we trust the source?
- Data quality parameters can be passed from source to source
  - E.g., Trust A 50% and Trust B 30%
• Data may have different semantics
- E.g., Bank A may send out statements on the 20th day of each month and Bank B may send out statements on the 5th day of each month
- Fighter jet and Passenger plane may be considered to be one and the same
Data Administration (Concluded)
• Data Standards
- Standards for data semantics and administration
- E.g., XML (eXtensible Markup Language) for document interchange
• Data security includes data confidentiality and integrity
- Confidentiality is about preventing unauthorized access to the data
- Integrity is about preventing malicious corruption of the data
An Example Database System
[Figure: users and application programs access the database through the database management system.]
Metadata
• Metadata describes the data in the database
- Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary
• The metadatabase stores the metadata
- Could be physically stored with the database
• The metadatabase may also store constraints and administrative information
• Metadata is also referred to as the schema or data dictionary
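To make the EMP example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite and the column types are assumptions chosen for illustration; the slides do not name a product. The sketch creates the relation and then reads its attribute names back from the system catalog, which plays the role of the metadatabase.

```python
import sqlite3

def emp_metadata():
    """Create relation EMP and return its attribute names from the catalog."""
    conn = sqlite3.connect(":memory:")
    # "SS#" is quoted because '#' is not allowed in a bare SQL identifier.
    # The column types are illustrative assumptions.
    conn.execute(
        'CREATE TABLE EMP ("SS#" TEXT PRIMARY KEY, Name TEXT, Salary REAL)'
    )
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(EMP)")]
    conn.close()
    return columns

if __name__ == "__main__":
    print(emp_metadata())
```

The same catalog also records constraints (here, the primary key), matching the point that the metadatabase may hold more than attribute names.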
Three-level Schema Architecture: Details
[Figure: Users A1, A2 and A3 work through External Schema A (External Model A); users B1 and B2 work through External Schema B (External Model B). External/Conceptual Mappings A and B connect the external schemas to the Conceptual Schema (Conceptual Model). A Conceptual/Internal Mapping connects the Conceptual Schema to the Internal Schema (Internal Model, the stored database).]
Functional Architecture
[Figure: components shown are the User Interface Manager, Query Manager, Transaction Manager, Schema (Data Dictionary) Manager (metadata), Security/Integrity Manager, File Manager and Disk Manager, grouped into Data Management and Storage Management.]
Types of Database Systems
• Relational Database Systems
• Distributed and Federated Database Systems
• Object Database Systems
• Deductive Database Systems
• Other
- Real-time, Secure, Parallel, Scientific, Temporal, Wireless, Functional, Entity-Relationship, Sensor/Stream Database Systems, etc.
Relational Database: Example
Relation S:
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Relation P:
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London

Relation SP:
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
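As an illustration of querying these relations, the sketch below loads S and SP into an in-memory SQLite database (an assumption; any relational DBMS would do) and runs a join and an aggregate. The column names sno and pno are stand-ins for S# and P#, chosen only to avoid quoting.

```python
import sqlite3

S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
     ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]
SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
      ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
      ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
      ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]

def build():
    """Load relations S and SP into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE S (sno TEXT PRIMARY KEY, sname TEXT, "
                 "status INTEGER, city TEXT)")
    conn.execute("CREATE TABLE SP (sno TEXT, pno TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", S)
    conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
    return conn

def suppliers_of(conn, pno):
    """Names of suppliers that supply part pno (join of S and SP)."""
    rows = conn.execute(
        "SELECT S.sname FROM S JOIN SP ON S.sno = SP.sno "
        "WHERE SP.pno = ? ORDER BY S.sno", (pno,))
    return [r[0] for r in rows]

def total_qty_by_supplier(conn):
    """Total quantity shipped per supplier (aggregate over SP)."""
    rows = conn.execute(
        "SELECT sno, SUM(qty) FROM SP GROUP BY sno ORDER BY sno")
    return dict(rows)

if __name__ == "__main__":
    conn = build()
    print(suppliers_of(conn, "P2"))
    print(total_qty_by_supplier(conn))
```

Supplier S5 (Adams) has no rows in SP, so it appears in neither result.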
Example Object
[Figure: a Composite Document object made up of Section 1 and Section 2 objects and Paragraph 1 and Paragraph 2 objects.]
Distributed Database System
[Figure: Sites 1, 2 and 3 are connected by a communication network. Each site has a distributed processor (Distributed Processor 1, 2, 3), a DBMS (DBMS 1, 2, 3) and a database (Database 1, 2, 3).]
Query Processing Example
DQP = Distributed Query Processor
[Figure: DBMS 1, DBMS 2 and DBMS 3 each have a DQP and are connected by a network. Fragments shown, with the numbers given in the figure: EMP1 (20); EMP2 (30), DEPT2 (20); EMP1 (20), EMP3 (50), DEPT3 (30).]
Query at site 1: Join EMP and DEPT on D#
• Move EMP2 to site 3; merge EMP1, EMP2, EMP3 to form EMP
• Move DEPT2 to site 3; merge DEPT2 and DEPT3 to form DEPT
• Join EMP and DEPT; move result to site 1
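The choice of site 3 as the place to assemble EMP and DEPT can be checked with a small calculation. The sketch below rests on two assumptions that the flattened figure does not confirm: that the numbers in parentheses are fragment sizes, and that the fragments are placed as in the SITES table (EMP1 at sites 1 and 3, EMP2 and DEPT2 at site 2, EMP3 and DEPT3 at site 3). It counts only the data moved to assemble the relations, not the cost of shipping the join result back to site 1, whose size is not given.

```python
# Assumed placement of fragments (name -> size) per site; see lead-in.
SITES = {
    1: {"EMP1": 20},
    2: {"EMP2": 30, "DEPT2": 20},
    3: {"EMP1": 20, "EMP3": 50, "DEPT3": 30},
}

# Fragments required to rebuild EMP and DEPT before the join.
NEEDED = ["EMP1", "EMP2", "EMP3", "DEPT2", "DEPT3"]

def units_moved(sites, target, needed):
    """Total size of the needed fragments that must be shipped to target."""
    local = sites[target]
    moved = 0
    for frag in needed:
        if frag in local:
            continue  # already at the target site, nothing to ship
        # Take the size from whichever site holds a copy of the fragment.
        moved += next(s[frag] for s in sites.values() if frag in s)
    return moved

if __name__ == "__main__":
    for site in SITES:
        print(site, units_moved(SITES, site, NEEDED))
```

Under these assumptions, assembling at site 3 moves 50 units (EMP2 and DEPT2), against 130 at site 1 and 100 at site 2, which is consistent with the plan on the slide.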
Transaction Processing Example
[Figure: Site 1 is the coordinator for transaction Tj; Sites 2, 3 and 4 are participants running subtransactions Tj2, Tj3 and Tj4.]
• Issues: concurrency control, recovery, data replication
• Two-phase commit: the coordinator queries the participants whether they are ready to commit; if all participants agree, the coordinator sends a request for the participants to commit
• The DTM (Distributed Transaction Manager) is responsible for executing the distributed transaction
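The two-phase commit exchange described above can be sketched as follows. This is a simplified, in-process illustration: the class and function names are hypothetical, and it leaves out logging, timeouts and failure recovery, which a real distributed transaction manager needs.

```python
class Participant:
    """A site running one subtransaction of the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether the local subtransaction can commit.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: ask every participant whether it is ready to commit.
    votes = [p.prepare() for p in participants]
    # Phase 2: send the global decision to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

if __name__ == "__main__":
    sites = [Participant("Site 2"), Participant("Site 3"),
             Participant("Site 4", can_commit=False)]
    print(two_phase_commit(sites))
```

A single "no" vote aborts the transaction at every site, which is what keeps the subtransactions atomic as a group.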
Interoperability of Heterogeneous Database Systems
[Figure: Database System A (Relational), Database System B (Object-Oriented) and Database System C (Legacy) connected by a network.]
• Transparent access to heterogeneous databases for both users and application programs; query and transaction processing
Technical Issues on the Interoperability of Heterogeneous Database Systems
• Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies
• Interoperability based on client-server architectures
• Federated database management
- Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations
Different Data Models
[Figure: four nodes on a network, each with its own database: Node A (Relational Model), Node B (Network Model), Node C (Object-Oriented Model) and Node D (Hierarchical Model).]
• Developments: Tools for interoperability; commercial products
• Challenges: Global data model
Schema Integration and Transformation: An approach
[Figure: the schemas describing the relational, network, hierarchical and object-oriented databases are each transformed into a generic schema. The generic schemas are integrated into a Global Schema, over which External Schemas I, II and III are defined.]
• Challenges: Selecting appropriate generic representation; maintaining consistency during transformations
Semantic Heterogeneity
• Semantic heterogeneity occurs when there is a disagreement about the meaning or interpretation of the same data, i.e., the same data is interpreted differently
[Figure: Object O is interpreted as a passenger ship in the database at Node A and as a submarine in the database at Node B.]
• Challenges: Standard definitions; Repositories
Federated Database Management
[Figure: Database Systems A, B and C grouped into Federation F1 and Federation F2.]
• Cooperating database systems yet maintaining some degree of autonomy
Autonomy
[Figure: Components A, B and C. Component A receives a local request and a request from another component. Communication goes through the federation; component A does not communicate with component C; component A honors the local request first.]
• Challenges: Adapt techniques to handle autonomy, e.g., transaction processing and schema integration; transition research to products
Federated Data and Policy Management
[Figure: Agencies A, B and C each hold Component Data/Policy and an Export Data/Policy; the exports feed the Data/Policy for the Federation.]
What is Information Management?
• Information management essentially analyzes the data and makes sense out of the data
• Several technologies have to work together for effective information management
- Data Warehousing: Extracting relevant data and putting this data into a repository for analysis
- Data Mining: Extracting previously unknown information from the data
- Multimedia: Managing different media including text, images, video and audio
- Web: Managing the databases and libraries on the web
Data Warehouse
[Figure: three source systems (an Oracle DBMS for Employees, a Sybase DBMS for Projects and an Informix DBMS for Medical) feed the Data Warehouse, which holds data correlating employees with medical benefits and projects. Users query the warehouse.]
• The warehouse could be any DBMS; usually based on the relational data model
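A minimal sketch of the warehouse-building step follows, using plain Python lists in place of the three source DBMSs. All record contents and field names (emp_id, plan, project) are hypothetical; the point is only that the warehouse stores one correlated record per employee so that users can query it without touching the sources.

```python
# Hypothetical extracts from the three source systems.
employees = [{"emp_id": 1, "name": "Ann"}, {"emp_id": 2, "name": "Bob"}]
projects = [{"emp_id": 1, "project": "Alpha"},
            {"emp_id": 2, "project": "Beta"},
            {"emp_id": 1, "project": "Gamma"}]
medical = [{"emp_id": 1, "plan": "Basic"}, {"emp_id": 2, "plan": "Premium"}]

def build_warehouse(employees, projects, medical):
    """Correlate employees with their medical plan and projects."""
    plan_by_emp = {m["emp_id"]: m["plan"] for m in medical}
    projects_by_emp = {}
    for p in projects:
        projects_by_emp.setdefault(p["emp_id"], []).append(p["project"])
    return [
        {"emp_id": e["emp_id"],
         "name": e["name"],
         "plan": plan_by_emp.get(e["emp_id"]),  # None if no medical record
         "projects": sorted(projects_by_emp.get(e["emp_id"], []))}
        for e in employees
    ]

if __name__ == "__main__":
    for row in build_warehouse(employees, projects, medical):
        print(row)
```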
What is Data Mining?
• Related terms: Data Mining, Knowledge Mining, Knowledge Discovery in Databases, Data Archaeology, Data Dredging, Database Mining, Knowledge Extraction, Data Pattern Processing, Information Harvesting, Siftware
• Definition: The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham 1998)
Steps to Data Mining
• Data sources
• Integrate data sources
• Clean/modify data sources
• Mine the data
• Examine results/Prune results
• Report final results/Take actions
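The steps above can be sketched as a small pipeline. The mining step here is a simple count of item pairs that occur together, chosen only as an illustration; the slides do not prescribe a mining technique, and the function names and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def integrate(*sources):
    """Integrate data sources: concatenate their records."""
    return [record for source in sources for record in source]

def clean(records):
    """Clean the data: normalize item names and drop unusable records."""
    cleaned = []
    for record in records:
        items = {item.strip().lower()
                 for item in record if item and item.strip()}
        if len(items) >= 2:  # a pair count needs at least two items
            cleaned.append(items)
    return cleaned

def mine(records):
    """Mine the data: count how often each pair of items co-occurs."""
    counts = Counter()
    for items in records:
        counts.update(combinations(sorted(items), 2))
    return counts

def prune(counts, min_support):
    """Prune results: keep only pairs seen at least min_support times."""
    return {pair: n for pair, n in counts.items() if n >= min_support}

if __name__ == "__main__":
    source_a = [["Bread", "milk"], ["bread", "Milk", "eggs"]]
    source_b = [["milk ", "bread"], ["eggs", None], ["eggs", "milk"]]
    results = prune(mine(clean(integrate(source_a, source_b))), min_support=2)
    print(results)  # report final results
```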
Data Mining Needs for Counterterrorism: Non-real-time Data Mining
• Gather data from multiple sources
- Information on terrorist attacks: who, what, where, when, how
- Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .
- Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .
• Integrate the data, build warehouses and federations
• Develop profiles of terrorists, activities/threats
• Mine the data to extract patterns of potential terrorists and predict future activities and targets
• Find the “needle in the haystack” - suspicious needles?
• Data integrity is important
• Techniques have to SCALE
Data Mining Needs for Counterterrorism: Real-time Data Mining
• Nature of data
- Data arriving from sensors and other devices
  - Continuous data streams
- Breaking news, video releases, satellite images
- Some critical data may also reside in caches
• Rapidly sift through the data, discarding unwanted data and keeping the rest for later use and analysis (non-real-time data mining)
• Data mining techniques need to meet timing constraints
• Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
• Presentation of results, visualization, real-time alerts and triggers
Data Mining as a Threat to Privacy
• Data mining gives us “facts” that are not obvious to human analysts of the data
• Can general trends across individuals be determined without revealing information about individuals?
• Possible threats:
- Combine collections of data and infer information that is private
  - Disease information from prescription data
  - Military action from pizza delivery to the Pentagon
• Need to protect the associations and correlations between the data that are sensitive or private
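One simple way to approach the question of releasing general trends without exposing individuals is to publish only aggregate counts and to suppress groups that are too small. The sketch below illustrates that idea with hypothetical prescription records; it is an assumption introduced here, not a technique taken from the slides, and on its own it does not prevent inference from combining several releases.

```python
from collections import Counter

def release_counts(records, key, k=3):
    """Return per-group counts, suppressing groups with fewer than k members."""
    counts = Counter(record[key] for record in records)
    return {group: n for group, n in counts.items() if n >= k}

if __name__ == "__main__":
    # Hypothetical prescription data: one record per patient.
    prescriptions = [
        {"patient": "p1", "drug": "A"},
        {"patient": "p2", "drug": "A"},
        {"patient": "p3", "drug": "A"},
        {"patient": "p4", "drug": "A"},
        {"patient": "p5", "drug": "B"},  # a group of one would identify p5
    ]
    print(release_counts(prescriptions, "drug"))
```

The threshold k is a policy choice: a larger k hides more groups and so trades utility for privacy.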
Privacy Preserving Data Mining
[Figure: components of a privacy-preserving data mining system]
• User Interface Manager
• Constraint Manager: Privacy constraints
• Query Processor: Constraints during query and release operations
• Data Miner: Makes correlations; ensures privacy
• Database Design Tool: Structures the database
• DBMS and Database
Current Status, Challenges and Directions
• Status
- Data Mining is now a technology
- Several prototypes and tools exist; almost all of them work on relational databases
• Challenges
- Mining large quantities of data; dealing with noise and uncertainty; reasoning with incomplete data; eliminating false positives and false negatives
• Directions
- Mining multimedia and text databases, Web mining (structure, usage and content), mining metadata, real-time data mining, privacy
Semantic Web: Overview
• According to Tim Berners-Lee, the Semantic Web supports
- Machine readable and understandable web pages
- Enterprise application integration
- Nodes and links that essentially form a very large database
• Premise:
- Semantic Web Applications: Web Database Management + Web Services + Information Integration + ...
- Semantic Web Technologies: XML, RDF, Ontologies, Rules-ML
Layered Architecture for Dependable Semantic Web
• Some Challenges: Interoperability between Layers; Security and Privacy cut across all layers; Integration of Services; Composability
[Figure: layer stack, bottom to top: URI, UNICODE; XML, XML Schemas; RDF, Ontologies; Rules/Query; Logic, Proof and Trust. The labels TRUST, PRIVACY and Other Services also appear in the figure.]
• Adapted from Tim Berners-Lee’s description of the Semantic Web
What is XML all about?
• XML is needed due to the limitations of HTML and complexities of SGML
• It is an extensible markup language specified by the W3C (World Wide Web Consortium)
• Designed to make the interchange of structured documents over the web easier
• Key to XML are Document Type Definitions (DTDs) and XML Schemas
• Allows users to bring multiple files together to form compound documents
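As a small illustration of a structured document being interchanged and read by a program, the sketch below parses an XML document with Python's standard-library xml.etree.ElementTree. The element and attribute names (document, section, paragraph, id) are hypothetical. Note that ElementTree checks only that the document is well formed; validation against a DTD or XML Schema is not shown.

```python
import xml.etree.ElementTree as ET

# A hypothetical structured document with two sections.
DOC = """<document title="Report">
  <section id="1">
    <paragraph>First paragraph.</paragraph>
  </section>
  <section id="2">
    <paragraph>Second paragraph.</paragraph>
    <paragraph>Third paragraph.</paragraph>
  </section>
</document>"""

def paragraphs_by_section(xml_text):
    """Map each section id to the text of its paragraphs."""
    root = ET.fromstring(xml_text)
    return {
        section.get("id"): [p.text for p in section.findall("paragraph")]
        for section in root.findall("section")
    }

if __name__ == "__main__":
    print(paragraphs_by_section(DOC))
```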
What is Knowledge Management?
• Knowledge management, or KM, is the process through which organizations generate value from their intellectual property and knowledge-based assets
• Gartner Group: KM is a discipline that promotes an integrated approach to identifying and sharing all of an enterprise's information assets, including databases, documents, policies and procedures as well as unarticulated expertise and experience resident in individual workers
• Peter Senge: Knowledge is the capacity for effective action; this distinguishes knowledge from data and information. KM is just another term in the ongoing continuum of business management evolution
Knowledge Management Components
Components of Knowledge Management: Components, Cycle and Technologies
• Components: Strategies, Processes, Metrics
• Cycle: Knowledge Creation, Sharing, Measurement and Improvement
• Technologies: Expert systems, Collaboration, Training, Web
KM: Strategy, Process and Metrics
• Strategy
- Motivation for KM and how to structure a KM program
• Process
- Use of KM to make existing practice more effective
• Metrics
- Measure the impact of KM on an organization
Strategy: Building Learning Organizations
• Adaptive learning and Generative learning
- Need to adapt to the changing environment
- Total quality movement (TQM) in Japan has migrated to a generative learning model
  - Look at the world in a new way
• Changing roles of the leader
- Migrating from decision makers to designers, teachers and stewards
• Building a shared vision
- Encouraging ideas, Requesting support, Moving beyond blame, Effective communication
• Learning tools
- Learning laboratory
Knowledge Management in Process Management
• Types of Processes
- Simple processes: Low level operation
- Complex and nonadaptive processes: Systems that use the same rules
- Complex and adaptive processes: Agents carrying out the processes are intelligent and adaptive
• Linking knowledge management with processes
- Knowledge management is needed for all processes; critical for complex and adaptive processes
- Learn from experience and use the experience in unknown situations
Metrics: The Balanced Scorecard
• Employee Capabilities: Measuring the following
- Employee satisfaction
- Employee retention
- Employee productivity
• Information system capabilities: Measuring the following
- Whether each employee segment has information to carry out its operations
• Motivation and Empowerment: Measuring the following
- Suggestions made and implemented
- Improvement
- Team performance
Knowledge Management Architecture
[Figure: architecture components]
• Knowledge Creation and Acquisition Manager
• Knowledge Representation Manager
• Knowledge Manipulation Manager
• Knowledge Dissemination and Sharing Manager
Secure Knowledge Management
• Protecting the intellectual property of an organization
• Access control including role-based access control
• Security for process/activity management and workflow
- Users must have certain credentials to carry out an activity
• Composing multiple security policies across organizations
• Security for knowledge management strategies and processes
• Risk management and economic tradeoffs
• Digital rights management and trust negotiation
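Role-based access control, mentioned above, can be sketched in a few lines: permissions are attached to roles, users are assigned roles, and an activity is allowed only if one of the user's roles carries the required permission. The users, roles and permission names below are hypothetical.

```python
# Hypothetical role and user assignments.
ROLE_PERMISSIONS = {
    "analyst": {"read_report"},
    "editor": {"read_report", "edit_report"},
    "admin": {"read_report", "edit_report", "approve_workflow"},
}
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"editor", "admin"},
}

def is_allowed(user, permission):
    """True if any role held by the user grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in roles)

if __name__ == "__main__":
    print(is_allowed("alice", "read_report"))
    print(is_allowed("alice", "approve_workflow"))
    print(is_allowed("bob", "approve_workflow"))
```

Unknown users and unknown roles fall through to "not allowed", so access is denied by default.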
Status and Directions
• Knowledge management has exploded due to the web
• Knowledge management has different dimensions
- Technology, Business
- Goal is to take advantage of knowledge in a corporation for reuse
• Tools are emerging
• Need effective partnerships between business leaders, technologists and policy makers
• Knowledge management may subsume information management and data management
- Vague boundaries
Other Ideas and Directions?
Prof. Bhavani Thuraisingham
- Director Cyber Security Center
- Department of Computer Science
- Erik Jonsson School of Engineering and Computer Science
- The University of Texas at Dallas
- Richardson, Texas
http://www.utdallas.edu/~bxt043000/
President
Dr-Bhavani Security Consulting
Dallas, TX
www.dr-bhavani.org