michael-corsello
8/9/2019 EIM Intro - Information Architectures
RFCorsello
Research
Foundation
Enterprise Information Management
Information Architectures
Introduction
Information architecture is the set of practices and processes used to:
Define the appropriate data models
Define the mechanisms for persisting and representing data
Structure data for the most effective use and reuse throughout the information lifecycle
Information architecture spans technology and the physical world
Not all things are in computers
Not all things outside of the computer can be omitted
Mechanisms and methods for reconciling paper and computers
Concepts
Quick Vocabulary
Terms
Enterprise - the collection of organizations within a given business domain that operate
Corresponds to business-specific groups within multiple organizations that share common information
It is about how information is structured and managed to facilitate the sharing of data between organizations and divisions within an organization
Repository - data is stored in repositories
May be physically implemented as a database (such as Oracle or SQL Server)
May be any persistent mechanism (such as files)
Modern relational database management systems (RDBMSes) are the most commonly used persistence mechanism
Application - a piece of software that runs on a computer
May or may not have a user interface (UI)
An RDBMS is an application that does not have a user interface; however, there are applications (such as Toad) that provide a user interface to an RDBMS.
Information Perspectives
An enterprise information architecture will consist of multiple repositories
Each contains a subset of the data for that enterprise
Data must be structured based upon some consistent means to facilitate discovery and use
Two primary considerations for evaluating the best strategies for enterprise data:
Management strategy
Implementation strategy
Information Models
A model is a logical representation of a phenomenon in the real world
A data model is a model for a data entity that is the representation of a real world phenomenon. A model comes in two parts:
Conceptual model
Instance model (realization)
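The slide does not define the two parts further. One possible reading, sketched in Python: the conceptual model is the set of attributes chosen to describe the phenomenon, and the instance model is its realization in a concrete store. The `Otolith` attributes, the table layout and the sample values are illustrative assumptions, not taken from the slides.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Otolith:
    """Conceptual model: attributes chosen to describe a real-world otolith sample."""
    sample_id: int
    species: str
    length_mm: float


def realize(conn):
    """Instance model (realization): the same attributes expressed as a table."""
    conn.execute(
        "CREATE TABLE otolith ("
        "sample_id INTEGER PRIMARY KEY, species TEXT, length_mm REAL)"
    )


conn = sqlite3.connect(":memory:")
realize(conn)

# Hypothetical sample record, stored through the realized model.
sample = Otolith(sample_id=1, species="Gadus morhua", length_mm=12.5)
conn.execute("INSERT INTO otolith VALUES (?, ?, ?)", astuple(sample))
row = conn.execute(
    "SELECT species, length_mm FROM otolith WHERE sample_id = 1"
).fetchone()
```

The same conceptual model could be realized differently (files, a document store) without changing what it says about the real-world thing.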
Management Strategies
Data management is based upon some strategy for organizing the data being managed
The selection of an appropriate strategy is not trivial and is based upon the use of the data once persisted
If there are many uses for a data element that spans business practices, it may be most efficient for human productivity to adopt multiple strategies
There are three primary strategies:
Project oriented
Topic oriented
Entity oriented
Project Oriented
All data is grouped, collected and managed by the project(s) it is associated with
Each organizational project (such as a specific otolith study) gets a repository or partition (directory in a file system sense)
Data generated and used within that project is stored within the project repository based upon data models used for that project
Projects may adopt centralized models, or define project-specific models for any given data domain
Project Oriented Scorecard
Benefits
Good for data collection and general field activity projects
May have minimal structure and little need for data manipulation
People are responsible for maintaining the flow of information within their project and must push data out to make it available to others
People can be highly efficient as they have minimal constraints to slow down human efforts
Only data that is used by a project needs to be transformed for use
Costs
Data can become unusable or undiscoverable [...] projects
There is no central source for a given ty
Once data is acquired from another project, [...] processing may be required to make it
Data may be in multiple formats which [...] unusable over time
Data sets will have to be transformed [...] for every project reusing the data
Modeling must be performed for each [...] any gaps from existing models
Overall Score
Poor
Limited use in very specialized domains
Good for client deliverable (turn-key) w
Topic Oriented
Divides repositories by business topic (domain) area
Each topic has a unified model that is used within that topic area
If multiple topics use a common data entity (real world thing), each area may have a distinct model for that entity
The models may be entirely different and incompatible
Topic Oriented Scorecard
Benefits
Good for businesses with very few domains that do not interact with external organizations
All projects can interchange data freely as they are based upon common models with shared repositories by topic
All data for an enterprise within a topic is in a usable format
Each topic may create models that accepted tools can use directly
Integration data sets may be created to ensure compatibility between domains by sharing models at overlap points
Costs
Translation of data across topics may be costly
Integration of data across topics may be imp[...] correspondence between the domains is no[...] beforehand
Each topic may be incompatible for another [...] tools
Data integration costs may be high, and ten
Modeling must focus on an entire business [...] on commonly used aspects of the topic
Overall Score
Moderately poor
Allows for solid models of specific topics ac
Commonality between domains are commo
Good for highly specialized organizations an
Entity Oriented
Most beneficial and most complex form of management
All real-world entities used within the enterprise must be identified and modeled separately
Each entity becomes an atomic data repository that can be shared
The primary goal is to adequately identify the entities and model them to the greatest level of detail required
Determining which perspectives of the real-world entities should be modeled is complex
Entity Oriented Scorecard
Benefits
Data sharing is free due to the unified entity models throughout the enterprise
Across organizational boundaries, the only agreement that is necessary is for the actual entities those organizations share
Applications may be built to effectively consume and reuse data across projects and topics based upon the entity model
Interactions between entity models may be modeled separately
Modeling may be performed in phases based upon use
Software tools can be reused as components and integrated in different ways
Costs
Modeling takes more time and e
Software tools will nearly alway
Overall Score
Good
Overall, cost and time are the only drawbacks
If the entity model becomes standard [...] vendors may likewise build tools [...] models
Hybrid Approaches
In many cases, it is advantageous to combine:
Project-oriented strategy for the data capture portion of the lifecycle
Entity-based strategy for long-term persistence in enterprise repositories
This enables the implementation of the project strategy to facilitate collection efforts while the entity modeling and implementation efforts are underway for the enterprise repositories
This hybrid strategy yields long-term flexibility for field staff performing data collection as well
Still has longer timelines and greater short-term cost
Cost savings in the long term are based upon many factors such as data s
Implementation Strategies
Software Implementations
There are several forms of software application architectures in use:
Thick Client
Client-Server
Three-Tier
N-Tier
Cloud
Each implementation strategy provides certain pros and cons
N-Tier and cloud seem to hold the greatest promise moving forward
Thick Client
The most straightforward implementation strategy for a software solution is the client application or thick client
Data and processing are local to the software and all operations run on the user computer
A common example of this architecture is the traditional word-processing application such as Microsoft Word or Corel WordPerfect
Client-Server
Moving to multi-user concurrent usage capabilities starts with the addition of a server-based component to the software solution
The most basic form is the client-server strategy
There are exactly two deployment components to the overall application: one runs on the user computer (the client) and the other runs on a remote server
A common example of this is the database-enabled application
The most prevalent form of the client-server architecture is a basic web site
The client-server architecture is simple, efficient and
Client-server architectures do not scale well for intense processing or large user bases
Three-Tier
Three-tier architecture consists of:
Client application
Business processing server
Data storage tier
The three-tier architecture is a common implementation strategy for basic business web applications
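A minimal in-process sketch of the three tiers, assuming hypothetical class and function names (`DataTier`, `BusinessTier`, `client_report`) and an invented sample-length use case. In a real deployment each tier would typically run as a separate process or on a separate machine; here they are plain Python objects so that only the separation of responsibilities is shown.

```python
class DataTier:
    """Data storage tier: only stores and returns records."""

    def __init__(self):
        self._rows = {}

    def save(self, sample_id, length_mm):
        self._rows[sample_id] = length_mm

    def all_lengths(self):
        return list(self._rows.values())


class BusinessTier:
    """Business processing tier: validation and calculation, no display code."""

    def __init__(self, data):
        self._data = data

    def register_sample(self, sample_id, length_mm):
        if length_mm <= 0:
            raise ValueError("length must be positive")
        self._data.save(sample_id, length_mm)

    def mean_length(self):
        lengths = self._data.all_lengths()
        return sum(lengths) / len(lengths)


def client_report(business):
    """Client tier: formats results for the user and nothing else."""
    return f"Mean length: {business.mean_length():.1f} mm"


business = BusinessTier(DataTier())
business.register_sample(1, 10.0)
business.register_sample(2, 20.0)
report = client_report(business)
```

Because the client only talks to the business tier, the storage mechanism could be swapped without touching the presentation code.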
N-Tier
An evolution of the three-tier architecture is the N-tier, where any number (N) of tiers exists to support the application
Modern web-based applications frequently use an N-tier approach, especially where service oriented architectures (SOA) are applied
Cloud and Distributed
Any form of application that is run in part on multiple machines is distributed
Formally, a distributed system partitions the work across multiple machines, not just separating user interface from logic
N-Tier applications are distributed applications
Cloud computing is:
When a distributed application is based upon placing portions of the computation in separate locations requiring the Internet for communication
Often, this separates storage from computing over the Internet
E.g. using Flickr to store images, Amazon S3 for disk storage and Microsoft Azure for computation all in one web application
Capability Partitioning
Partitioning
Trade-offs are made to ensure:
Performance
Scalability
Maintainability
To provide any capability, there is a minimum cost and timeline
An effective solution will always be in excess of these minimums
Strategies for reuse, integration and partitioning are effective at minimizing realized costs by distributing the costs
Partitioning allows for resource sharing in any of several areas:
Conceptual reuse - ideas, designs and algorithms are applied to multiple projects
Source reuse - software source code is reused on multiple projects
Library reuse - compiled libraries of code are reused as-is on multiple projects
Hardware reuse - multiple applications are hosted on a single physical server
Service reuse - software service(s) are reused by multiple applications (such as SOA)
Data reuse - a single authoritative data repository is reused by multiple applications
Partitioning Targets
Trade-offs to provide capabilities at reduced costs generally involve partitioning strategies
Each of the primary computing areas for a software application may be subject to partitioning
These primary computing areas are:
Repositories of data - the full corpus of data may be partitioned into domain or entity specific repository models
Processing engines or capabilities - computational portions may be separated into reusable analytical components
User presentation (GUIs) - may be partitioned away from an application [...] there is no business logic associated with the display of information
Repositories
Software is designed to process data in some form
The repository is a conceptual store from which software will access and process data
There are several strategies for partitioning repositories across an enterprise
Enterprise centric - the entire enterprise centralizes all data into a single master repository
Application centric - each application has a dedicated repository
Domain centric - each business domain has a dedicated repository that all applications using that domain data must connect to
Entity centric - each entity is modeled and a repository exists for that entity
All applications using an entity are connected to that entity repository
Repositories
Store t
Enterprise Centric
The approach of centralizing all data into a master integrated repository is only effective for small repositories with limited growth
The definition of small in this context is fluid as a function of the cost of providing a hardware infrastructure to support such a repository
This is not a recommended approach in most circumstances
Application Centric
Each application gets a dedicated repository
If multiple applications require access to the same data, that data is maintained in each application's repository
This solution provides the greatest level of performance for a single application, but comes at the additional cost of data duplication and issues of consistency for rapidly changing data
For small organizations with static data sets, the application centric approach may be quite effective
In many cases, the application centric approach will be sub-optimal in all areas due to the effort involved in establishing the data synchronization mechanisms and the cost of data duplication
This is the most natural form of partitioning [...] the isolation of repositories for each application
In this form, developers of a solution can focus on the local problem alone
May lead to lower development costs for the application at the cost of poorer fit to the capability required
Domain Centric
Domain centric partitioning ensures each domain within the enterprise has a master repository
Tends to have similar pros and cons to the application centric model
Application developers are required to have a reasonable understanding of the business domains affected by the application
Each application may be required to communicate with more than one repository
Data integration across repositories may result in the need for additional join elements to be added to repositories
The key limitation is the potential for data duplication and the reconciliation of the sameness of entities between domains
If domains each model person for their repository, the person models may be different and thus incompatible
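The reconciliation problem can be sketched as follows. The two domain models (`HrPerson`, `SalesContact`), their fields and the name-matching rule are hypothetical and chosen only to show how independently designed person models share no common key, so some explicit reconciliation rule has to be written.

```python
from dataclasses import dataclass


@dataclass
class HrPerson:
    """Person as modeled by a hypothetical human resources domain."""
    employee_no: str
    full_name: str


@dataclass
class SalesContact:
    """Person as modeled by a hypothetical sales domain."""
    contact_id: int
    first_name: str
    last_name: str


def same_person(hr, sales):
    """Naive reconciliation rule: compare case-insensitive full names."""
    sales_name = f"{sales.first_name} {sales.last_name}"
    return hr.full_name.strip().lower() == sales_name.strip().lower()


hr_record = HrPerson(employee_no="E-7", full_name="Pat Example")
sales_record = SalesContact(contact_id=42, first_name="pat", last_name="EXAMPLE")
matched = same_person(hr_record, sales_record)
```

Name matching is used here only because the two models have nothing else in common; it would misfire on people who share a name, which is one illustration of why this reconciliation is described as the key limitation.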
Entity Centric
Entity centric repositories are the most data efficient
Also the most costly to design
Each data entity has an isolated repository with identified linkages between repositories
Example:
A single people repository of all human beings known throughout the enterprise
Contains people information for employees, customers, suppliers, contractors, etc.
People repository only contains the information that describes the people notion of the individuals
The modeling of entities drives the repository boundaries, and the integration of data across repositories happens within one of the repositories participating in the integration
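A small sketch of the people example, under assumed names: `PeopleRepository` holds only person-level information, while a hypothetical `EmploymentRepository` holds role data, links to people by identifier, and performs the integration itself.

```python
class PeopleRepository:
    """Holds only what describes a person as a person."""

    def __init__(self):
        self._names = {}
        self._next_id = 1

    def add(self, name):
        person_id = self._next_id
        self._names[person_id] = name
        self._next_id += 1
        return person_id

    def name_of(self, person_id):
        return self._names[person_id]


class EmploymentRepository:
    """Holds job data and links to people by identifier only."""

    def __init__(self, people):
        self._people = people
        self._titles = {}

    def hire(self, person_id, title):
        self._titles[person_id] = title

    def describe(self, person_id):
        # Integration across repositories happens here, inside one of
        # the participating repositories.
        return f"{self._people.name_of(person_id)}, {self._titles[person_id]}"


people = PeopleRepository()
jobs = EmploymentRepository(people)
pat_id = people.add("Pat Example")
jobs.hire(pat_id, "Analyst")
summary = jobs.describe(pat_id)
```

A customer or supplier repository could link to the same `PeopleRepository` in the same way, so the person is described once and referenced everywhere.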
Processing and Display
Computational Logic and User Interfaces
Processing Engines
A large part of computationally intensive applications involves generic processing functions
Separating processing capabilities into reusable structures can yield great cost savings in multiple areas including long-term supportability
The primary cost associated with these reusable structures is designing them for reuse
The isolation of these computational units can take on different forms:
Reusable libraries - such as a statistical analysis library
Reusable frameworks - provide a collection of common capabilities that may be reused across applications
Reusable services - such as data processing web services are increasing in availability and form the basis of most SOA implementations
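The reusable library form can be illustrated with a short sketch. The function names (`summarize`, `growth_report`, `budget_report`) and the two consuming applications are hypothetical; only Python's standard `statistics` module is used.

```python
import statistics


def summarize(values):
    """Reusable processing unit with no knowledge of any one application."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def growth_report(lengths_mm):
    """Hypothetical field application reusing the library."""
    s = summarize(lengths_mm)
    return f"{s['n']} samples, mean length {s['mean']:.1f} mm"


def budget_report(costs):
    """Hypothetical finance application reusing the same library."""
    s = summarize(costs)
    return f"{s['n']} items, mean cost {s['mean']:.2f}"
```

The design cost mentioned above shows up in `summarize`: it has to stay free of application-specific units and wording so that both callers can use it unchanged.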
Hybrid Processing
In developing software for an enterprise, all three partitioning strategies may be used together for maximal effectiveness and operational longevity
Component-based development extends from software into data; partitioning data affects software and vice versa
User Presentations
The user interface of an application is responsible for the presentation of data and controls to the application user
Often called the presentation layer and ideally contains no functional logic for the application
When properly designed and partitioned, the GUI is completely independent of the functional portions of the application
The GUI itself can be partitioned into components that can be reused across applications
There are two different aspects of the GUI that can be partitioned for an application:
Partitioning of the GUI from the capability logic
Partitioning of the GUI itself into GUI components
The most significant area of reuse comes from the first area of separating the GUI from business logic
This should be the default development pattern for application development
This is often not followed in practice to reduce development and planning time
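Separating the GUI from the capability logic can be sketched with one logic function and two interchangeable presentations. All names and the 10 mm threshold are hypothetical, chosen only for illustration.

```python
def classify_length(length_mm):
    """Capability logic: knows nothing about how results are displayed."""
    # The 10 mm threshold is a hypothetical business rule.
    return "large" if length_mm >= 10.0 else "small"


def render_text(length_mm):
    """Console presentation of the result."""
    return f"{length_mm} mm: {classify_length(length_mm)}"


def render_html(length_mm):
    """Web presentation of the same result, reusing the same logic."""
    return f"<li>{length_mm} mm: <b>{classify_length(length_mm)}</b></li>"
```

If the rule in `classify_length` changes, neither presentation function needs to be edited, which is the reuse benefit the slide attributes to the first kind of partitioning.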
Conclusions
Architecting information solutions for an organization is a complex set of practices and trade-offs to maximize capabilities while minimizing cost
Given that information solutions take a great deal of time and care to construct, proper planning is required well in advance of need to ensure solutions are available by the time the need arises, without wasted effort
Various strategies exist for planning information repositories, software implementations and user-facing applications
Planning for reuse of repositories and software back-end components and services is of great importance
Stakeholders involved with information strategies need to understand the difference between repositories containing data, back-end software and the user interfaces that present data
The separation of these concepts in the minds of those involved in planning can yield great long-term cost savings and capabilities realized
Questions