EIM Intro - Information Architectures

  • 8/9/2019 EIM Intro - Information Architectures

    RFCorsello Research Foundation

    Enterprise Information Management: Information Architectures

    Introduction

    Information architecture is the set of practices and processes used to:

    Define the appropriate data models

    Define the mechanisms for persisting and representing data

    Structure data for the most effective use and reuse throughout the information lifecycle

    Information architecture spans technology and the physical world

    Not all things are in computers

    Not all things outside of the computer can be omitted

    Mechanisms and methods for resolving paper and computers

    Concepts: Quick Vocabulary

    Terms

    Enterprise - the collection of organizations within a given business domain that operate

    Corresponds to business specific groups within multiple organizations that share common information

    It is about how information is structured and managed to facilitate the sharing of data between organizations and divisions within an organization

    Repository - data is stored in repositories

    May be physically implemented as a database (such as Oracle or SQL Server)

    May be any persistent mechanism (such as files)

    Modern relational database management systems (RDBMSes) are the most commonly used persistence mechanism

    Application - a piece of software that runs on a computer

    May or may not have a user interface (UI)

    An RDBMS is an application that does not have a user interface; however, there are applications (such as Toad) that provide a user interface to an RDBMS

    Information Perspectives

    An enterprise information architecture will consist of multiple repositories

    Each contains a subset of the data for that enterprise

    Data must be structured based upon some consistent means to facilitate discovery and use

    Two primary considerations for evaluating the best strategies for enterprise data:

    Management strategy

    Implementation strategy

    Information Models

    A model is a logical representation of a phenomenon in the real world

    A data model is a model for a data entity that is the representation of a real world phenomenon

    A model comes in two parts:

    Conceptual model

    Instance model (realization)
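
    One way to read the two parts above, sketched in Python: the class definition plays the role of the conceptual model, and a populated object plays the role of the instance model. The Fish entity and its fields are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from dataclasses import dataclass


# Conceptual model: the attributes chosen to represent a real-world phenomenon.
# "Fish" and its fields are hypothetical, used only for illustration.
@dataclass(frozen=True)
class Fish:
    species: str
    length_mm: float
    capture_year: int


# Instance model (realization): one concrete record that conforms to the
# conceptual model above.
sample = Fish(species="Gadus morhua", length_mm=612.0, capture_year=2019)
```

    The same conceptual model could equally be realized as a database table or a file format; the dataclass is just the shortest form to show here.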

    Management Strategies

    Data management is based upon some strategy for organizing the data being managed

    The selection of an appropriate strategy is not trivial and is based upon the use of the data once persisted

    If there are many uses for a data element that spans business practices, it may be most efficient for human productivity to adopt multiple strategies

    There are three primary strategies:

    Project oriented

    Topic oriented

    Entity oriented

    Project Oriented

    All data is grouped, collected and managed by the project(s) it is associated with

    Each organizational project (such as a specific otolith study) gets a repository or partition (a directory, in a file system sense)

    Data generated and used within that project is stored within the project repository based upon data models used for that project

    Projects may adopt centralized models, or define project specific models for any given data domain

    Project Oriented Scorecard

    Benefits

    Good for data collection and general field activity projects

    May have minimal structure and little need for data manipulation

    People are responsible for maintaining the flow of information within their project and must push data out to make it available to others

    People can be highly efficient as they have minimal constraints to slow down human efforts

    Only data that is used by a project needs to be transformed for use

    Costs

    Data can become unusable or undiscoverable to other projects

    There is no central source for a given type of data

    Once data is acquired from another project, processing may be required to make it usable

    Data may be in multiple formats, which may become unusable over time

    Data sets will have to be transformed for every project reusing the data

    Modeling must be performed for each project to fill any gaps from existing models

    Overall Score

    Poor

    Limited use in very specialized domains

    Good for client deliverable (turn-key) work

    Topic Oriented

    Divides repositories by business topic (domain) area

    Each topic has a unified model that is used within that topic area

    If multiple topics use a common data entity (real world thing), each topic area may have a distinct model for that entity

    The models may be entirely different and incompatible
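
    A minimal sketch of what incompatible topic models for the same entity can look like, and why translation between them costs effort. The two topic areas (human resources and sales), the class names and the fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HrPerson:
    """Hypothetical model of a person in a human resources topic area."""
    full_name: str
    employee_id: int


@dataclass
class SalesPerson:
    """Hypothetical model of the same real-world person in a sales topic area."""
    first: str
    last: str
    customer_code: str


def sales_to_hr(person: SalesPerson, employee_id: int) -> HrPerson:
    """Translate across topics.

    The sales model has no employee identifier, so the caller must supply
    one from some other source; the customer code is simply lost.
    """
    return HrPerson(full_name=f"{person.first} {person.last}",
                    employee_id=employee_id)
```

    Each additional pair of topics that needs to exchange data requires another translation of this kind.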

    Topic Oriented Scorecard

    Benefits

    Good for businesses with very few domains that do not interact with external organizations

    All projects can interchange data freely as they are based upon common models with shared repositories by topic

    All data for an enterprise within a topic is in a usable format

    Each topic may create models that accepted tools can use directly

    Integration data sets may be created to ensure compatibility between domains by sharing models at overlap points

    Costs

    Translation of data across topics may be costly

    Integration of data across topics may be impossible if correspondence between the domains is not established beforehand

    Each topic may be incompatible with another topic's tools

    Data integration costs may be high, and ten

    Modeling must focus on an entire business on commonly used aspects of the topic

    Overall Score

    Moderately poor

    Allows for solid models of specific topics ac

    Commonality between domains are commo

    Good for highly specialized organizations an

    Entity Oriented

    Most beneficial and most complex form of management

    All real-world entities used within the enterprise must be identified and modeled separately

    Each entity becomes an atomic data repository that can be shared

    The primary goal is to adequately identify the entities and model them to the greatest level of detail required

    Determining which perspectives of the real-world entities should be modeled is complex

    Entity Oriented Scorecard

    Benefits

    Data sharing is free due to the unified entity models throughout the enterprise

    Across organizational boundaries, the only agreement that is necessary is on the actual entities those organizations share

    Applications may be built to effectively consume and reuse data across projects and topics based upon the entity model

    Interactions between entity models may be modeled separately

    Modeling may be performed in phases based upon use

    Software tools can be reused as components and integrated in different ways

    Costs

    Modeling takes more time and effort

    Software tools will nearly alway

    Overall Score

    Good

    Overall, cost and time are the only drawbacks

    If the entity model becomes a standard, vendors may likewise build tools for the models

    Hybrid Approaches

    In many cases, it is advantageous to combine:

    Project-oriented strategy for the data capture portion of the lifecycle

    Entity-based strategy for long-term persistence in enterprise repositories

    This enables the implementation of the project strategy to facilitate collection efforts while the entity modeling and implementation efforts are underway for the enterprise repositories

    This hybrid strategy yields long-term flexibility for field staff performing data collection as well

    Still has longer timelines and greater short-term cost

    Cost savings in the long-term is based upon many factors such as data s

    Implementation Strategies

    Software Implementations

    There are several forms of software application architectures in use:

    Thick Client

    Client-Server

    Three-Tier

    N-Tier

    Cloud

    Each implementation strategy provides certain pros and cons

    N-Tier and cloud seem to hold the greatest promise moving forward

    Thick Client

    The most straightforward implementation strategy for a software solution is the client application or thick client

    Data and processing are local to the software and all operations run on the user computer

    A common example of this architecture is the traditional word-processing application such as Microsoft Word or Corel WordPerfect

    Client-Server

    Moving to multi-user concurrent usage capabilities starts with the addition of a server-based component to the software solution

    The most basic form is the client-server strategy

    There are exactly two deployment components to the overall application: one runs on the user computer (the client) and the other runs on a remote server

    A common example of this is the database enabled application

    The most prevalent form of the client-server architecture is a basic web site

    The client-server architecture is simple, efficient and

    Client-server architectures do not scale well for intense processing or large user bases

    Three-Tier

    Three-tier architecture consists of:

    Client application

    Business processing server

    Data storage tier

    The three-tier architecture is a common implementation strategy for basic business web applications
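
    The three tiers listed above can be sketched in a single Python file, with each tier only talking to the one directly beneath it. This is an illustrative sketch, not a deployment: the class and function names are hypothetical, and an in-memory dictionary stands in for a real database.

```python
class DataTier:
    """Data storage tier: an in-memory dict stands in for a database."""

    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def load(self, key):
        return self._rows[key]


class BusinessTier:
    """Business processing tier: holds the rules, knows nothing about display."""

    def __init__(self, data):
        self._data = data

    def record_order(self, order_id, quantity, unit_price):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._data.save(order_id, quantity * unit_price)

    def order_total(self, order_id):
        return self._data.load(order_id)


def render_total(business, order_id):
    """Client tier: formats a result for the user, holds no business rules."""
    return f"Order {order_id}: {business.order_total(order_id):.2f}"
```

    In a real system each tier would typically run as a separate process or on a separate machine; the boundaries between the classes mark where those network calls would sit.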

    N-Tier

    An evolution of the three-tier architecture is the N-tier, where any N, or number of tiers, exist to support the application

    Modern web based applications frequently use an N-tier approach, especially where service oriented architectures (SOA) are applied

    Cloud and Distributed

    Any form of application that is run in part on multiple machines is distributed

    Formally, a distributed system partitions the work across multiple machines, not just separating user interface from logic

    N-Tier applications are distributed applications

    Cloud computing is:

    When a distributed application is based upon placing portions of the computation in separate locations requiring the Internet for communication

    Often, this separates storage from computing over the Internet

    E.g. using Flickr to store images, Amazon S3 for disk storage and Microsoft Azure for computation, all in one web application

    Capability Partitioning

    Partitioning

    Trade-offs are made to ensure:

    Performance

    Scalability

    Maintainability

    To provide any capability, there is a minimum cost and timeline

    An effective solution will always be in excess of these minimums

    Strategies for reuse, integration and partitioning are effective at minimizing realized costs by distributing the costs

    Partitioning allows for resource sharing in any of several areas:

    Conceptual reuse - ideas, designs and algorithms are applied to multiple projects

    Source reuse - software source code is reused on multiple projects

    Library reuse - compiled libraries of code are reused as-is on multiple projects

    Hardware reuse - multiple applications are hosted on a single physical server

    Service reuse - software service(s) are reused by multiple applications (such as SOA)

    Data reuse - a single authoritative data repository is reused by multiple applications

    Partitioning Targets

    Trade-offs to provide capabilities at reduced costs generally involve partitioning strategies

    Each of the primary computing areas for a software application may be subject to partitioning

    These primary computing areas are:

    Repositories of data - the full corpus of data may be partitioned into domain or entity specific repository models

    Processing engines or capabilities - computational portions may be separated into reusable analytical components

    User presentation (GUIs) - may be partitioned away from an application so that there is no business logic associated with the display of information

    Repositories

    Software is designed to process data in some form

    The repository is a conceptual store from which software will access and process data

    There are several strategies for partitioning repositories across an enterprise

    Enterprise centric - the entire enterprise centralizes all data into a single master repository

    Application centric - each application has a dedicated repository

    Domain centric - each business domain has a dedicated repository that all applications using that domain data must connect to

    Entity centric - each entity is modeled and a repository exists for that entity

    All applications using an entity are connected to that entity repository

    RepositoStore t

    Enterprise Centric

    The approach of centralizing all data into a master integrated repository is only effective for small repositories with limited growth

    The definition of small in this context is fluid as a function of the cost of providing a hardware infrastructure to support such a repository

    In general, this is not a recommended approach in most circumstances

    Application Centric

    Each application gets a dedicated repository

    If multiple applications require access to the same data, that data is maintained in both repositories

    This solution provides the greatest level of performance for a single application, but comes at the additional cost of data duplication and issues of consistency for rapidly changing data

    For small organizations with static data sets, the application centric approach may be quite effective

    In many cases, the application centric approach will be sub-optimal in all areas due to the effort involved in establishing the data synchronization mechanisms and the cost of data duplication

    This is the most natural form of partitioning: the isolation of repositories for each application

    In this form, developers of a solution focus on the local problem alone

    May lead to lower development cost per application at the cost of poorer fit to the capability required
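
    The duplication and consistency cost described above can be illustrated with two hypothetical applications (billing and shipping), each holding its own copy of the same customer record. Dictionaries stand in for the dedicated repositories, and the sync function is a deliberately naive one-way copy, not a recommended mechanism.

```python
# Two applications, each with a dedicated repository.
# Dicts stand in for databases; all names and data are hypothetical.
billing_repo = {"cust-1": {"email": "a@example.com"}}
shipping_repo = {"cust-1": {"email": "a@example.com"}}

# Billing updates its own copy; the shipping copy is now stale.
billing_repo["cust-1"]["email"] = "b@example.com"
stale_before_sync = (
    shipping_repo["cust-1"]["email"] != billing_repo["cust-1"]["email"]
)


def sync(source, target):
    """Naive one-way synchronization: copy every record from source to target."""
    for key, record in source.items():
        target[key] = dict(record)


sync(billing_repo, shipping_repo)
```

    Until sync runs, the two applications disagree about the customer. With rapidly changing data, the window of disagreement and the effort of keeping copies aligned both grow.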

    Domain Centric

    Domain centric partitioning ensures each domain within the enterprise has a master repository

    Tends to have similar pros and cons to the application centric model

    Application developers are required to have a reasonable understanding of the business domains affected by the application

    Each application may be required to communicate with more than one repository

    Data integration across repositories may result in the need for additional join elements to be added to repositories

    The key limitation is the potential for data duplication and the reconciliation of the sameness of entities between domains

    If domains each model person for their repository, the person models may be different and thus incompatible

    Entity Centric

    Entity centric repositories are the most data efficient

    Also most design costly

    Each data entity has an isolated repository with identified linkages between repositories

    Example:

    A single people repository of all human beings known throughout the enterprise

    Contains people information for employees, customers, suppliers, contractors, etc

    People repository only contains the information that describes the people notion of the individuals

    The modeling of entities drives the repository boundaries, and the integration of data across repositories happens within one of the repositories participating in the integration
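
    A minimal sketch of the people repository example, assuming dictionaries as stand-ins for the repositories. The employee and customer repositories, the person_id linkage field and all of the data are hypothetical; the point is only that role-specific data lives outside the people repository and refers back to it by identifier.

```python
# People repository: only what describes a person as a person.
people = {
    1: {"name": "Ada Lovelace"},
    2: {"name": "Grace Hopper"},
}

# Separate repositories hold role-specific data and link to people by identifier.
employees = {1: {"person_id": 1, "department": "Research"}}
customers = {
    10: {"person_id": 1, "account": "C-100"},
    11: {"person_id": 2, "account": "C-101"},
}


def customer_names(customer_repo, people_repo):
    """Resolve customer records to names through the person_id linkage.

    The linkage is held by the customer repository, one of the two
    repositories participating in the integration.
    """
    return sorted(
        people_repo[record["person_id"]]["name"]
        for record in customer_repo.values()
    )
```

    Note that person 1 is both an employee and a customer, yet is described exactly once.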

    Processing and Display: Computational Logic and User Interfaces

    Processing Engines

    A large part of computationally intensive applications involves generic processing functions

    Separating processing capabilities into reusable structures can yield great cost savings in multiple areas including long-term supportability

    The primary cost associated with these reusable structures is designing them for reuse

    The isolation of these computational units can take on different forms:

    Reusable libraries - such as a statistical analysis library

    Reusable frameworks - provide a collection of common capabilities that may be reused across applications

    Reusable services - such as data processing web services are increasing in availability and form the basis of most SOA implementations
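
    The reusable library form can be sketched as one generic processing function shared by two applications. The summarize function and the two callers are hypothetical; the only real library used is Python's standard statistics module.

```python
import statistics


def summarize(values):
    """Reusable processing function: no knowledge of where the data came from."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # needs at least two values
    }


# Two hypothetical applications reuse the same function on their own data.
def growth_report(lengths_mm):
    """Report the average length for a set of measurements."""
    return summarize(lengths_mm)["mean"]


def quality_check(readings, limit):
    """Pass when the spread of the readings stays within the given limit."""
    return summarize(readings)["stdev"] <= limit
```

    The design cost mentioned above shows up even here: summarize has to avoid any assumption about units, sources or display that would tie it to one caller.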

    Hybrid Processing

    In developing software for an enterprise, all three partitioning strategies may be used together for maximal effectiveness and operational longevity

    Component based development extends from software into data; partitioning data affects software and vice versa

    User Presentations

    The user interface of an application is responsible for the presentation of data and controls to the application user

    Often called the presentation layer and ideally contains no functional logic for the application

    When properly designed and partitioned, the GUI is completely independent of the functional portion of the application

    The GUI itself can be partitioned into components that can be reused across applications

    There are two different aspects of the GUI that can be partitioned for an application:

    Partitioning of the GUI from the capability logic

    Partitioning of the GUI itself into GUI components

    The most significant area of reuse comes from the first area of separating the GUI from business logic

    This should be the default development pattern for application development

    This is often not followed in practice to reduce development and planning time
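
    A minimal sketch of the first kind of partitioning, separating presentation from capability logic. The temperature conversion and both rendering functions are hypothetical examples; plain strings stand in for a real GUI toolkit.

```python
def celsius_to_fahrenheit(celsius):
    """Capability logic: a pure function with no display code."""
    return celsius * 9 / 5 + 32


# Two interchangeable presentations of the same logic.
def render_text(celsius):
    """Plain-text presentation, e.g. for a console."""
    return f"{celsius} C = {celsius_to_fahrenheit(celsius):.1f} F"


def render_html(celsius):
    """HTML presentation, e.g. for a web page."""
    return f"<p>{celsius_to_fahrenheit(celsius):.1f}&deg;F</p>"
```

    Because neither presentation contains the conversion rule, the rule can be corrected or reused in one place, and a third presentation can be added without touching it.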

    Conclusions

    Architecting information solutions for an organization is a complex set of practices and trade-offs to maximize capabilities while minimizing cost

    Given that information solutions take a great deal of time and care to construct, proper planning is required well in advance of need to ensure solutions are available by the time the need arises, without wasted efforts

    Various strategies exist for planning information repositories, software implementations and user facing applications

    Planning for reuse of repositories and software back-end components and services is of great importance

    Stakeholders involved with information strategies need to understand the difference between repositories containing data, back-end software and the user interfaces that present data

    The separation of these concepts in the minds of those involved in planning can yield great long-term cost savings and capabilities realized

    Questions