© 2009 IBM Corporation Confidential
IBM P8 CE 4.x Storage Overview
Bob Kreuch, CEMP/CFS Development Team, February 3, 2009
Information Management software | Enterprise Content Management
Outline
1. Document & Content Object Overview
2. Storage Areas
Database, File System, Fixed
3. Storage Policies
4. Storage Area Resource Status
5. Fixed Storage Devices
Image Services, Centera, SnapLock, IICE
Section 1 – Document/Content Model Overview
Each Document (or Annotation) object contains a set of zero or more ContentElement objects. A content element is one of two kinds:
• ContentTransfer objects contain content that is managed by CEMP.
• ContentReference objects contain a reference to content (for example, an HTTP address) that is not managed by CEMP.
Document Object
A Document [or Annotation] object contains properties and content:
• Property values are stored in the Object Store database tables.
• Content is stored in a Storage Area.
Content Elements
The content of a Document is exposed to applications as a set of (zero or more) Content Element objects: Content Element 0, Content Element 1, Content Element 2, and so on. The property values are stored in the Object Store database tables, and all content for a given Document is stored in a single Storage Area.
Most applications (like Workplace) support only a single [ContentTransfer] content element.
Document Version Series
A checkout creates a reservation object: a new document object (with its own properties and content), created from the previous version of the document (for example, Version 1).
When a reservation is checked in, it becomes the next version of the document (Version 2). The reservation is transformed; a new object is not created.
Immutable Content Element
A Content Element is immutable in that once the content is committed to CEMP, the content cannot be altered (it can only be deleted). This applies to the content of Documents in the checked-in state and in the reservation state, and to the content of Annotations.
Content is always stored precisely as it is received from the client (byte for byte); CEMP does not alter the content or add formatting, header data, etc.
Immutable Content Element Set
The Content Element set for a given Document in the checked-in state is immutable: no new elements can be added to the set, and no existing element can be removed from the set.
The Content Element set for a Document in the reservation state is mutable. Existing elements can be removed from the set, and new elements can be appended to the set (but a new element cannot replace an existing element – a new element is appended to the end of the set using a new sequence number). Annotations follow the same rules as a Document in the reservation state, since they do not support versioning.
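A minimal sketch of these element-set rules, assuming a hypothetical `ContentElementSet` class (not a CEMP type). A reservation (or annotation) set allows removal and appending, an append always takes a new, higher sequence number rather than reusing or replacing an old one, and a checked-in set rejects both operations.

```python
class ContentElementSet:
    """Hypothetical model of a document's content element set."""

    def __init__(self, elements=(), mutable=True):
        # mutable=True: reservation state (or an Annotation);
        # mutable=False: a checked-in Document.
        self.mutable = mutable
        self._elements = {seq: bytes(data) for seq, data in enumerate(elements)}
        self._next_seq = len(self._elements)

    def _require_mutable(self):
        if not self.mutable:
            raise PermissionError("the element set of a checked-in document is immutable")

    def append(self, data: bytes) -> int:
        """Add a new element at the end of the set, under a new sequence number."""
        self._require_mutable()
        seq = self._next_seq
        self._elements[seq] = bytes(data)
        self._next_seq += 1
        return seq

    def remove(self, seq: int) -> None:
        """Drop an element from the set; its bytes are never modified in place."""
        self._require_mutable()
        del self._elements[seq]

    def sequence_numbers(self) -> list:
        return sorted(self._elements)


res = ContentElementSet([b"a", b"b", b"c"])
res.remove(1)
print(res.append(b"d"), res.sequence_numbers())  # 3 [0, 2, 3]
```

Note that removing element 1 does not free its sequence number: the appended element goes to the end of the set as element 3.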
Section 2 – Storage Areas
CEMP Supports Three Types of Content Storage Areas
• Database [DatabaseStorageArea]
• File System [FileStorageArea]
• Fixed (External) [FixedStorageArea]
    • Native Content
    • Federated Content
Database Storage Area
Each Object Store has a built-in Database Storage Area, which is automatically created when the Object Store is created.
PROS:
Easy to use: no configuration is necessary, and security is handled by the security on the Object Store database. Backed up as part of the Object Store database backup. Excellent performance for small to medium-sized content.
CONS:
May not be a suitable storage mechanism for large volumes of immutable content (or very large content elements).
Database Storage Area – Content Element Database Row
Each content element is stored as one row of the CONTENT table in the Object Store database:
Column          Description
ELEMENT_ID      Unique ID of the element: the Document object_id + the element sequence number (for example, {bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8}0)
EXTENSION       Optional file extension of the source file (for example, ‘pdf’ for a file named ‘mydocument.pdf’)
CONTENT         Content data bytes (blob column)
CONTENT_SIZE    Size of the content data, in bytes
CREATE_DATE
Database Storage Area – Document with Three Elements
Each content element of the Document is a separate row of the CONTENT table (columns ELEMENT_ID, EXTENSION, CONTENT, CONTENT_SIZE, CREATE_DATE) in the Object Store database. The three ELEMENT_ID values share the Document object_id and differ only in the trailing sequence number:
Content Element Row (element 0): {bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8}0
Content Element Row (element 1): {bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8}1
Content Element Row (element 2): {bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8}2
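Assuming the ELEMENT_ID format is exactly what the example values show (the Document object_id as a lower-case GUID in braces, immediately followed by the sequence number), it can be composed and split as below. `element_id` and `parse_element_id` are hypothetical helper names.

```python
import re
import uuid

_ELEMENT_ID = re.compile(r"\{([0-9a-fA-F-]{36})\}(\d+)")


def element_id(doc_id: uuid.UUID, seq: int) -> str:
    """Document object_id in braces, immediately followed by the sequence number."""
    return "{%s}%d" % (doc_id, seq)


def parse_element_id(value: str):
    """Split an ELEMENT_ID back into (document id, sequence number)."""
    m = _ELEMENT_ID.fullmatch(value)
    if m is None:
        raise ValueError(f"not an element id: {value!r}")
    return uuid.UUID(m.group(1)), int(m.group(2))


doc = uuid.UUID("bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8")
for seq in range(3):
    print(element_id(doc, seq))
```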
File System Storage Area
A file system based storage area uses a standard NTFS or UNIX file system to store content (CIFS and NFS protocols are supported).
PROS:
Supports large content and very large volumes of content: any number of areas may be created for an Object Store, and content may be randomly distributed across the areas during creation.
CONS:
Must be managed independently from the database (security configuration, space management, backup and restore, etc.).
File System Storage Area – Directory Structure
Storage Area Root Folder
    fn_stakefile.xml (stakefile)
    Inbound content folder
    Content folder
        FN0, FN1, ..., FN22 (level 1)
            FN0, FN1, ..., FN22 (level 2)
                FN0, FN1, ..., FN22 (level 3, large areas only)
                    Content files
• The root folder is created by a system administrator.
• The stakefile provides verification that a storage area is really the correct area.
• The inbound folder provides temporary storage for content while it is uploaded to the area.
• Content that has been ‘finalized’ is stored under the content folder hierarchy.
• An area has either two (small) or three (large) levels of subfolders, used to distribute the content across the file system namespace.
• Content is stored in files under the lowest level of the subfolder hierarchy (each content element is stored in a separate file); no content is stored in the upper levels of the subfolder hierarchy.
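The slide does not say how CEMP chooses a subfolder for a given file. The sketch below uses a hash of the file name purely as a hypothetical stand-in, and assumes 23 subfolders per level (FN0 through FN22, as in the diagram) and a folder literally named `content`; it only illustrates how two or three levels spread files across the namespace.

```python
import hashlib
from pathlib import PurePosixPath

SUBFOLDERS_PER_LEVEL = 23  # assumption: FN0 .. FN22, as drawn in the diagram


def content_folder(root: str, file_name: str, large: bool) -> PurePosixPath:
    """Return the lowest-level subfolder for a content file.

    Hypothetical: CEMP's real distribution scheme is not described here; a
    hash of the file name is used only to spread files deterministically.
    """
    levels = 3 if large else 2          # small areas: 2 levels, large: 3
    digest = hashlib.sha1(file_name.encode("utf-8")).digest()
    folder = PurePosixPath(root, "content")
    for level in range(levels):
        folder /= f"FN{digest[level] % SUBFOLDERS_PER_LEVEL}"
    return folder


name = "FN{E15E8CAB-3A03-427E-AAF7-610F3BBA63C5}{35A02CFA-C069-4038-ABB3-A70F981A56EB}-0.html"
print(content_folder("/mnt/area1", name, large=False) / name)
```

With 23 folders per level, a small area has 23 × 23 = 529 leaf folders and a large area 23 × 23 × 23 = 12,167, which keeps the number of files per folder manageable.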
File System Storage Area – Content Element File
Each content element file is stored in a lowest-level subfolder (for example, FN0). Example file name:
FN{E15E8CAB-3A03-427E-AAF7-610F3BBA63C5}{35A02CFA-C069-4038-ABB3-A70F981A56EB}-0.html
The file name is made up of, in order:
• Filename prefix (always ‘FN’)
• Object Id of the Document [or Annotation] object
• Object Id of the Storage Area
• Content Element sequence number (after the ‘-’)
• Filename extension
File System Storage Area – Document with Three Elements
All three content element files are stored in the same lowest-level subfolder (FN0); the file names differ only in the sequence number:
Content Element File (element 0): FN{E15E8CAB-3A03-427E-AAF7-610F3BBA63C5}{35A02CFA-C069-4038-ABB3-A70F981A56EB}-0.html
Content Element File (element 1): FN{E15E8CAB-3A03-427E-AAF7-610F3BBA63C5}{35A02CFA-C069-4038-ABB3-A70F981A56EB}-1.html
Content Element File (element 2): FN{E15E8CAB-3A03-427E-AAF7-610F3BBA63C5}{35A02CFA-C069-4038-ABB3-A70F981A56EB}-2.html
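Given the naming rule above, a content element file name can be built and parsed as follows. This is a sketch with hypothetical helper names: the upper-case GUIDs follow the example names, while omitting the dot when there is no extension is an assumption (the extension is also optional in the database storage area).

```python
import re
import uuid

_NAME = re.compile(r"FN\{([0-9A-F-]{36})\}\{([0-9A-F-]{36})\}-(\d+)(?:\.(.+))?")


def content_file_name(doc_id: uuid.UUID, area_id: uuid.UUID, seq: int, ext: str = "") -> str:
    """'FN' + {document id} + {storage area id} + '-' + sequence number + optional extension."""
    base = "FN{%s}{%s}-%d" % (str(doc_id).upper(), str(area_id).upper(), seq)
    return f"{base}.{ext}" if ext else base


def parse_content_file_name(name: str):
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a content element file name: {name!r}")
    doc, area, seq, ext = m.groups()
    return uuid.UUID(doc), uuid.UUID(area), int(seq), ext or ""


doc = uuid.UUID("E15E8CAB-3A03-427E-AAF7-610F3BBA63C5")
area = uuid.UUID("35A02CFA-C069-4038-ABB3-A70F981A56EB")
for seq in range(3):
    print(content_file_name(doc, area, seq, "html"))
```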
File System Storage Area – Remote File System
(Diagram: several CEMP Servers access one File System Storage Area.)
The computer that hosts the storage area is remote from the CEMP Servers. To avoid data loss during content upload, it must have UPS power backup or support some form of file write-through.
Unix File System Based Area – NFS Export
• The Storage Area Root Folder is exported via NFS.
• The root folder is mounted using NFS on each CEMP server system (CEMP Server Instance 1, CEMP Server Instance 2, ...).
• CEMP servers access the file storage area over the network.
Unix File System Based Area – Security
The CEMP server (a CEMP Server Instance deployed in a J2EE application server) runs under the context of a UNIX user account (call it the CEMP user account).
When CEMP creates a folder (for example, ‘Inbound’), the permissions on the new folder are determined by the file creation mask (a.k.a. umask) of the CEMP user account.
CEMP user umask: u=rwx,g=rwx,o= (symbolic form; 007 in octal)
Resulting permissions:
• USER: read/write/execute
• GROUP: read/write/execute
• OTHERS: no access
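The umask arithmetic can be checked directly: bits set in the mask are cleared from the mode requested at creation time. The sketch assumes folders are requested with mode 0o777 and files with 0o666 (the usual defaults for `mkdir` and for creating regular files); `apply_umask` is a hypothetical helper, while `stat.filemode` is the standard-library formatter.

```python
import stat

CEMP_UMASK = 0o007  # symbolic form: u=rwx,g=rwx,o=


def apply_umask(requested_mode: int, umask: int) -> int:
    """Permission bits that survive the creation mask."""
    return requested_mode & ~umask & 0o777


folder_mode = apply_umask(0o777, CEMP_UMASK)   # typical request for a new folder
file_mode = apply_umask(0o666, CEMP_UMASK)     # typical request for a new file
print(oct(folder_mode), stat.filemode(stat.S_IFDIR | folder_mode))  # 0o770 drwxrwx---
print(oct(file_mode), stat.filemode(stat.S_IFREG | file_mode))      # 0o660 -rw-rw----
```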
NTFS Based Area – CIFS Share
• The Storage Area Root Folder is shared as a CIFS shared folder (optionally, a DFS link may be used to provide name transparency).
• On each CEMP server (CEMP Server Instance 1, CEMP Server Instance 2, ...), the root folder and all subfolders and files are accessed via a UNC path (mapping the shared folder is optional).
• CEMP servers access the file storage area over the network.
NTFS Based Area – Security
The CEMP server (a CEMP Server Instance deployed in a J2EE application server) runs under the context of a Windows user account (call it the CEMP user account).
The permissions applied to the folders (for example, ‘Inbound’) and files that CEMP creates are based on inheritance from the root folder. Once the root folder is created with the proper permissions and inheritance settings, all other folders and files inherit the correct values. On the root folder:
• The CEMP user account is granted full control, either directly or via group membership.
• Other administrative users may be granted full control.
• Unwanted inherited/inheritable permissions are removed.
Fixed Storage Area
A fixed storage area is a CEMP concept used to write content to and retrieve content from an external repository. A fixed storage area consists of four major components: 1) an external repository, 2) a CEMP fixed content device object, 3) a CEMP fixed storage area, and 4) a CEMP fixed content provider implementation.
PROS & CONS:
It’s assumed that a customer has a business case for using an external repository, and that the business case is the overriding factor in the decision to use the external repository (the pros & cons of using the external repository end up being a moot point).
Fixed Storage Area – Supported External Repositories
• IBM FileNet Image Services
• Network Appliance SnapLock
• EMC Centera
• IBM IICE (Content Services, CM8)
Fixed Storage Area Components
• External Repository (a.k.a. Fixed Device): accessed by the Fixed Content Provider through the External Repository Client API.
• CEMP Fixed Device object: contains configuration data that is used by the Fixed Content Provider to connect to and update the External Repository.
• Fixed Content Provider (CEMP Server): software that uses the client API of the external repository to create, retrieve, and delete external content.
• Fixed Storage Area (a.k.a. Staging Area): an external repository is accessed through the CEMP API as a Fixed Storage Area; the repository cannot be accessed without a corresponding Fixed Storage Area.
Fixed Storage Area – Back it Up!
A Fixed Storage Area has an associated ‘staging area’ (the Fixed Storage Area, a.k.a. Staging Area) that is really a File Storage Area. It needs to be administered like any other file storage area: configured as a shared network resource, backed up, protected by UPS power backup, etc.
Content is first uploaded to the inbound folder of the Fixed Storage Area, then finalized to the content folder – after it’s finalized, it will be written (a.k.a. migrated) to the external repository
The staging area provides storage for content of documents that are queued for migration, that are in the reservation state (including annotations), and for new versions of federated content of non-writable external repositories
Fixed Storage Area – Referral Blob Creation
The Fixed Content Provider writes content to the External Repository (a.k.a. Fixed Device) through the External Repository Client API:
1) The Fixed Content Provider migrates content to the external repository (copy & delete source).
2) During migration, the Fixed Content Provider receives a reference to the content (for example, an Image Services docid).
3) The Fixed Content Provider generates a referral blob from the content reference data.
4) The referral blob is stored in the referral_blob column of the document row, in the DOCVERSION table.
Fixed Storage Area – Retrieval using Referral Blob
The Fixed Content Provider obtains the referral blob data from the referral_blob column of the DocVersion row, then uses the data to retrieve the content from the External Repository (a.k.a. Fixed Device) through the External Repository Client API, streaming the data to the CEMP client.
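The creation and retrieval flows can be sketched end to end. Everything here is hypothetical: `InMemoryRepository` stands in for an external repository client API, the referral blob is a plain dict rather than a binary format, and `FixedContentProvider` is not the real provider interface. The order used (copy, record the blob, then delete the staged source) is one reasonable reading of the steps.

```python
class InMemoryRepository:
    """Hypothetical stand-in for an external repository client API."""

    def __init__(self):
        self._docs = {}

    def create(self, elements) -> str:
        ref = str(len(self._docs) + 1)        # e.g. an external document id
        self._docs[ref] = [bytes(e) for e in elements]
        return ref

    def read(self, ref: str, index: int) -> bytes:
        return self._docs[ref][index]


class FixedContentProvider:
    """Hypothetical provider: migrates staged content, then retrieves it via the blob."""

    def __init__(self, repository):
        self.repo = repository

    def migrate(self, staging: dict, doc_id: str, docversion_row: dict) -> None:
        elements = staging[doc_id]
        ref = self.repo.create(elements)                   # copy to the repository
        docversion_row["referral_blob"] = {                # blob built from the reference
            "content_id": doc_id, "ref": ref, "count": len(elements)}
        del staging[doc_id]                                # delete the staged source

    def retrieve(self, docversion_row: dict, seq: int, chunk_size: int = 4):
        blob = docversion_row["referral_blob"]
        data = self.repo.read(blob["ref"], seq)
        for start in range(0, len(data), chunk_size):      # stream to the client
            yield data[start:start + chunk_size]


staging = {"doc-1": [b"page one", b"page two"]}
row = {}
provider = FixedContentProvider(InMemoryRepository())
provider.migrate(staging, "doc-1", row)
print(b"".join(provider.retrieve(row, 1)))  # b'page two'
```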
Fixed Storage Area – Referral Blob Format
Common Referral Header:
• Provider Id & Software Version
• Checksum
• Type Flags
• Content Id (GUID of the Document)
Repository-Specific Data (Image Services example):
• Image Services DocId
• Content Element (IS Page) Count
• Content Element to IS Page Mapping (one entry for each content element):
    • Content Element Sequence Number
    • Image Services Page Number
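The slide lists the fields but not their sizes or encoding. The sketch below packs an Image Services referral blob with Python's `struct` module under an entirely assumed layout (big-endian, 16-bit provider id and version, 32-bit flags and IS DocId, CRC-32 over the repository-specific data as the checksum); it only illustrates how a common header and a per-element mapping fit together.

```python
import struct
import uuid
import zlib

# Assumed layout (illustration only): big-endian throughout.
HEADER = struct.Struct(">HHII16s")  # provider id, software version, checksum, type flags, content id
IS_DATA = struct.Struct(">IH")      # Image Services DocId, content element (IS page) count
IS_ENTRY = struct.Struct(">HH")     # content element sequence number, IS page number


def pack_is_referral(provider_id, version, flags, content_id, docid, mapping):
    body = IS_DATA.pack(docid, len(mapping))
    body += b"".join(IS_ENTRY.pack(seq, page) for seq, page in mapping)
    checksum = zlib.crc32(body)     # assumed: CRC-32 over the repository-specific data
    return HEADER.pack(provider_id, version, checksum, flags, content_id.bytes) + body


def unpack_is_referral(blob):
    provider_id, version, checksum, flags, cid = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:]
    if zlib.crc32(body) != checksum:
        raise ValueError("referral blob checksum mismatch")
    docid, count = IS_DATA.unpack_from(body, 0)
    mapping = [IS_ENTRY.unpack_from(body, IS_DATA.size + i * IS_ENTRY.size)
               for i in range(count)]
    return {"provider_id": provider_id, "version": version, "flags": flags,
            "content_id": uuid.UUID(bytes=cid), "docid": docid, "mapping": mapping}


doc = uuid.UUID("bc2d1fc8-d78e-4e48-bc7b-7ef96d1e97b8")
blob = pack_is_referral(1, 400, 0, doc, 2005083, [(0, 1), (1, 2)])  # element n -> page n+1
print(len(blob), unpack_is_referral(blob)["mapping"])  # 42 [(0, 1), (1, 2)]
```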
Fixed Storage Area – Federated Content
• The content to be federated already exists in the external repository (a.k.a. Fixed Device).
• The CEMP import agent creates federated content, using the CEMP API.
• The CEMP server computes the referral blob, based on data received from the application, and inserts the blob into the referral_blob column of the DocVersion row.
• The referral blob can be used to retrieve, delete, or lock down the external content.
Fixed Storage Area – Creating Federated Content
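Federated content is created by passing a capture source string to the CEMP API method ContentTransfer.setCaptureSource(), for example:
x-FileNetFCS:?docid=2005083&pagenum=1&mode=3
• x-FileNetFCS: fixed header, indicates federated content.
• docid=2005083: external document identity.
• pagenum=1: content element identity.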
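A capture source string of this form can be built and split with the standard `urllib.parse` functions. This sketch covers the string handling only: `build_capture_source` and `parse_capture_source` are hypothetical helpers, it does not call the CEMP API, and since the meaning of `mode` is not explained here it is passed through unchanged.

```python
from urllib.parse import parse_qs, urlencode

PREFIX = "x-FileNetFCS:?"   # fixed header that marks federated content


def build_capture_source(docid: int, pagenum: int, mode: int) -> str:
    return PREFIX + urlencode({"docid": docid, "pagenum": pagenum, "mode": mode})


def parse_capture_source(value: str) -> dict:
    if not value.startswith(PREFIX):
        raise ValueError(f"not a federated capture source: {value!r}")
    params = parse_qs(value[len(PREFIX):], strict_parsing=True)
    return {key: values[0] for key, values in params.items()}


source = build_capture_source(2005083, 1, 3)
print(source)                        # x-FileNetFCS:?docid=2005083&pagenum=1&mode=3
print(parse_capture_source(source))  # {'docid': '2005083', 'pagenum': '1', 'mode': '3'}
```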
Section 3 – Storage Policies
A Storage Policy can point to multiple Storage Areas (for example, Storage Area A, Storage Area B, and Storage Area C), using a selection filter (SQL statement syntax).
An open Storage Area is randomly selected for content storage (the content is stored in only one area).
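A minimal sketch of the random selection, assuming the policy's SQL selection filter has already been evaluated into a list of candidate areas (`StorageArea` and `select_area` are hypothetical names). It ignores the STANDBY promotion covered in Section 4.

```python
import random
from dataclasses import dataclass


@dataclass
class StorageArea:
    name: str
    status: str  # "OPEN", "CLOSED", "STANDBY" or "FULL"


def select_area(policy_areas, rng=random):
    """Randomly pick one OPEN area among the areas matched by the policy filter."""
    open_areas = [area for area in policy_areas if area.status == "OPEN"]
    if not open_areas:
        raise LookupError("the storage policy has no open storage area")
    return rng.choice(open_areas)   # content goes to this single area only


areas = [StorageArea("A", "OPEN"), StorageArea("B", "FULL"), StorageArea("C", "OPEN")]
print(select_area(areas).name)  # "A" or "C", never "B"
```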
Determining the Storage Area for a Document
A Document Class has a default storage area and a default storage policy (one or the other is required).
For the first document in the version series, the storage area and policy are normally inherited from the document class, but can be set directly on the Document instance via the CEMP client API. Setting the area/policy in the API is optional.
Storage Area/Policy Order of Precedence
The Storage Area or Policy applied to the first document of a version series is selected in the following order (the first value that is set will be used):
1) Storage Area set directly on the Document via the API.
2) Storage Policy set directly on the Document via the API.
3) Default Storage Area on the Document class.
4) Default Storage Policy on the Document class.
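The precedence rules reduce to returning the first value that is set. In this hypothetical helper, `None` means "not set"; when a policy wins, an area still has to be selected from it.

```python
from typing import Optional, Tuple


def resolve_storage(doc_area: Optional[str], doc_policy: Optional[str],
                    class_area: Optional[str], class_policy: Optional[str]) -> Tuple[str, str]:
    """Return ("area", name) or ("policy", name): the first value that is set wins."""
    candidates = (("area", doc_area),        # 1) area set on the Document via the API
                  ("policy", doc_policy),    # 2) policy set on the Document via the API
                  ("area", class_area),      # 3) default area on the Document class
                  ("policy", class_policy))  # 4) default policy on the Document class
    for kind, value in candidates:
        if value is not None:
            return kind, value
    raise ValueError("the Document class must define a default storage area or policy")


print(resolve_storage(None, "GoldPolicy", "DefaultArea", None))  # ('policy', 'GoldPolicy')
```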
Selecting a Storage Area Based on a Policy
When the storage for the first document in a version series comes from a Storage Policy, a Storage Area is randomly selected from the Storage Policy and set as the StorageArea property on the Document.
Determining the Storage Area for a Reservation
A reservation (for example, a reservation based on version 1 of the Document) inherits the storage area and policy of the previous version of the document. The storage area will always be set on the previous version; the policy may or may not be set.
Given that the area takes precedence over the policy, the policy is normally ignored when creating a reservation.
The area may be set via the CEMP client API when the reservation is created (to force content for the new version of the document to be stored in a specific storage area).
What if the Previous Storage Area is Full/Closed?
If the storage area of the previous version (for example, version 1 of the Document) is no longer open, the storage policy will be used to select a different storage area for the reservation: a Storage Area is randomly selected from the Storage Policy and set as the StorageArea property on the reservation.
Note that if the storage policy doesn’t include an area that can be used for new content, creation of the reservation will fail.
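Combining the reservation rules gives the following sketch. `reservation_area` is a hypothetical helper; `is_open` and `policy_members` stand in for the resource status check and the policy's selection filter. Failing when the previous version has no policy at all is an assumption, since the deck only covers a policy that has no usable area.

```python
import random


def reservation_area(prev_area, prev_policy, is_open, policy_members,
                     explicit_area=None, rng=random):
    """Choose the storage area for a new reservation (hypothetical sketch)."""
    if explicit_area is not None:
        # Set via the API when the reservation is created; the area's resource
        # status is still enforced when content is added (not modelled here).
        return explicit_area
    if is_open(prev_area):
        return prev_area        # inherited area takes precedence over the policy
    candidates = [a for a in policy_members.get(prev_policy, []) if is_open(a)]
    if not candidates:
        raise RuntimeError("no storage area can accept new content; reservation fails")
    return rng.choice(candidates)


status = {"A": "FULL", "B": "OPEN", "C": "CLOSED"}


def is_open(area):
    return status.get(area) == "OPEN"


print(reservation_area("A", "P", is_open, {"P": ["A", "B", "C"]}))  # B
```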
Section 4 – Storage Area Resource Status
All Storage Areas have an associated Resource Status value that controls which content operations are allowed for the area, as seen in the table below.
Resource Status   Create New Content   Append Content   Delete Content   Retrieve Content
OPEN              Yes                  Yes              Yes              Yes
CLOSED            No                   No               Yes              Yes
STANDBY           No                   Yes              Yes              Yes
FULL              No                   No               Yes              Yes
Note that ‘Append Content’ means ‘add new content to an existing document that’s in the reservation state’.
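The table translates directly into a lookup. `Op`, `ALLOWED` and `check` are hypothetical names used only for this sketch.

```python
from enum import Enum


class Op(Enum):
    CREATE = "create new content"
    APPEND = "append content"      # new content for a document in the reservation state
    DELETE = "delete content"
    RETRIEVE = "retrieve content"


ALLOWED = {
    "OPEN":    {Op.CREATE, Op.APPEND, Op.DELETE, Op.RETRIEVE},
    "CLOSED":  {Op.DELETE, Op.RETRIEVE},
    "STANDBY": {Op.APPEND, Op.DELETE, Op.RETRIEVE},
    "FULL":    {Op.DELETE, Op.RETRIEVE},
}


def check(status: str, op: Op) -> None:
    """Raise if the resource status of an area does not allow the operation."""
    if op not in ALLOWED[status]:
        raise PermissionError(f"{op.value} is not allowed on a {status} storage area")


check("STANDBY", Op.APPEND)   # allowed: no exception
```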
Automatic Resource Status State Transitions
CEMP supports three types of automatic resource status transitions (the server automatically changes the resource status for an area).
From Status   To Status   Comments
OPEN          FULL        Occurs when, while adding a content element, the server detects that the size/count exceeds the maximum allowed.
OPEN          CLOSED      Occurs when, while adding a content element, the server detects that the closure date is in the past.
STANDBY       OPEN        Occurs when, while adding a content element, the server detects that there are no OPEN areas in the storage policy, but there is a STANDBY area in the policy.
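A sketch of the automatic transitions, using the area counters and limits (ContentElementCount, MaximumContentElements, and so on) under hypothetical Python field names. Checking the closure date before the limits, and promoting the first STANDBY area found, are assumptions; the deck does not specify either.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Area:
    status: str
    element_count: int = 0                    # ContentElementCount
    kbytes: int = 0                           # ContentElementKBytes
    max_elements: Optional[int] = None        # MaximumContentElements
    max_kbytes: Optional[int] = None          # MaximumSizeKBytes
    closure_date: Optional[datetime] = None   # ClosureDate


def try_add_element(area: Area, size_kb: int, now: datetime) -> bool:
    """Store a new element if possible, applying OPEN->CLOSED and OPEN->FULL first."""
    if area.status != "OPEN":
        return False
    if area.closure_date is not None and area.closure_date < now:
        area.status = "CLOSED"
        return False
    too_many = area.max_elements is not None and area.element_count + 1 > area.max_elements
    too_big = area.max_kbytes is not None and area.kbytes + size_kb > area.max_kbytes
    if too_many or too_big:
        area.status = "FULL"
        return False
    area.element_count += 1
    area.kbytes += size_kb
    return True


def promote_standby(policy_areas) -> Optional[Area]:
    """STANDBY->OPEN when the policy has no OPEN area (first STANDBY area wins here)."""
    if any(a.status == "OPEN" for a in policy_areas):
        return None
    for a in policy_areas:
        if a.status == "STANDBY":
            a.status = "OPEN"
            return a
    return None
```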
Storage Area Counters and Limits
The following Storage Area properties are used to determine when to automatically transition the resource state of an area.
Property Name            Description
ContentElementCount      Number of content elements currently in the area.
ContentElementKBytes     Total kilobytes currently used by all content elements in the area.
MaximumContentElements   Maximum number of content elements allowed before the area is marked full.
MaximumSizeKBytes        Maximum kilobytes (all content elements) allowed before the area is marked full.
ClosureDate              Date when the area is to be marked closed.
Manual Resource Status State Change
The Resource Status may be manually changed, according to the following rules:
The resource status may be changed to CLOSED, FULL, or STANDBY without restriction; the counters/limits/closure date do not restrict changing to any of those states.
A manual status change to OPEN (including from open to open) is allowed only if the size/count is below the limits and the closure date is in the future (or isn’t being used).
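The manual rules can be sketched as a single predicate (`can_set_status` is a hypothetical name; `None` means a limit or closure date is not in use).

```python
from datetime import datetime


def can_set_status(target, count, kbytes, max_count, max_kbytes, closure_date, now):
    """Manual resource status change rules; None means a limit/date is not in use."""
    if target in ("CLOSED", "FULL", "STANDBY"):
        return True                     # never restricted by counters, limits or dates
    if target == "OPEN":                # also applies to OPEN -> OPEN
        below_limits = ((max_count is None or count < max_count) and
                        (max_kbytes is None or kbytes < max_kbytes))
        date_ok = closure_date is None or closure_date > now
        return below_limits and date_ok
    raise ValueError(f"unknown resource status: {target!r}")


now = datetime(2009, 2, 3)
print(can_set_status("OPEN", 10, 0, 5, None, None, now))   # False: count over the limit
print(can_set_status("FULL", 10, 0, 5, None, None, now))   # True
```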
Section 5 – Fixed Storage Devices
Image Services Content Storage
All content for a single CEMP document is stored as a single Image Services document, identified by an IS Doc ID. The content of each element is stored as a separate page of the IS document:
• CEMP Document: Content Element 0, Content Element 1
• Referral Blob: Doc ID; Element 0 = Page 1; Element 1 = Page 2
• Image Services Document (Doc ID): Page 1 Content, Page 2 Content
SnapLock Content Storage
A SnapLock fixed device is a folder hierarchy on a SnapLock volume. A GUID is generated for each CEMP document, and that GUID is used to form the unique SnapLock file names. The content of each element is stored as a separate file within the folder hierarchy:
• CEMP Document: Content Element 0, Content Element 1
• Referral Blob: File ID; Element 0 = File 1; Element 1 = File 2
• SnapLock folder (for example, S0): Content File 1, Content File 2
Centera Content Storage
All content for a single CEMP document is stored as a single Centera object, identified by a C-Clip ID. The content of each element is stored as a separate content blob of a tag of the C-Clip:
• CEMP Document: Content Element 0, Content Element 1
• Referral Blob: C-Clip ID; Element 0 = Tag A; Element 1 = Tag B
• Centera Object: C-Clip ID, C-Clip Name, Tags (Tag A: Content Blob A; Tag B: Content Blob B)