© 2012 IBM Corporation
The IBM view on storage archive solutions: requirements to solve and trends for the future
31st ADLUG Annual Meeting - Firenze, September 19-21
IBM Systems and Technology Group
Marco Ceresoli, Data Protection and Retention Sales Leader, IBM Europe
Agenda
• The growth and the variety of digital information
• The shift of market dynamics and trends for archiving
• Technologies for data archiving: comparison
• New trends: Linear Tape File System value proposition
• Role and history of IBM in tape technology
• Case studies and conclusions
Storage is growing… and not only in terms of capacity
Growth of the Digital Universe:
• 2005: 150 exabytes (150 million TB)
• 2011: 1,800 exabytes (1.8 billion TB)
Dimensions of growth: Volume, Variety, Velocity
Source: 2011 IDC Digital Universe Study
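The two figures quoted above imply a compound annual growth rate, which can be checked with a short calculation. This is a minimal sketch; the function name `cagr` is illustrative and not part of the presentation.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide: exabytes in 2005 and in 2011.
growth = cagr(150, 1800, 2011 - 2005)
print(f"Implied annual growth: {growth:.1%}")
```

A twelvefold increase over six years corresponds to growth of roughly 51% per year.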
Every day, 15 petabytes of new digital information are created.
80% of this new data is unstructured, generated mainly by email, documents, images, video and audio.
Effects:
• A company with 1,000 employees spends on average $5.3M every year searching for information that is difficult to find.
• 42% of managers say that they use incorrect information at least once a week.
• During 2007 there were 37,000 security breaches (cyber attacks) in the USA, an increase of 158% over 2006.
• More than 20,000 laws worldwide require not just raw storage capacity but also classification and information lifecycle management.
Source: Information Week, “State Of Enterprise Storage Changing Priorities, Changing Practices”, 2009.
Smarter Systems Are Creating an Information Explosion
[Chart: size of the digital universe in exabytes, 2005 to 2011, on a scale from 0 to 1,800]
Sources of new data: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, smart meters, multi-player games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances
Storage requirements growing 20-40% per year
Source: Semantics, “Linked Data” guidelines, 2006.
Changing Market Dynamics & Trends
• Value has shifted toward archiving software
– Shift from hardware to archiving software for addressing compliance, data retention management and lifecycle governance requirements
– Email archiving and eDiscovery adding additional content types
• Information Lifecycle Governance is needed
– Clients understand they can no longer address data growth issues by adding more storage
• Backup as archive
– A significant proportion (over 50%) of customers continue to use backups as archive copies for long-term retention
• Industry-specific archives
– Healthcare & Life Sciences requirements for archival of medical images and electronic medical records
– Government, Oil & Gas, and other industries demanding solutions specific to their needs
– Cross-industry requirements also rising (e.g., compliance, retaining surveillance data for long periods of time)
• Cloud-based archiving
– Hosted offerings replaced by clouds (e.g., for eDiscovery)
– Shift in deployment models from ‘siloed’ on-premise installations to consolidated solutions, archive as a service, and cloud archiving
Significant growth expected in Digital Archiving
• Archival (Tier 3) data is:
– Fastest growing at 65% CAGR
– Stored on Disk, Tape, and Optical Media
– (Not captured in Tape IDC or GMV forecasts)
• Graph illustrates Active and Deep Archiving combined
Why store data for the long term, and how?
Why do I need to store data for a long time?
– Cultural and scientific value
– Value for the company
– More than 22,000 regulations and laws worldwide govern data preservation
How to store this data?
– Multi-level storage infrastructure with different costs
– Data reduction (compression and data deduplication)
– Automatic data management based on archiving rules
– Virtualization and independence from the storage infrastructure
– Cloud-oriented “anywhere” and “self-service” accessibility
– Focus on storing documents and data interconnections (metadata) together
What to archive, and for how long?
Which data needs to be stored? How long should it be stored?
Source: SNIA – 100 Year Archive Requirements Survey
Source: ESG – Requested Record Types During Electronic Discovery Processes
You Might Think “Archiving” Means any of These…
Archive -- a long-term collection of data that typically is fixed-content data; i.e., no I/O writes are allowed to change the data.
Deep archiving – The original definition of archiving, whereby production data is written to another set of storage media (typically tape) and moved offsite while the original version is deleted (typically from disk).
Active archiving – Data for which frequency of access is active rather than inactive, while frequency of updating is nonexistent so the data is fixed (i.e., is unchanging) and not subject to I/O writes that could change the data.
Long-term archiving – Active archived data for which the frequency of access has fallen so low that a tier of more cost-effective storage may be an appropriate place to house the data.
Backup – a dated (i.e., specified-time) duplication of a designated set of data from a data source on one set of media (typically disk) to a backup set of media (either disk or tape)
Vaulting – Typically, the movement of data on tapes from a target site to a protected remote site.
Source of these definitions: Data Protection, David Hill, 2009, CRC Press
Major Archive Segments
Unstructured Data (files)
• What? MS Office, SharePoint, contracts, images, etc.
• Why? Reduce storage growth, offer a service or product, improve performance, lower cost, compliance
• Available products? IBM Content Collector, FileNet, Content Manager, etc.
Email archiving, eDiscovery
• What? Email, but potentially any other data type too
• Why? Litigation support, compliance
• Available products? IBM Content Collector with IBM disk storage
Structured Data (database archiving)
• What? Relational tables, rows, periodic reports, retired applications
• Why? Reduce storage growth, improve performance, lower cost, compliance (reports)
• Available products? IBM Optim with IBM disk storage
Unstructured Data (kept from birth)
• What? Medical images, “content” (M&E), DVS, seismic shots, scientific data
• Why? Reduce storage growth, offer a service or product, improve performance, lower cost
• Available products? VAD Medical Archive solution or LTFS/tape with an ISV app
Technologies for data archiving and preservation
• Fault tolerance: redundancy, ECC, RAID(*), ...
• Data protection: “space-efficient” internal replication
• Disaster recovery: automated remote data replication
• Data immutability: NENR(*) and WORM(*)
• Archiving and preservation rules: API(*) and standard interfaces
• Cost reduction: storage tiering, WORN(*)
• Data growth reduction: data deduplication and data compression
• Data security: data encryption and data shredding
• Access control: tamper protection, audit logs, ...
(*) ECC = Error Correction Code, RAID = Redundant Array of Independent Disks, NENR = Non Erasable Non Rewritable, WORM = Write Once Read Many, API = Application Program Interface, WORN = Write Once Read Never
More than 50 years of continuous innovation
Storage management at 360°: archiving, backup, migration, DR
[Diagram: storage tiers and the management processes that span them]
Storage tiers: Enterprise class, Mid-range, Low-cost, Automated off-line, Manual off-line
Immutability options shown on the tiers: NENR, WORM
Processes: archiving and ILM management, backup copies, migration to new technologies, disaster protection, NENR/WORM storage
Open questions for each process: Compression? De-duplication? Encryption?
The processes can be automated and repeated
The IBM Smart Archive strategy
[Diagram: IBM Smart Archive strategy]
Information sources: Collaborative (Quickr, SharePoint); ERP / CRM (SAP, PeopleSoft …); Data; Content (Documents, Images …); Reports; Paper; Email (Notes, Exchange)
Deployment options: On Premise (Custom Config); Appliance (Pre-Config); As A Service (SaaS, Cloud Storage)
Cloud Ready Archive Storage with Optional ECM
Value Added Services
• Optimization Services
• System Services
• Managed Services
• Reference Architecture
• Information Governance
Strategy elements
• Optimized and Unified Assessment, Collection and Classification
• Flexible and Secure Infrastructure with Unified Retention and Protection
• Integrated Compliance, Records Management, Analytics and eDiscovery
Long term data archiving: Total Cost
From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec.23, 2010.
Long term data archiving: TCO and technology evolution
From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec.23, 2010.
Tape Advantages for Archiving/Long-Term Preservation
[Comparison table: Tape vs. Disk]
Source: Tape The Digital Curator of the Information Age. By Fred Moore, President, Horison, Inc.
Technology Roadmap Comparisons for TAPE, HDD, and NAND Flash
Outline: Implications for Data Storage Applications
• The annual rate of areal density increases for TAPE will likely exceed the annual rate of areal density increases for NAND and HDD
– TAPE bit cell is large and paths for scaling to higher bit densities exist
– NAND bit cells and HDD Patterned Media bit cells are approaching nanoscale issues in minimum feature lithography requirements
– NAND bit endurance or bit retention and HDD bit stability are approaching
• Possible Annual Areal Density Growth Scenarios
– 20% for HDD
– 20% to 30% for NAND Flash
– 40% to 80% for TAPE
• Implications for Storage: TAPE, NAND, and HDD will continue to offer complementary storage solutions
• Implications for TAPE: TAPE volumetric density will increase, enhancing its cost advantages
Annual Areal Density Growth Rate Scenarios
• HDD – 20% to 25% – Transition to New Technology, Sensor Output, Lithography
• NAND Flash – 25% to 30% – Lithography and Endurance
• TAPE – 40% to 80% – No Lithography Issues, Mechanical Realities
[Chart: areal density (Gbit/in²) on a logarithmic scale from 0.1 to 10,000 versus year, 2002 to 2018, for HDD, NAND and TAPE products, with trend lines of 20%/yr, 40%/yr and 80%/yr]
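To see what these annual rates mean in practice, the compounding can be turned into the time needed for a tenfold density increase. This is a minimal sketch: the helper `years_to_multiply` and the choice of a 10x target are illustrative assumptions, and the rates are simply the scenario bounds quoted above.

```python
import math


def years_to_multiply(annual_rate: float, factor: float = 10.0) -> float:
    """Years of compound growth at `annual_rate` needed to grow by `factor`."""
    return math.log(factor) / math.log(1 + annual_rate)


# Scenario bounds taken from the slide; the 10x target is an assumption.
scenarios = {
    "HDD (20%/yr)": 0.20,
    "NAND (30%/yr)": 0.30,
    "TAPE (40%/yr)": 0.40,
    "TAPE (80%/yr)": 0.80,
}
years = {name: years_to_multiply(rate) for name, rate in scenarios.items()}
for name, value in years.items():
    print(f"{name}: {value:.1f} years to reach 10x areal density")
```

At 20% per year a tenfold increase takes between 12 and 13 years, while at 80% per year it takes just under 4, which is the gap behind the claim that tape density will grow faster than HDD and NAND density.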
Cost evolution of the magnetic storage
Source: IBM elaboration and Information Storage Industry Consortium (INSIC) – 2008
[Chart annotation: SSD ~6-10X]
Magnetic Tape
• The cheapest storage medium in the hierarchy
• The most widely used medium for long-term archiving
• LTO (Linear Tape-Open) standard: fifth generation available today with 1.5 TB cartridges (3 TB compressed)
• January 2010: the IBM Zurich Research Laboratory performed a technology demonstration of a 35 TB cartridge (1). Today they are working on a technology demo of a 100 TB cartridge.
http://lto.org/technology/roadmap.html
(1) http://www.ibm.com/press/us/en/pressrelease/29245.wss
Rich Media Driving New Storage Requirements
• Characteristics of the data stored are changing
– The mix of traditional business data (e.g., transactional, docs, email, databases, and backup of those assets) vs. “rich media” (e.g., video, images, digitized content) is rapidly changing
Smarter Systems Are Creating an Information Explosion, Especially in Media and Entertainment (M&E)
[Chart: size of the digital universe in exabytes, 2005 to 2011, on a scale from 0 to 1,800, with data sources ranging from RFID and digital TV to security systems and appliances]
Storage requirements growing 20-40% per year
Source: Semantics, “Linked Data” guidelines, 2006.
• Video, images, etc. are a major factor driving growth
• Access & asset management profiles of rich media are significantly different from traditional business data
– Much of the traditional business data stored is a cost center: regulatory, compliance, disaster recovery for business-critical data and processes
– Rich media is primarily stored for monetization purposes: production archives and asset protection; repurposing content and distribution; long-term archives to monetize assets
– Bandwidth (BW) changes everything: access to/from content, business motivation to make content available (e.g., key to the M&E industry's move to digital workflows)
Elements to address new role of TAPE
• Self-describing cartridge
– Remove the requirement to commit long term to a tape software application
– Content protection in the event of database corruption or loss
• Improve content interchange/distribution
– Eliminate the need for common tape software across the enterprise and/or interchange locations
– Reduce the cost of data interchange
• Partial recall
– Eliminate the time penalty of moving large video content to tape when only a small part of the video content is needed (e.g., a goal in a game)
• Improved tier management of content
– Ease complexity in movement from Tier 1 (disk) to Tier 2 (online tape) and Tier 3 (archive)
– Improve data import/export to system management
• $/GB, power
– Reduce cost of digital storage: power and $/min
• Open standards
– Large diverse infrastructure requires an open standard
– Standard/support of MXF video
• Long-term content archive life
– Desired archive life of 50-100 years
LTFS Value Proposition
Digital archives need and want the value proposition of tape:
– $/GB – lowest cost storage
– Watt/GB – green storage
– Portability – ability to manage archive outside system
– Scalability – easy to add additional storage (i.e., buy a cartridge)
– Investment protection – LTO has an 8 generation roadmap (up to a 32TB cartridge (compr.))
But, inhibitors to using tape:
– Proprietary tape applications require long-term commitment and support of the tape application to maintain the archive
– Non-self-describing data formats require a centralized archive database to recover content on individual tapes
– Import/export & distribution of tapes in the archive is difficult due to proprietary tape applications
Solution: LTFS addresses the inhibitors and unlocks the value proposition of tape for digital archives
– Open, non-proprietary tape format
– Self-describing data structure on cartridge
– File system support on Linux, Mac, Windows provides distribution and cross-platform interchange, and enables the transition to integrated file-based tape/disk storage systems
Introduction to LTFS (Linear Tape File System)
IBM Linear Tape File System is:
1. Open format for data which is written to tape
– Describes the format of data and metadata stored on tape
– Metadata is based on an XML schema
– Developed and disclosed by IBM
– Applicable to LTO-5 and Jag-4
– Requires tape partitioning
2. File system support (code) to R/W tapes in LTFS format; externalizes the LTO-5 tape as a file system
– Enables standard applications to write/read LTFS tapes
– Supports update, edit, delete of files on LTFS tape
– Supports partial recall
– Available on Linux, Mac OS X and Windows
Engineering EMMY Award – Oct 2011
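Because the tape is presented as a file system, an application can in principle archive a file with ordinary file operations rather than a tape-specific API. The sketch below illustrates that idea only: `archive_file` is a hypothetical helper, and a temporary directory stands in for the LTFS mount point, whose real path depends on the installation.

```python
import shutil
import tempfile
from pathlib import Path


def archive_file(source: Path, mount_point: Path) -> Path:
    """Copy a file onto a mounted volume using ordinary file-system calls."""
    mount_point.mkdir(parents=True, exist_ok=True)
    destination = mount_point / source.name
    shutil.copy2(source, destination)  # copies contents and timestamps
    return destination


# A temporary directory stands in for the LTFS mount point (assumption).
with tempfile.TemporaryDirectory() as work:
    work_dir = Path(work)
    clip = work_dir / "clip.mxf"
    clip.write_bytes(b"example video payload")
    mount_point = work_dir / "ltfs_mount"  # hypothetical mount path
    stored = archive_file(clip, mount_point)
    stored_name = stored.name
    stored_ok = stored.read_bytes() == clip.read_bytes()
    print(stored_name, stored_ok)
```

Nothing in the helper is tape-aware, which is the point of the second item above: standard applications read and write the volume like any other directory.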
Logical View of LTFS Volume
[Diagram: tape from BOT (beginning of tape) to EOT (end of tape), divided lengthwise into an Index Partition holding the LTFS Index (XML) and a Data Partition holding the files, separated by guard wraps]
• LTFS utilizes media partitioning (new to LTO Gen 5 and Jag 4)
• The tape is logically divided “lengthwise” (think C: & D: drives on a single hard disk unit)
• LTFS places the index on one partition and data on the other
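The value of a separate index partition is that a file can be located from the cartridge alone, without an external catalog database. The sketch below illustrates the idea with a toy XML index; the element names (`index`, `file`, `startblock`, `length`) are invented for illustration and are not the actual LTFS index schema.

```python
import xml.etree.ElementTree as ET


def build_index(files):
    """Serialize (name, start_block, length) entries as a toy XML index."""
    root = ET.Element("index")
    for name, start_block, length in files:
        entry = ET.SubElement(root, "file", name=name)
        ET.SubElement(entry, "startblock").text = str(start_block)
        ET.SubElement(entry, "length").text = str(length)
    return ET.tostring(root, encoding="unicode")


def locate(index_xml, name):
    """Return (start_block, length) for `name`, or None if it is not indexed."""
    root = ET.fromstring(index_xml)
    for entry in root.findall("file"):
        if entry.get("name") == name:
            return int(entry.findtext("startblock")), int(entry.findtext("length"))
    return None


# Hypothetical contents of the data partition.
index_xml = build_index([("match.mxf", 0, 4096), ("goal.mxf", 4096, 512)])
print(locate(index_xml, "goal.mxf"))
```

Reading the small index first and then seeking to the recorded position in the data partition is what makes the cartridge self-describing in this simplified model.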
IBM : 60 Years of Tape Innovation
In tape drive technology
• 1952: IBM 726, 1st magnetic tape drive
• 1959: IBM 729, 1st read/write drive
• 1964: IBM 2104, 1st read/back drive
• 1984: IBM 3480, 1st cartridge drive
• 1995: IBM 3590
• 1999: IBM 3590E
• 2000: LTO Gen1
• 2002: LTO Gen2
• 2003: 3592 Gen1
• 2004: LTO Gen3
• 2005: TS1120 (3592 G2)
• 2007: LTO Gen4
• 2008: TS1130 (3592 G3)
• 2010: LTO Gen5
• 2011: TS1140 (3592 G4)
In tape automation and virtualization
• 1962: IBM Tractor System
• 1974: 3850 MSS
• 1992: IBM 3495
• 1994: IBM 3494
• 1997: VTS G1
• 1999: VTS G2
• 2000: TS3500
• 2001: VTS G3
• 2005: TS7510 VTL; TS3200, TS3300
• 2006: TS7740 (VTS Gen 4)
• 2007: TS7520; TS3400; TS7530
• 2008: TS2900; TS3500 High Density; TS7720; TS7650G
• 2009: TS7650 Appliance
• 2010: TS7610; TS7680
• 2011: TS3500 Connector & Shuttle; TS7740, TS7720
LTO Roadmap
http://ultrium.com/technology/roadmap.html
And data deduplication is the key to using more disk more cost effectively!
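The principle behind deduplication can be shown with a generic hash-based sketch: identical chunks are stored once and referenced many times. This is not the algorithm used by any IBM product; the fixed chunk size, the SHA-256 fingerprint and the function names are assumptions chosen to keep the example short.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and store each distinct chunk once."""
    store = {}   # fingerprint -> chunk bytes
    recipe = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(data), stored_bytes)
```

Here 16 bytes of input need only 8 bytes of chunk storage, because three of the four chunks are identical. Repeated backups of largely unchanged data behave the same way at a much larger scale.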
Scalable Capacity and Performance
IBM ProtecTIER® Deduplication Family
• TS7620 ProtecTIER Appliance Express: good performance, entry level, easy to install
– Up to 145 MB/sec
– 5.5 TB and 11 TB useable capacity
• TS7650 ProtecTIER Appliances: better performance, larger capacity, scalable
– Up to 500 MB/sec
– 7 TB to 36 TB useable capacity
• TS7650G & TS7680 ProtecTIER Gateways: highest performance, largest capacity, high availability
– Up to 2800 MB/sec
– Up to 1 PB useable capacity
IBM DIAS - Digital Information Archiving System
Koninklijke Bibliotheek, National Library of the Netherlands
• During 2000, IBM and the KB designed and implemented a digital data preservation system called DIAS (Digital Information Archiving System).
• DIAS is a solution for archiving and preserving multimedia and electronic documents in digital format.
• DIAS complies with the OAIS(1) standard for “logical” and “physical preservation”.
• IBM built the DIAS solution from standard, general-purpose software components: WebSphere, DB2, Tivoli Storage Manager and Content Manager.
[Diagram: DIAS functional components: Ingest, Preservation, Data Management, Access, Archival Storage, Delivery & Capture, Packaging & Delivery, Administration, Monitoring & Logging; information packages shown: SIP, AIP, DIP]
(1) OAIS: http://public.ccsds.org/publications/archive/650x0b1.pdf
Koninklijke Bibliotheek: http://www.kb.nl/dnp/e-depot/e-depot-en.html
Ecosystem: Thought Equity Motion
Sports Video Archiving in the Cloud
Challenges
• Low cost delivery platform for enterprise scale Video Supply Chain as a Service
• Information growth of ~100 TB per month
• Easy self-serve access required by clients
Solution
• IBM LTFS at several global locations, including some client facilities
• IBM System Storage® TS3200 Tape Library, LTO®-5 tape drives
Benefits
• Opened up new business opportunities
• Enabled more predictable and transparent pricing for clients
• Portable, interoperable, scalable, cost-effective data protection and long-term storage
‘LTO 5 and LTFS significantly reduce the ancillary costs around storage. This is a real game-changer from IBM’
Mark Lemmons, CTO, Thought Equity Motion
TSP03327-USEN-00
TEM with LTFS on YouTube: http://www.youtube.com/watch?v=M7w0jrkQnj4
Thank you for your attention!