The Object Evolution - EMC Object-Based Storage for Active Archiving and Application Development

DESCRIPTION

This Technology in Brief, written by Taneja Group, examines the fast-changing world of archiving and development on the web, and how object-based storage for unstructured data provides benefits such as active archiving, global access, fast application development, and much lower cost compared to high computing and data protection costs of NAS.

Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 1 of 12 87 Elm Street, Suite 900 Hopkinton, MA 01748 T: 508.435.2556 F: 508.435.2557 www.tanejagroup.com

TECHNOLOGY IN BRIEF

THE OBJECT EVOLUTION

EMC OBJECT-BASED STORAGE FOR ACTIVE ARCHIVING AND APPLICATION DEVELOPMENT

NOVEMBER 2012

A few years ago, object-based storage made a huge splash on-premise with the promise of meaningful data relationships, information accessibility and strong compliance. It remains an important component for information management based on compliance and single-tenant architectures. However, the evolution of object-based storage has big implications for the cloud and unstructured data: new approaches to active archiving, web/mobile application development and a changing model for cloud storage service providers.

Object storage is optimal for the web. It has a very different architecture from file systems, which are frankly overkill for most cloud storage. On-premise can be a different story; having data close to hand under single-tenant access control is right for some data storage. But on-premise stored data requires that the enterprise maintain a primary data center, a cold data center for DR, replication, continuous data protection, and so on. Given the right set of needs this is a fine trade-off of course and we certainly do not counsel people to get rid of their internal data centers and redundant systems.

However, cloud-based object architecture offers big benefits for storing unstructured data for active archiving, global access to data, fast application development and much lower cost compared to the high computing and data protection costs of on-premise NAS. EMC has engineered Atmos to provide these capabilities and many more as a massively scalable, distributed cloud-based system. In this Technology in Brief we will examine the fast-changing world of archiving and development on the web, and how object-based storage is the best way to go for these monumental tasks.

When Object Trumps File

The go-to architecture for unstructured data has traditionally been an application-centric system containing the operating system, the application, and a NAS filer using hierarchical file architecture. This infrastructure works acceptably well in a slow-growth, consistent workload setting; although even then it is far too easy to add complexity along with additional systems and filers.

However, business needs have evolved far beyond this sleepy storage model. Unstructured data now comprises a massive portion of large data growth, and hierarchical file systems are difficult to optimize and scale. For example, file system-based storage requires near-constant provisioning. As storage requests grow (which they inevitably do), IT administrators must manually provision storage to meet the expanded requirements. Meanwhile, large volume and spiky workloads make provisioning both “up” and “down” an expensive and time-consuming proposition.

And difficult provisioning is hardly the only problem: siloed data protection with individual backup, replication and archiving applications steadily raises OPEX. Scaling is an issue as well. Large critical big data applications may warrant scale-out or scale-up file systems (which are challenges in and of themselves). Most do not rate this architecture, and instead reside on poorly scalable systems. The number of these systems grows as applications come online, making it even harder for IT and application owners to administer and for users to get the value from the application that they need. This already difficult scenario gets even worse when NAS storage is used for what is essentially a cloud use case, such as extending existing assets over the cloud.

Figure: Traditional NAS infrastructure

In contrast to hierarchical file system-based storage silos, object-based storage opens up a whole new range of dynamic functionality. Object-based storage assigns unique object IDs to access data across all federated locations. This goes a long way towards eliminating traditional, time-consuming storage management tasks like LUN creation and RAID groups. Active archives and applications needing fast global access particularly benefit from global namespaces and location transparency. The flat, universal namespace allows global access to stored content from anywhere the distributed application runs. Applications can also efficiently associate metadata with stored objects without using a dedicated database. Sharing vast storage resources means application administrators do not need to modify application files. Object-based storage usually has elements of file systems in order to handle processes like file archiving, but it is not founded on that architecture and its drawbacks.

Object-based storage originally developed as a type of specialized NAS storage where the hierarchical system was replaced with an object-oriented system that made file storage far more secure and scalable. One of its most popular incarnations is still going strong today: Content-Addressable Storage (CAS). A subset of object-based storage, CAS derives each object's ID from a hash of its content, which ensures there is only one ID for any object. When the CAS object is retrieved, it can be hashed again and checked against its ID to verify identity. Because identical content always yields the same ID, CAS de-dupes at the object level for copy control.
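The CAS behavior described above can be illustrated with a minimal in-memory sketch. It assumes SHA-256 as the hash function and uses a hypothetical `ContentAddressableStore` class; neither is taken from any EMC product, and a real CAS system adds persistence, metadata and retention controls.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: the object ID is the SHA-256 digest of the content.

    The class name and the choice of SHA-256 are assumptions for illustration.
    """

    def __init__(self):
        self._objects = {}  # object ID -> stored bytes

    def put(self, data: bytes) -> str:
        object_id = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same ID, so a duplicate is not stored twice.
        self._objects.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._objects[object_id]
        # Re-hash on retrieval and compare with the ID to verify the content.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("stored content no longer matches its ID")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressableStore()
first_id = store.put(b"quarterly report")
second_id = store.put(b"quarterly report")  # same content submitted again
```

Both calls to `put` return the same ID and the store holds a single copy, which is the object-level de-duplication the text refers to.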

TABLE: CONTRASTING FILE SYSTEMS WITH OBJECT STORAGE

Characteristic: Metadata
File: File systems implement a centralized file layer metadata service that tracks directory structures, permissions, and on-disk locations of files. All file requests must access metadata first for permission and file information.
Object: Object metadata is stored along with the object data to avoid metadata service bottlenecks. The object ID may also be used to uniquely verify and validate the data being stored.

Characteristic: Namespace
File: File systems have built-in namespace constraints for files and directories they can store and manage. Hierarchical directory structures can become unwieldy, performing poorly at navigating large numbers of users or files.
Object: Object storage provides a single flat namespace for objects. Replacing path and filenames with object identifiers makes the address space practically infinite with very fast performance for users and applications.

Characteristic: Interaction
File: File systems are designed to offer in-place editing and updating of files using sophisticated, yet highly complex, locking and synchronization mechanisms. These methods make it difficult to distribute or extend file systems across multiple locations.
Object: Objects are inherently immutable once stored under a unique ID, and can be easily replicated and accessed globally. Programming for object storage leads to simpler, supportable, and more reliable programs.

Characteristic: Cloud Applications
File: File systems present a real challenge for cloud-based archival management and mobile application delivery. Poor scalability, lagging performance, and complex application development make traditional file systems a poor choice for compelling new cloud usages.
Object: Object stores are simple, clean and quick to access. Since objects are easily distributed, replicated, and globally accessible in the cloud, they are ideal for active global archives and distributed mobile applications.

Object-based storage, both on-premise and in the cloud, requires certain key capabilities. On-premise object storage has great benefits for local file storage, including multiple application access, massive scaling, high availability and, in some architectures, information governance.

Multiple application access. Applications simultaneously leverage the same centralized object-based storage infrastructure. This enables local object-based storage to execute application-specific archiving management attributes for a complete chain of information custody.

Massive scaling. Massive scaling is problematic with file-based archive solutions. As the file system reaches its maximum capacity, administrators must expand the entire system’s operating system, file system and application in order to scale the archive. By contrast, object-based storage can expand in an open fashion into multiple petabytes due to its flat address space.

High availability. Object storage often archives data that has heavy retention and government requirements. In this environment, 5 9’s or higher availability (99.999%) is a necessity. Mirroring and parity help to protect availability; other beneficial features include self-healing, detecting and fixing soft corruptions in the background, and addressing hardware failures before they impact data availability.
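To make the availability figure concrete, the short calculation below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(0.99999)  # roughly 5.26 minutes per year
three_nines = allowed_downtime_minutes(0.999)   # roughly 525.6 minutes per year
```

At 99.999% the budget is a little over five minutes of downtime per year, compared with almost nine hours at 99.9%.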

Information governance. A subset of object-based storage, Content-Addressable Storage (CAS) is purpose-built for long-term defensible retention of fixed files and data. As opposed to other archival storage methods like tape or monolithic “tar” files that bundle data up and/or move it offline, CAS stores data as objects that can be strictly and individually managed for governance and compliance and yet remain actively accessible on-line.

Best Practices: Object and the Cloud

We strongly support on-premise object storage such as CAS for local space savings, performance and information governance. However, we find that object storage is roaring to life in the cloud, where cloud-based active archiving and application development require highly distributed and single namespace storage for unstructured content. These critical use cases benefit far more from object-based storage than they do from traditional file systems. Let’s look at best-practice architectural features for object-based storage in the cloud.

DATA AND METADATA

When data is stored as an object, a unique object identifier is created out of a single universal global namespace. The object ID is retained by the client application and used to subsequently retrieve that object. Objects can effectively live anywhere in the cloud-wide system without the storage client needing to know about actual data locations, file system structures or LUN details. This complete location transparency reduces hands-on storage management and inherently supports globally distributed access by web and mobile applications.

Because of the location transparency provided by the object storage layer, objects can be automatically load-balanced across nodes, and replicated within and across sites without disrupting applications or users. Wide data distribution and federation can be managed through systematic policies to meet various service level goals for access, high availability, protection, cost and performance.
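One way a storage layer could place replicas across sites without clients knowing about locations is to derive the placement from the object ID itself. The sketch below uses rendezvous (highest-random-weight) hashing as a stand-in technique; it is not a description of how Atmos places data, and the site and node names are hypothetical.

```python
import hashlib


def _score(object_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(object_id: str, nodes_by_site: dict, replicas_per_site: int = 1) -> list:
    """Pick the highest-scoring nodes in every site for this object ID."""
    placement = []
    for site, nodes in sorted(nodes_by_site.items()):
        ranked = sorted(nodes, key=lambda node: _score(object_id, node), reverse=True)
        placement.extend((site, node) for node in ranked[:replicas_per_site])
    return placement


# Hypothetical two-site deployment.
sites = {
    "boston": ["bos-1", "bos-2", "bos-3"],
    "london": ["lon-1", "lon-2"],
}
placement = place_replicas("obj-42", sites)
```

Because the placement depends only on the object ID and the node list, any access point can compute it, and the replication policy (here, `replicas_per_site`) can change without touching application code.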

The object layer abstraction also provides a great benefit to applications that previously might have had to be intimately storage aware to avoid running out of space or had to otherwise actively manage data locations. Because applications written to leverage object storage don’t have to embed rules or code specific knowledge of storage infrastructure details, they avoid having to be re-written or re-architected for “changing” storage assignments as users spread, features expand, and data sets grow.

MULTI-TENANCY

Secure multi-tenancy is a key requirement of cloud object storage, which should support two levels of multi-tenancy: tenants and sub-tenants. Tenants are top-level entities, each of which has its own access points, security controls and master storage policies. Tenants share nothing with other tenants and are fully isolated. Every node gets assigned to a specific tenant; tenants do not share nodes and therefore each tenant has its own dedicated access points and storage. Within a large company, a tenant could be set up for independently managed divisions or subsidiaries. In a service provider implementation, the tenant might be mapped to a broad storage service offering.

Sub-tenants are then created within each tenant with security controls and defined management policies assigned by the tenant. Each sub-tenancy defines a distinct storage environment with isolated management for its own users, object namespace, and defined shares. A sub-tenant within a company might correspond to a department, while a storage provider's sub-tenant might track to a specific client account.

This highly functional multi-tenancy capability makes it easy to create private sandboxes or implement a global content delivery scheme. With some planning, this scheme could enable large corporations to facilitate aggregating “big data” distributed across the enterprise.
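The two-level tenancy model can be sketched as a small data structure. The class names, fields and sample tenant names below are hypothetical and only illustrate the rules stated in this section: nodes belong to exactly one tenant, and each sub-tenant has its own object namespace.

```python
from dataclasses import dataclass, field


@dataclass
class SubTenant:
    """Isolated environment with its own users and object namespace."""
    name: str
    users: set = field(default_factory=set)
    objects: dict = field(default_factory=dict)  # object key -> data


@dataclass
class Tenant:
    """Top-level entity that owns dedicated nodes and its sub-tenants."""
    name: str
    nodes: frozenset
    subtenants: dict = field(default_factory=dict)

    def add_subtenant(self, name: str) -> SubTenant:
        if name in self.subtenants:
            raise ValueError(f"sub-tenant {name!r} already exists")
        self.subtenants[name] = SubTenant(name)
        return self.subtenants[name]


class Cluster:
    """Tracks which tenant owns each node so that nodes are never shared."""

    def __init__(self):
        self.tenants = {}
        self._node_owner = {}  # node name -> tenant name

    def add_tenant(self, name: str, nodes) -> Tenant:
        nodes = frozenset(nodes)
        taken = sorted(node for node in nodes if node in self._node_owner)
        if taken:
            raise ValueError(f"nodes already assigned to another tenant: {taken}")
        for node in nodes:
            self._node_owner[node] = name
        self.tenants[name] = Tenant(name, nodes)
        return self.tenants[name]


cluster = Cluster()
division = cluster.add_tenant("retail-division", ["node-1", "node-2"])
marketing = division.add_subtenant("marketing")
finance = division.add_subtenant("finance")
marketing.objects["report"] = b"campaign results"
finance.objects["report"] = b"ledger"  # same key, separate namespace
```

In a service provider deployment the same structure would map a tenant to a service offering and each sub-tenant to a client account.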

ACCESS FROM ANYWHERE

As a cloud object storage service with a flat global namespace, an object can be accessed through any site (although for performance, policies might strive to replicate objects to sites closer to where they will be read). In addition, object storage for the cloud must present a broad range of access methods including both web services and traditional file services.

REST (and SOAP) web services are key APIs. REST is the most common cloud storage access method for browser and custom mobile applications. As an architectural style typically implemented over HTTP, REST was designed for web-style remote access to “resources”, and is an ideal match for object storage, where each object can be easily treated as a REST resource.
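The mapping from objects to REST resources can be sketched with Python's standard `urllib.request` module. The endpoint URL and path layout below are hypothetical and do not reproduce any vendor's API; the requests are only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical endpoint; a real service defines its own URL layout and authentication.
BASE_URL = "https://storage.example.com/objects"


def create_request(data: bytes, content_type: str = "application/octet-stream") -> Request:
    """POST to the collection: the service would answer with a new object ID."""
    return Request(BASE_URL, data=data, method="POST",
                   headers={"Content-Type": content_type})


def read_request(object_id: str) -> Request:
    """GET the resource addressed directly by its object ID."""
    return Request(f"{BASE_URL}/{object_id}", method="GET")


def delete_request(object_id: str) -> Request:
    """DELETE the resource addressed by its object ID."""
    return Request(f"{BASE_URL}/{object_id}", method="DELETE")


create = create_request(b"hello, object storage")
read = read_request("abc123")
delete = delete_request("abc123")
```

Passing any of these to `urllib.request.urlopen` would issue the call against a live endpoint. The point of the sketch is that each object is one URL and each storage operation is one HTTP verb.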

Figure: Typical cloud-based object storage deployment

POLICY DRIVEN MANAGEMENT

A key benefit of object storage is the ability to use metadata to drive automatic data management policies. Policies should support service levels, and should be triggered when data objects are created, objects hit certain ages, or upon metadata updates. Policies can control data protection operations including the number, type and target locations for replicas, inherent storage features for striping, compression and de-duplication, retention locks and automatic deletion, and shifting objects into different policies over time.

The policy mechanism should be highly flexible, targeting policies to any group of objects based on both system and user defined metadata. Policies can be used to build service levels by defining the amount of replication, implement archive rules for compliance, and optimize capacity and performance as items age.
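A metadata-driven policy mechanism of this kind can be sketched as a list of rules matched against an object's metadata. The policy names, metadata keys and values below are hypothetical, and a real system would also act on the selected policy (replicate, lock, expire) rather than only select it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    match: dict          # metadata key/value pairs an object must carry
    replicas: int        # number of copies to maintain
    retention_days: int  # 0 means no retention lock


# Ordered from most to least specific; the empty match acts as a catch-all.
POLICIES = [
    Policy("compliance", {"class": "medical-image"}, replicas=3, retention_days=7 * 365),
    Policy("default", {}, replicas=2, retention_days=0),
]


def select_policy(metadata: dict, policies=POLICIES) -> Policy:
    """Return the first policy whose match criteria all appear in the metadata."""
    for policy in policies:
        if all(metadata.get(key) == value for key, value in policy.match.items()):
            return policy
    raise LookupError("no policy matched")


# Evaluated when an object is created, and again after a metadata update.
scan = {"class": "medical-image", "department": "radiology"}
memo = {"class": "email"}
scan_policy = select_policy(scan)
memo_policy = select_policy(memo)
```

Re-running `select_policy` whenever an object is created, reaches a given age, or has its metadata updated corresponds to the triggers listed above.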

Primary Object Use Cases in the Cloud

Cloud-based archiving, particularly medical and file archiving, forms the primary use case for object-based storage. Web application development is surging forward, and Archive-as-a-Service and its providers round out the fastest-growing use cases.

PRIMARY USE CASE: ACTIVE ARCHIVES

Archived information is playing a more strategic role in workflows and business processes. On-premise archiving is essentially static and used to reduce storage costs, improve operational efficiency, retention and compliance, and enable the business to use archived data to make better business decisions. Cloud-based archiving retains elements of these features but adds new dynamic ones: instant access from any device, archive as a service and federating to private or public cloud. Atmos provides both the static and dynamic features that massive active archives require.

Federate to public or private clouds. Federation enables companies to treat on-premise and cloud object storage as a single efficient infrastructure. Companies may pool distributed storage assets including data, applications and policies to take full advantage of the cloud’s massive scalability and global access features. Federation also lowers cost and risk: application workloads run on cloud resources with a low execution cost, and if a cloud-based storage system goes down the distributed workload remains protected. Federation extends internal policies to cloud-based storage environments by applying existing policies and settings to cloud-based storage.

Use metadata to drive business and storage decisions. We expect the use of metadata to expand quickly to directly feed business exploitation processes, as well as support more automatic and intelligent storage management decisions. A singly managed distributed system that maintains directly accessible object metadata yields rich support for business decisions. Object-based storage also enables IT to automate information lifecycle management across the entire distributed data store, not just by storage silo. Policies should be flexible enough to be set at the object, tenant or system levels, to automate archive decisions, set and manage retention, expiration, and disposition.

Multi-tenancy for secure shared storage. Multiple applications can safely co-exist as separate tenants. Isolation by tenant protects security while enabling the sharing of system-wide resources and capacity. Multi-tenancy is also efficient because tenants draw on a highly scalable pool of storage, which can flexibly up-scale and down-scale on demand.

Massive scalability. Unstructured data storage is growing so fast that traditional storage systems are straining purchase, maintenance and management resources to the brink. Distributed object-based architecture yields near-limitless scale. Object also allows for automatic load balancing whenever new objects are stored, which protects high performance across the entire distributed system.

Multi-site active/active. Multi-site active/active architecture is an important component of object-based storage, especially in the cloud. Cloud object storage systems span multiple sites and provide for multi-site direct access to objects through both synchronous and asynchronous replication. This model replicates between multiple storage nodes and sites, which not only increases distributed availability and content distribution, but also supports disaster recovery.

Archive-as-a-service. The most agile and flexible way for IT to deliver archive services is with the cloud model of self-service portals. This model manages and meters utilization and bandwidth and supports third-party chargeback. Within an enterprise this flexibility and instant storage relieves users of the temptation of using commercial cloud services simply because they can get the storage they need fast – even though security might not be in place. This approach also enables ISVs and MSPs to extend archive requirements and offerings.

Reduce manual tasks and provisioning across multiple archives. Cloud-based archives must be easy to set up and, for reliability and consistency, must not require long or deep manual configuration. They should also automate underlying complexities including security, audit, retention, performance, and capacity growth. Atmos provides these features and more, relieving the cloud administrator of enormous burdens. Distributed systems may be managed as a single entity with policies to automate hundreds of management and data protection tasks. Perhaps most important of all, object-based systems like Atmos offer massive scalability of capacity and performance thanks to their unique architecture.

FAST-GROWING USE CASE: WEB AND MOBILE APPLICATION DEVELOPMENT

Web and mobile application development using unstructured data also has driving needs that object-based cloud storage meets. Web application development requires quick access to storage resources, test/dev environments capable of storing multiple copies of large data sets, and the ability to test web applications in real-time online environments. These requirements are understandably hard to achieve using traditional file-based storage systems.

Applications written to leverage object storage won’t need to be rewritten or even taken offline as the object storage seamlessly (or elastically) expands over time. Atmos provides the key capabilities that web application development requires, including location transparency, self-managing storage and REST APIs.

Enable instant access to data from any device. Web and mobile applications are inherently geographically distributed, yet file systems are usually limited in both effective access points (location) and number of files that they can manage. Object-based storage abstracts its storage from physical locations, providing a secure access point in place of device-specific mount points. Web services APIs and file-based access allow approved users to easily access their archives from computers and a broad array of mobile devices. Integrated web services over REST and SOAP are key to this instant access. Other support components are file-based access (CIFS / NFS / IFS / CAS), and expanded access via ISV applications.

Self-managing storage. In traditional development, applications have often been hard-coded to specific data stores through pointers to identified LUNs or file system navigation paths. In contrast, object storage provides a clean mapping from application to data through a simple REST API that ties an immutable unique object ID to the stored object. This goes a long way towards eliminating traditional, time-consuming storage management tasks like LUN creation and RAID groups. Cloud owners may choose to extend self-management options to customers, making it simple for users to grow storage capacity on demand.

Broad API support. Cloud object storage is basically shared storage accessed through web-based services. Atmos’ architecture supports rapid web application development with a broad API set including REST and S3. The REST API leverages HTTP operations on objects that are directly addressed, which reduces code complexity and provides the kind of easy, automatically distributed, protected, persistent storage the developer needs. In addition to the REST API, EMC Atmos also natively supports the Amazon S3 API. This provides customers with the ability to simply point S3 applications to Atmos and seamlessly migrate their applications to any of the more than 40 Atmos-powered public clouds around the globe.

EMC and Object-Based Storage

EMC first introduced Centera CAS for archiving in 2002. Centera offers 5 9’s data availability through its redundant array of independent nodes (RAIN), which is interconnected via cube switches and protects data across independent nodes in a cube. Mirroring and parity provide additional protection and availability.

Centera’s CAS architecture keeps the retained data from being compromised or deleted before the end of its retention period. Centera assigns unique hash-code identifiers specific to each unique object including content elements, metadata, and data/metadata relationships. This inextricably links content elements with their metadata, which are stored within a flat address space – no need for a separate database. This architecture ensures authenticity of the archived objects. Centera abstracts the unique objects from their generating applications and operating systems, which enables Centera to flexibly act as the single, highly optimized data store for previously siloed archives.

Centera retains single instances of archived objects. In the case of multiple users of the same file – such as a PowerPoint file sent over a distribution list – Centera retains metadata with information about each user’s interaction with the file, but points to the single instance of the object. By cutting down on data copies, this results in dramatic reductions in the quantity of archive storage.

Centera searches using metadata, rather than opening up the content objects on application-specific storage. This results in much faster and more efficient searches without using application cycles. This is possible because content and metadata stored on Centera are application, file and operating system independent, and Centera offers a search engine right in its repository.

Centera’s content-based addressing integrates directly with application environments via APIs, with no need for kernel level dependencies. This means that multiple applications can simultaneously use Centera, and that specific archiving management attributes, such as data aging and data protection, can be executed per application. These capabilities create a complete chain of custody once the data leaves the primary application to be archived on Centera. Media independence also leverages Centera’s application support. Centera objects are independent of specific storage media and protocols, which means that the storage system can migrate to new storage media over time without disturbing the integrity of the archived objects. For long term disk-based archiving, this represents significant risk mitigation and investment protection.

Centera architecture is highly scalable and self-managing. Traditional file systems scale based on the amount of stored data versus remaining available address space, which may not be much. As the file system reaches its maximum capacity, administrators must expand the entire file system including operating system, file system, and application in order to scale the archive. In contrast, Centera expands to petabyte-scale capacities due to its flat address space. It also leverages its architecture to distribute management controls across the entire archive infrastructure. For example, if a Centera disk or node fails, the archive cluster knows how to self-heal without manual intervention. This distributed management structure extends to cover the deployment, scaling, recovery and protection of all the archival objects being stored by Centera.

Centera optimizes archiving, information governance and compliance. Users may choose from 300 native, integrated archiving applications to manage archival needs for email, files, medical imaging, content management, video, voice, and more on the single Centera archiving platform. In addition, Centera offers Compliance Edition Plus for compliance and eDiscovery, and Governance Edition for data retention management.

Centera Compliance Edition Plus captures and preserves original content, protecting data and proving chain of custody for legal eDiscovery and litigation. Retention classes assign a logical reference to each electronic record object; policies enforce data retention and safe disposition. Centera Governance Edition enforces internal policies for data retention and disposition. Policies may be organizational or application-specific, which improves corporate accountability, reduces the cost of eDiscovery and compliance, and proves the integrity of governance controls.

To the Cloud: Atmos Architecture

EMC’s Atmos supports the same CAS API as Centera for seamless migration, and brings object storage into the cloud with massive scalability and geographic federation supported with multi-tenancy, cloud provisioning and global access features. While Atmos is readily leveraged to extend active global archives, it also offers an exceptional platform for web and mobile application development. Atmos even enables new opportunities for global “big” data aggregation and distribution.

Atmos is at heart a software storage system for building private and public cloud storage. Atmos implementations are available from EMC either already integrated into pre-packaged physical building blocks or as a virtual machine solution for VMware vSphere that can leverage other EMC or third-party storage resources. Additionally, a rich ecosystem of service providers offers Atmos directly as cloud Storage-as-a-Service. Any and all of these options can be federated together as needed within and across a given organization.

Atmos exposes REST and SOAP web services, and EMC has also implemented file services on top of Atmos that present the underlying objects through either an NFS or CIFS file server. When NFS or CIFS shares are defined, they are assigned to specific Atmos nodes (or dedicated pairs for HA) and utilize the Atmos node’s inherent Linux capabilities (leveraging an Installable File System with the FUSE extension). Layering a file system over Atmos imposes some constraints on universal access, but it also supports both traditional and transitional applications and file-system-style usage.

EMC Atmos Windows and Linux users can also leverage the EMC GeoDrive add-on that installs on a single user workstation or server to provide remote virtual NFS/CIFS style access (over REST) to Atmos object storage. GeoDrive supports local caching of files for offline use and eventual synchronization on reconnection. One of the major benefits of GeoDrive is enabling a user to access large amounts of protected storage from anywhere. It can also be used for the disaster recovery of files pushed or mirrored into Atmos.

Atmos technically maintains a given piece of data as an object with associated metadata that includes the object ID, system and user-defined metadata fields and the internal object layout information (and parent/child information for objects saved through a file system “namespace” interface). Applications and users can store arbitrary metadata with each object that can be leveraged by group management policies. Policies can be created at the tenant level as a design scheme to provide various service levels of performance, access, and data protection based on some awareness of the multi-site architecture of the cloud implementation. They are then assigned to subtenants, who need not be aware of the underlying implementation, to apply as target service levels to their objects. For example, the power to explicitly enforce compression of image files (e.g. JPEGs) after a number of days would present a significant capacity optimization for a web-based application dealing with millions of images.

In addition to supporting compliance and retention policies, metadata can be used to drive automated file distribution, access control and data protection activities optimizing for the appropriate level of data resiliency, performance and availability. For most applications, thoughtful use of user metadata can remove any need to implement a separate management tracking database for stored objects.
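
The image-compression example can be sketched in a few lines of Python. This is only an illustration of a metadata-driven rule, not Atmos policy syntax: the `StoredObject` type, the `content-type` metadata key, and the 30-day threshold are all assumptions introduced for the sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredObject:
    # Hypothetical in-memory view of an object and its metadata.
    object_id: str
    user_metadata: dict
    created: date
    compressed: bool = False


def should_compress(obj: StoredObject, today: date, min_age_days: int = 30) -> bool:
    """Return True for JPEG objects older than the threshold that are not yet compressed."""
    is_jpeg = obj.user_metadata.get("content-type") == "image/jpeg"
    old_enough = (today - obj.created) >= timedelta(days=min_age_days)
    return is_jpeg and old_enough and not obj.compressed


today = date(2012, 11, 30)
objects = [
    StoredObject("a1", {"content-type": "image/jpeg"}, date(2012, 9, 1)),
    StoredObject("b2", {"content-type": "image/jpeg"}, date(2012, 11, 20)),
    StoredObject("c3", {"content-type": "application/pdf"}, date(2012, 1, 5)),
]

# Select objects purely from their own metadata; no separate tracking
# database is consulted.
to_compress = [o.object_id for o in objects if should_compress(o, today)]
```

Because the rule reads only the metadata carried by each object, the application does not need its own catalog to decide which objects qualify.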

Replication is controlled by automated policies which can mirror data objects at many points in an object’s lifecycle both within and across multiple sites. Within a data center site, replication might for example be set to happen synchronously upon ingestion, while replication between sites might be set to run asynchronously and launch after an arbitrary delay to allow for data settling. Replications can be targeted to specific locations, or abstractly sent to “other” sites as the system decides.

For performance and availability, replicas are all active for read access (objects are inherently immutable so there is no issue with having to manage distributed locking mechanisms). Because it is “multi-site active/active”, any site can fulfill new object write requests when the local primary site is unavailable.
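
As a rough illustration of the replication policy described above, the following Python sketch models a policy as a list of replica rules and separates the copies that must complete at ingestion from those that are deferred. The `ReplicaRule` type, its field names, the site names, and the delay values are hypothetical and do not reflect the actual Atmos policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRule:
    # A named site, or "other" to let the system pick a site.
    location: str
    synchronous: bool
    delay_seconds: int = 0  # only meaningful for asynchronous replicas


def split_rules(rules):
    """Return (replicas written at ingestion, deferred replicas ordered by delay)."""
    at_ingest = [r for r in rules if r.synchronous]
    deferred = sorted(
        (r for r in rules if not r.synchronous),
        key=lambda r: r.delay_seconds,
    )
    return at_ingest, deferred


policy = [
    ReplicaRule(location="local-site", synchronous=True),
    ReplicaRule(location="other", synchronous=False, delay_seconds=3600),
    ReplicaRule(location="remote-site", synchronous=False, delay_seconds=600),
]

at_ingest, deferred = split_rules(policy)
```

Here the local copy is written synchronously, while the two cross-site copies are queued in order of their configured delay.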

In addition to full replication, EMC also provides an erasure coding option called GeoParity. Instead of keeping two or more full copies, “9/12” erasure coding stores an “expanded” object containing only 33% additional encoded “redundant” data, broken up into 12 segments. With erasure coding, the original data can be reconstructed dynamically from any 9 of the segments. These segments are distributed so that the object can survive (and even be accessed during) multiple failures. For greater protection there is also a “10/16” coding with a 60% capacity overhead. Erasure coding does impact access performance, especially at ingestion, but provides great fault tolerance with much lower capacity overhead than full replication. Of course, policies can be written to convert replicated objects to erasure-coded schemes as they age.
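
The overhead figures quoted for GeoParity follow from simple arithmetic, worked through in the Python sketch below. The function names are ours, and the assumption that a “10/16” object can be rebuilt from any 10 of its 16 segments is inferred by analogy with the “9/12” description rather than stated in the text.

```python
def capacity_overhead(data_segments: int, total_segments: int) -> float:
    """Extra capacity stored, as a fraction of the original object size."""
    return (total_segments - data_segments) / data_segments


def segments_tolerated(data_segments: int, total_segments: int) -> int:
    """How many segments can be lost while the object stays reconstructable."""
    return total_segments - data_segments


def replication_overhead(copies: int) -> float:
    """Extra capacity for full replication: each additional copy adds 100%."""
    return float(copies - 1)


for k, n in [(9, 12), (10, 16)]:
    print(f"{k}/{n}: overhead {capacity_overhead(k, n):.0%}, "
          f"survives loss of {segments_tolerated(k, n)} segments")
# 9/12: overhead 33%, survives loss of 3 segments
# 10/16: overhead 60%, survives loss of 6 segments

print(f"two full copies: overhead {replication_overhead(2):.0%}")
# two full copies: overhead 100%
```

The comparison shows the trade-off in the paragraph above: a 9/12 object tolerates three lost segments for a third of the extra capacity that a single additional full copy would require.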

With object stores there is generally no need for low-level RAID or disk level protection and Atmos is no exception. Upon hardware failures, replications and/or GeoParity across nodes (RAIN) combined with built-in node auto-healing features suffice to provide the full data protection as determined by the service level “policies” implemented for each type of data object. Atmos can withstand the loss of any disk, node, rack, or even site.

Atmos Pre-built Hardware Configurations

EMC Atmos pre-configured hardware “appliances” consist of a rack/cabinet containing from 4 to 16 Atmos nodes in various configurations and disk capacities. Flexible configurations enable smooth scalability, and allow for mixes of capacity and performance in and across Atmos sites. An Atmos storage node consists of a 1GbE server front-end running the Atmos storage services, connected to one or more SAS-attached disk array enclosures (DAE), each containing 15 7200RPM disks of 1 to 3TB. Every node runs all object storage services (the first two nodes in each site also run the site metadata locator service that indexes which node contains which objects), supporting tremendous horizontal system scalability.

EMC has also introduced its new Atmos G3 series for new levels of density and energy efficiency. G3-Dense-480 is the first in the Atmos G3 series and consists of 4, 6, or 8 nodes with 480 3TB disks in 40U.

TABLE: ALIGNING TOP CLOUD USE CASES WITH EMC ATMOS

Use case: Medical Archiving
Challenge: Over 800 million medical imaging procedures a year require huge storage scalability; collaboration and compliance increase complexity.
Benefits: Vendor Neutral Archive (VNA) on Atmos: integrates with EMR/EHR and improves PACS for better patient care and collaboration, improves data lifecycle management, reduces IT costs, and preserves HIPAA compliance.

Use case: File Archiving
Challenge: Corporate file sharing is popular with employees but syncing and sharing are hard to manage. Employees will frequently share files anyway over mobile devices, leaving corporations accountable for risky behavior.
Benefits: With EMC Sync & Share, users can securely share Atmos files across mobile devices, Linux and Windows. GeoDrive creates a Dropbox-like service that is secure and manageable, powered by Atmos’ fast performance. Atmos policies monitor changes to data and provide access control, benefitting regulated verticals like finance.

Use case: Archive as a Service
Challenge: Both the enterprise and storage service providers struggle to provide IT services to their respective customers. Provisioning, maintenance, and security are all difficult issues in traditional storage offerings.
Benefits: The Atmos Cloud Delivery Platform enables corporations and service providers to meter capacity, bandwidth, and usage across tenants. Provisioning is automated by tenant, and Atmos allows tenants to safely self-manage and access their own storage.

Use case: Managed Service Providers
Challenge: Many MSPs suffer from narrow profit margins because of the expense of delivering storage to customers. Managing multiple tenants, manual provisioning and maintaining service level agreements all cut into revenue and make it too expensive to add new storage services.
Benefits: Atmos lets MSPs efficiently offer storage as a service and better monetize new service offerings. MSPs can monitor capacity and usage for chargeback, reduce provisioning costs, and replace multiple tenant management systems with a single system. Dynamic scaling, high availability and security cost-effectively meet service level requirements.

Use case: Content-Rich Web Applications
Challenge: Traditional storage is a poor environment for Web application development, which needs highly scalable capacity for multiple large data sets, a secure environment for test/dev, and application testing in real-time environments.
Benefits: Atmos provides location transparency for global applications and a highly mobile user base. The single namespace means that application developers never need to recode pathnames and locations, and do not need to code for limited storage environments. Self-management options make it easy for customers to provision their own storage, and REST APIs reduce application complexity.

Taneja Group Opinion

When on-premise archive solutions smoothly integrate with federated storage, then public and private clouds provide extensive scalability and global availability. Yet we see too many end-users treating the cloud as just another storage tier for low value retained data. This is a huge waste of cloud possibilities but we understand why it happens: cloud platforms with poor performance and delivery mechanisms can make cloud-based storage more trouble than it’s worth.

But when we talk about EMC Atmos we are not talking about a low-cost storage tier, far from it. We are describing the heart of business innovation based on highly secure and highly accessible global data stores. EMC’s long expertise with object-based storage has kept Centera relevant and has extended dynamic data management to the cloud with Atmos. The Atmos-fueled cloud replaces hierarchical file storage while allowing the secure flow of information between the data center, the distributed cloud, and global access points. Customers profit from greatly improved application and data delivery, and the deep business value inherent in their valuable data.

When a company is dealing with geographic reach and large, growing volumes of rich content, it should look to object-based storage in the cloud. We fully support EMC in its push to scale capacity, performance, availability and management far beyond what traditional file systems are capable of, and more massively than ever before.

NOTICE: The information and product recommendations made by Taneja Group are based upon public information and sources and may also include personal opinions both of Taneja Group and others, all of which we believe to be accurate and reliable. However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Taneja Group, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may appear in this document.