
STORAGE SWITZERLAND

THE BIG DATA ARCHIVE

Big Data is often thought of as a specialized use case involving machine-generated data, typically associated with web search logs, satellite imagery or other sensor data, on which analytics are performed to enable some sort of decision support application. While this is an important example of a Big Data project, another is the collection of human-generated data that also needs to be retained, organized and made readily available for data mining or compliance reasons. An archive is needed to store both machine- and human-generated data, and a Big Data storage system can be the ideal solution.

Human-generated data is essentially file data created by office productivity applications. These files are the contracts, designs, proposals, video, audio, images and analytical summary data that drive the organization. Also included in this category are data files generated by multimedia tools such as video camcorders, mobile phone cameras, notebook PC microphones, etc. Just like machine-generated data, this data has value and a potentially higher compliance requirement, at least from a litigation perspective. But unlike machine-generated data, which often goes straight to low-cost archival storage, this human-generated file data is frequently stored on expensive, high-performance primary storage over its full lifespan.

At the point of creation, and during early modification, these human-generated files can justify being stored on the more expensive primary storage tier, where rapid access is not only essential but expected. Over time, though, most file-based data rapidly loses its need for immediacy and could be more appropriately stored on something cost-effective, even if not as responsive as primary storage. Unlike old copies of a database, these older files, which need to remain somewhat accessible for mining and compliance or just to ease the minds of users, can be put onto a secondary tier of storage.
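The tiering policy described above reduces to a simple rule: if a file has not been accessed in some number of days, demote it to the archive tier. The Python sketch below is a minimal illustration of that rule; the mount points, the 90-day threshold and the shutil-based move are illustrative assumptions, not the behavior of any particular product.

    import os
    import shutil
    import time

    # Illustrative settings; real tiering products make these configurable.
    PRIMARY = "/mnt/primary"    # hypothetical fast, expensive tier
    ARCHIVE = "/mnt/archive"    # hypothetical dense, low-cost tier
    AGE_DAYS = 90               # files untouched this long are archive candidates

    cutoff = time.time() - AGE_DAYS * 86400

    for root, _dirs, files in os.walk(PRIMARY):
        for name in files:
            src = os.path.join(root, name)
            # st_atime is the last access time; inactive files fall below the cutoff
            if os.stat(src).st_atime < cutoff:
                dest = os.path.join(ARCHIVE, os.path.relpath(src, PRIMARY))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(src, dest)  # relocate to the secondary tier

A production data mover would typically also leave a stub or link behind so users can still reach the file transparently, but the core decision, age-based demotion, is no more complicated than this.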

This secondary storage area is an ideal use case for a disk archive tier, something designed specifically to store this type of data cost-effectively. Again, this data will be retained not only because the organization has to, but also because it wants to. Companies will mine this data to provide insight that supports future decisions. This mining requirement means that archiving alone is not enough; the organization needs all the capabilities of a Big Data storage infrastructure, but in a more capacity-centric form.

George Crump, Senior Analyst


The Value of a Big Data Archive

A Big Data Archive brings specific value to the enterprise. First, similar to a classic archive, it should allow for the reduction of primary storage consumption and support growth. According to studies, well over 80% of file data on most primary storage systems is not in active use and is therefore wasting the performance capabilities of this high-cost resource. If this data were moved to a secondary, high-capacity storage area that still offers moderate performance, most subsequent file access could be served without impact on users. This could have a significant, positive impact on the IT budget.
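The budget impact is easy to estimate with back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, 500 TB of primary capacity, 80% of it inactive, and per-TB costs of $3,000 for primary versus $500 for archive storage; every figure is an assumption, not a quoted price.

    # Hypothetical cost model; all figures are illustrative assumptions.
    primary_tb = 500         # total primary capacity in TB
    inactive_share = 0.80    # portion of file data not in active use
    cost_primary = 3000      # $ per TB of primary storage
    cost_archive = 500       # $ per TB of dense archive storage

    moved_tb = primary_tb * inactive_share               # 400 TB demoted
    savings = moved_tb * (cost_primary - cost_archive)   # $1,000,000 net
    print(f"Freed {moved_tb:.0f} TB of primary, saving ${savings:,.0f}")

Even with far more conservative assumptions, demoting the inactive majority of file data changes the economics of the next primary storage purchase.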

Secondary disk tiers have been available for years, as have software products to classify and move that data. The cost savings on primary storage alone motivated many users to move to a two-tier storage infrastructure, but many other data centers were not so inclined. A Big Data Archive brings three more motivating points to the equation that should lead all data centers to adopt this multi-tier approach to storage.

The first point is that organizations are beginning to understand the value of this data and to acknowledge that there is a real desire to retain, categorize and, in the future, mine this information to help make better business decisions or speed product development. They are coming to realize that archiving makes practical sense in the data center and that its shortcomings are being eliminated by Big Data storage architectures.

The second motivating point is the need for compliance. Organizations and litigators are beginning to understand that retention is more than just making sure email is saved or that it can be found (discovered). Retention means keeping all the files that exist in relation to a case as well. In the past this meant providing boxes of paper documents. Today most documents are digital and are never printed. Retaining electronic documents is not only important, it may be the only way to preserve any evidence of that information.

Finally, a Big Data Archive is complementary to, and may even be part of, what was previously considered a separate project. This makes the cost of adding a Big Data Archive to a current Big Data project minimal, or it may allow the archive to be the foundational component in a future Big Data project. In short, by leveraging both initiatives, costs can be contained and ROI realized sooner.

As a result, a Big Data Archive has unique requirements that a simple second tier of storage, or even a basic archive solution, typically cannot meet. Whether fed from machine- or human-generated data sources, a Big Data Archive must match the compliance capabilities of disk archiving while meeting requirements like dense scaling, high throughput and fast retrieval.

Requirements for the Big Data Archive

Density Scaling

Legacy second-tier disk systems and even archive systems both have scaling issues when measured against the Big Data Archive challenge. The requirement to scale to petabytes is now the starting point for many of these systems, which quickly eliminates single-box architectures. Even legacy scale-out storage architectures may not be suitable for the Big Data Archive challenge. These systems were designed to add nodes rapidly, and as a result their capacity per node is limited and they quickly consume available data center floor space. The modern Big Data Archive needs a very dense architecture that maximizes capacity on a per-node basis and does not waste that floor space. In these environments the storage (disk drives) has practically become less expensive than the sheet metal (the other components in each node) that surrounds it, making it critical to use each node to its full potential before adding another.
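The floor-space argument is easy to quantify. The sketch below compares, under purely hypothetical node capacities, how many low-density versus high-density nodes a 4 PB archive would require; neither figure describes a specific vendor's hardware.

    # Illustrative density comparison; node capacities are assumptions.
    target_tb = 4 * 1000         # 4 PB target, expressed in TB

    legacy_tb_per_node = 36      # hypothetical low-density scale-out node
    dense_tb_per_node = 144      # hypothetical high-density archive node

    legacy_nodes = -(-target_tb // legacy_tb_per_node)  # ceiling division: 112 nodes
    dense_nodes = -(-target_tb // dense_tb_per_node)    # ceiling division: 28 nodes
    print(f"Legacy: {legacy_nodes} nodes vs. dense: {dense_nodes} nodes")

Four times fewer nodes means proportionally less sheet metal, rack space and power for the same usable capacity.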

High Throughput

Big Data Archives must also have the ability to ingest large amounts of data quickly. Legacy archive solutions were designed to have data trickle into them over the course of time. Big Data Archives may store very large numbers of different-sized files on an ongoing basis: millions of small files being archived from a traditional Big Data project, or relatively few very large rich-media files being archived from user projects.


In both cases the ingestion of these files requires that the receiving nodes encode the data and then segment it across the other nodes in the cluster. This background work could cripple legacy archive solutions, whose nodes are typically interconnected via a 1GbE infrastructure. Instead, a higher-speed backbone is required so that additional throughput can be maintained. Solutions like Isilon's NL Scale-Out NAS connect via an internal InfiniBand backbone for very high throughput, enabling them to sustain ingest rates that match the requirements of a Big Data Archive.
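The ingest path just described, encoding the incoming stream and segmenting it across the cluster's nodes, can be sketched as chunking a byte stream and assigning chunks round-robin. This is a deliberately simplified illustration, not Isilon's actual layout or protection encoding; the chunk size and node list are assumptions.

    # Simplified ingest sketch: split a stream into fixed-size chunks and
    # spread them across cluster nodes round-robin. Real systems also compute
    # protection (parity/erasure) blocks; that step is omitted here.
    CHUNK_SIZE = 128 * 1024                        # 128 KB chunks (illustrative)
    NODES = ["node1", "node2", "node3", "node4"]   # hypothetical cluster members

    def segment(stream):
        """Yield (node, chunk) pairs for each fixed-size chunk of the stream."""
        index = 0
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            yield NODES[index % len(NODES)], chunk
            index += 1

    # Usage: for node, chunk in segment(open("big_file.bin", "rb")): ship chunk to node

With a fast backbone, each (node, chunk) pair can be shipped in parallel, which is why inter-node bandwidth, rather than disk speed, usually bounds the ingest rate.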

Fast Retrieval

Retrieval is also different for the Big Data Archive than it is for the traditional archive storage system. It may need to produce thousands or millions of files very quickly, or in some cases it may be desirable to perform the search and analysis on the Big Data Archive itself. Traditional archive architectures and legacy second-tier storage systems are typically found lacking when asked to provide data quickly as capacity scales beyond 1 PB. It's important to remember that archive systems were designed to provide performance better than the platform they were replacing, which for most was optical disk.

Big Data Archives operate against a different standard. They need to provide consistent performance that's comparable to most primary storage systems, no matter what the capacity level. Again, Isilon's NL Series surpasses this expectation and provides near-primary-storage performance, but with the throughput and density that Big Data Archiving requires.

Protection & Disaster Recovery

Protecting 1PB+ environments requires a change in thinking. Nightly backups are no longer a reality, not only because of the size of the solution but also because of the amount of data that can be ingested at any time. If a large archive job is submitted and a catastrophic failure occurs later, a significant amount of data could be permanently lost; in the case of machine-generated sensor data, for example, there may be no way to ever recover it.

Data protection needs to be integrated into the Big Data Archive and then augmented. First, the system should have no single points of failure, and users should be able to set the data protection level by data type. This accommodates unrecoverable data, like point-in-time sensor data, which might need a higher level of redundancy than traditional file data.
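Setting the protection level by data type amounts to a small policy table: irreplaceable data gets more redundancy than data that can be regenerated. A minimal sketch follows; the data classes, copy counts and replication flags are hypothetical examples, not any product's defaults.

    # Hypothetical per-data-type protection policy.
    PROTECTION_POLICY = {
        "sensor":   {"copies": 3, "replicate_offsite": True},   # unrecoverable point-in-time data
        "media":    {"copies": 2, "replicate_offsite": True},
        "document": {"copies": 2, "replicate_offsite": False},  # restorable from other sources
    }

    def protection_for(data_type):
        """Return protection settings for a data class, defaulting conservatively."""
        return PROTECTION_POLICY.get(data_type, {"copies": 3, "replicate_offsite": True})

    print(protection_for("sensor"))  # {'copies': 3, 'replicate_offsite': True}

A real archive would enforce these settings at write time and consult them again when prioritizing replication traffic.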

Next, the data needs to be transferred in real time to a second location via built-in replication tools. That data again needs to be prioritized based on whether it can be replaced.

Finally, there are always some organizations that will want to move data to an alternate device altogether, even tape, in case of a regional disaster. The Big Data Archive should have the ability to add copy-out performance when needed. As an example, Isilon can add a class of nodes to its cluster, called backup accelerators, that are specifically designed to move data to another device. This allows the other nodes to continue to deliver high throughput and fast retrieval while the cluster's data is copied to alternate storage devices.

Summary

The Big Data Archive can be a component of a larger Big Data project, or it can be an archive designed specifically for Big Data. In either case, leveraging that investment to also include human-generated data that needs to be stored for mining or compliance reasons is an excellent way to achieve a greater ROI on the Big Data project. It can also help uncover new ways to make better decisions by retaining and analyzing existing information.

About Storage Switzerland

Storage Switzerland is an analyst firm focused on the virtualization and storage marketplaces. For more information, please visit our web site: http://www.storage-switzerland.com

Copyright 2011 Storage Switzerland, Inc. - All rights reserved
