EMC Proven Professional Knowledge Sharing 2010

Storage, a Cloud storm or not?
In a time when everyone talks about Cloud computing, everyone seems to forget why we should – or should not – move our data to the Cloud.

Roy Mikes
Storage and Virtualization Architect
Mondriaan Zorggroep
[email protected]


Table of Contents

About This Document
Who Should Read This Document?
Introduction
1. Data Growth
1.1. What Kind of Data
1.2. Data Growth Challenges
1.3. Why Control Data Growth?
1.4. Data Growth ‘Considerations’
2. Key requirements
2.1. Availability
2.2. Capacity
2.3. Performance
2.4. Security
2.5. Scalability
2.6. Data Integrity
2.7. Manageability
3. What To Do Next?
4. Conclusion
References

Disclaimer: The views, processes, or methodologies published in this compilation are those of the author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


About This Document

There has never been a greater need for information than in the last decade – especially the last three years. The problem does not end there: challenging market conditions, security, data growth, and new technologies are leading many companies to re-evaluate the way they purchase, deploy, manage, and use business applications. Despite the title of this article, it is not just about ‘Cloud Services’. This article will help you understand the different sides of the many choices you will face, and help you make wise decisions when dealing with data growth in relation to the various considerations and solutions. It presents considerations for the major functional areas of key requirements such as availability, capacity, performance, scalability, and data integrity in relation to Cloud services or other solutions. The objective is to keep all data manageable and secure, whether you put it in ‘the Cloud’ or keep it in your own data center.

Who Should Read This Document? This article is written for IT professionals who are responsible for defining the strategic direction of data growth in their data center(s). These include:

- Storage Administrators
- Operational, middle-level managers
- Business Managers
- IT managers (Chief Information Officer)

Organizations and individuals who have the same interests should read this article as well. My intent is to provide general guidelines that are easy to read. To meet this goal, the article covers the essentials and does not go too deeply into bits and bytes.


Introduction

Cloud computing is a technology whereby software and IT infrastructure are delivered as services – also known as "The Cloud." The phrase "The Cloud" is quickly gaining recognition, but there are still serious concerns, whether you keep your data in a private or a public Cloud. This article provides an understanding of Cloud Storage and how it compares to other storage offerings in the marketplace. I will look at a number of factors, including which types of data, workloads, access patterns, and security requirements are a good fit for Cloud Storage.

In the past year and a half, I have worked intensively with two new EMC CLARiiON® systems: redesigning Backup and Recovery, moving inactive data to other tier layers, implementing archiving, introducing new NAS storage, and upgrading SAN storage. In the meantime, I completed three storage courses. Heavy? Yes! So believe me, it was an eventful year. But it has paid off because, finally, Cloud Storage has hit the management agenda.

Let's face it, we cannot ignore data growth anno 2009 - 2011 and beyond. If you think storing your enterprise data is a tough challenge now, it's nothing compared to what it might be in just a few years. According to a recent study from research firm IDC[1] and storage vendor EMC, data requirements are growing at an annual rate of 60 percent, reaching 281 exabytes in total by 2011. These findings should serve as a wake-up call to enterprises.

There is hope. More and more, customers and vendors realize how serious this problem will become. There are numerous approaches to help you understand the benefits and concerns of processing more data at a lower cost.


1. Data Growth

According to IDC[2], the digital universe in 2007 was 10% bigger than IDC initially projected. The resizing is a result of faster growth in cameras, digital and High Definition TV, and a better understanding of information replication.

By 2011, the digital universe will be 10 times the size it was in 2006. As forecast, the amount of information created, captured, or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home. Fast-growing corners of the digital universe include those related to digital and High Definition TV, surveillance cameras, Internet access, sensor-based applications, data centers supporting “Cloud computing,” and social networks. The diversity of the digital universe can be seen in the variability of file sizes, from 6 gigabyte movies on DVD to 128-bit signals from RFID tags. Because of the growth of VoIP, sensors, and RFID, the number of electronic information “containers” (files, images, packets, tag contents) is growing 50% faster than the number of gigabytes. The information created in 2011 will be contained in more than 20 quadrillion (20 million billion) such containers, a tremendous management challenge for both businesses and consumers.

While the hardware and applications that create or capture digital information are growing rapidly, so is the hardware that stores information. Information creation and available storage are the yin and yang of the digital universe. Cheaper storage allows us to take high-resolution photos on our cell phones, which in turn drives demand for more storage. Higher-capacity drives allow us to replicate more information, which drives growth of content. Yin, yang.


1.1. What Kind of Data
I’m not going to tell you about Facebook, taking pictures, making VoIP phone calls, uploading videos to YouTube, downloading digital content, and so on. These are typical things you use at home. Let’s talk about business data. You can simplify it into two categories: structured and unstructured. When I think of business data, I think of X-rays and medical databases, e-mail and attachments, images, forms, instant messaging, documents, audio and video recordings, CCTV, and so on. But anno 2010 you see home and business data increasingly merging together. In 2007, the digital universe contained 281,000,000,000 gigabytes, which works out to about 45 gigabytes per person on the planet. Assuming an average growth of 60%, this will grow to roughly 185 gigabytes per person on the planet in 2010. And when we back up all this data, we can practically multiply these figures again.
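As a quick sanity check of these figures, here is a minimal back-of-the-envelope sketch; the world population of roughly 6.2 billion is an assumption used only to reproduce the per-person numbers quoted above.

```python
# Back-of-the-envelope check of the per-person figures quoted above.
# Assumption: a world population of roughly 6.2 billion in 2007.
digital_universe_2007_gb = 281_000_000_000   # 281 billion GB (IDC/EMC)
world_population = 6_200_000_000             # assumed
annual_growth = 0.60                         # 60% per year

per_person_2007 = digital_universe_2007_gb / world_population
per_person_2010 = per_person_2007 * (1 + annual_growth) ** 3   # 2007 -> 2010

print(f"Per person in 2007: {per_person_2007:.0f} GB")   # roughly 45 GB
print(f"Per person in 2010: {per_person_2010:.0f} GB")   # roughly 185 GB
```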

As the graph below shows, most of the data in use is unstructured. Data that falls in the ‘unstructured’ category includes:

- X-rays

- E-mail/attachments

- Images

- Forms

- Documents

- Instant messaging

- Audio and Video

- Rich media

- PDFs

- Etc.

Data that falls in the ‘structured’ category is:

- Mainly Databases

Data created by individuals or businesses must be stored so that it is easily accessible. In a computer environment, devices designed for storing data are termed ‘storage devices’ or simply ‘storage’. Businesses have several options available for storing data, and the choice depends on the importance of that data. Certain data is more important than other data, and it is not up to you to decide which, but up to your organization. Depending on its importance, different rules apply to the data, and you must treat it differently.


As shown in the graph above, most of the data is unstructured. This is normal behavior, because there are more pictures, files, e-mails, audio, and video files than databases. It is important to know what kind of data you have, and to make it more concrete, as shown in the graph below.

The graph shows an example of data classification. This is raw data, provided by the LUNs that are allocated to the underlying systems, such as HP-UX and VMware systems. The graph also shows the different categories, divided into mail, EPD, VMware, hot spares, and not in use, but it could be any type of data in your data center. In this example, EPD is a database: 7% is structured and 66% is unstructured, while 27% is still free and will end up in one of the two categories. This is similar to the overall picture, where 20% of the total is structured and 80% is unstructured. Of course, you split data into categories or by product name to get an overview. The lesson to be learned is that by knowing what data you have, it is easier to make decisions. Maybe you want to put some data in a public or local Cloud. You then know exactly how much you gain in a particular category!


1.2. Data Growth Challenges
Storage... Isn't that something you can buy in your local computer store or a discount "shopping mall"? It sounds like a joke, but believe me, I have heard it more than a few times. In fact, every time, I have to defend why we need more disks or more space. To be honest, these are proper questions that have to be answered, because usually a lot of money is involved.

Data is created by applications and is processed to become information. Information is undoubtedly an organization’s most important asset. Organizations want their information managed with a consistent approach so that they are able to store, protect, optimize, and leverage data to enhance the value of information. Your customer doesn't care whether the array is still working or needs more space; all they care about is having access to their data and their service availability. So ultimately, availability is one of many concerns that matter in the larger scheme of things.

Back to my local computer store or discount "shopping mall" – it sounds like an echo. Explain to your manager why he does not want to buy storage in a discount "shopping mall". Try to convince your manager that there are hundreds of people working at the same time and using the same disks and disk space; refer to Exchange or your file server(s) as an example. You have to deal with availability, as mentioned earlier. Presumably, you also want no data loss. Maybe you put your disks in a RAID group to protect your data. Perhaps you will want to use local or even remote replication, mirroring your data between more than one data center in case of a disaster. This requires some technical complexity.

So, what are your reasons for needing high-quality disks? An important one is data integrity. Data integrity refers to mechanisms such as error correction codes or parity bits which ensure that data is written to disk exactly as it was received. Any variation in data during its retrieval implies corruption, which may affect the operations of the organization. Why should you purchase durable storage? This question is also relevant for Cloud Storage. These facts – or key requirements[3] of information storage management – indicate that there is something that has to be done. Let’s put them all in a row:

Availability

Capacity

Performance

Security

Scalability

Data Integrity

Manageability


1.3. Why Control Data Growth?

When it comes to data, more isn’t always better. Overgrown databases can impair the performance of your mission-critical business applications and custom applications, jeopardizing the superior service you’ve worked so hard to provide.

There are many ways to provide solutions. The easiest way is to add more disks to meet your capacity demands. But this isn’t really a solution; it only confirms that you cannot remove any more data. You have, typically, exhausted all means of controlling/minimizing your data growth. So now what?

This is when you need to manage your data growth. Plan for the future to control where you go. Either that, or at least provide enough time to plan for a new career elsewhere.

1.4. Data Growth ‘Considerations’
Where to start? While this is a complex matter, there is a path you can follow that isn't difficult to execute. Start at the beginning and keep it simple. Here are six (of course there are more) considerations to follow:

1. Perform some type of capacity planning to see how fast you are growing and how quickly you will outgrow what you currently have.

2. Start tracking data ‘growth’.

3. Solve the data growth problem at the source - by managing your enterprise application data.

4. Carry out as much data deletion/purging activity as possible.

5. If growth is unavoidable, use tiering to lower your costs.

6. Cloud Services?

If you must make a choice in the near future whether to place some or all of your data in a public or private Cloud, the next page will give you more details about these six considerations.


1. PERFORM SOME TYPE OF CAPACITY PLANNING TO SEE HOW FAST YOU ARE GROWING AND HOW QUICKLY YOU WILL OUTGROW WHAT YOU CURRENTLY HAVE.

Today's storage managers must see beyond the bytes to understand the nature, attributes, and value of the data itself. Nothing is more important than knowing where you stand when it comes to storage resource forecasting. But as important as that is, it is just as important to stay flexible. There are always outside influences, such as regulatory compliance, mergers and acquisitions, and general economic cycles, which can greatly impact your storage resource needs. Be prepared to have some storage to spare, just enough to absorb blows. I use the 20% rule across all layers, such as storage LUNs, VMFS volumes, and Windows or Linux environments. Example: let's assume I have a Windows server running on a VMware ESX host with the following configuration:

- C drive, 10 GB (vmdk), running the OS
- D drive, 20 GB (vmdk), running the application
- R drive, 50 GB (RDM), running the database

This server uses 80 GB of disk space in total, spread across several layers: within the virtual machine, on VMFS, and directly on storage. As you can see, I have a 10 GB C drive within the virtual machine. Let's say it has 3 GB of space remaining. Your IT department probably has regular maintenance windows where they apply application updates or major Microsoft updates. Suddenly, the space left is 1.75 GB. You probably have more than one server in your data center, so multiply this by the other servers where this applies. But let's stick with this one. We have to add more disk space to the virtual machine. This has direct consequences for the VMFS volume. Let us assume that this expansion also pushes the VMFS through the 20% barrier; we have to expand that one, too. I don't have to explain how this affects the underlying storage, possibly requiring extra disks as well. And it is not just Microsoft updates, but applications as well. It will become a challenge to explain to your manager why you need extra budget for disks just to apply a couple of Microsoft updates. Hmm… love to see you doing that. I must repeat: nothing is more important than knowing where you stand when it comes to storage resource forecasting. Catalog your disk sizes on all systems.
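As a minimal sketch of the 20% rule described above, the check below flags any volume whose free space drops under the threshold. The volume names and numbers are hypothetical, taken from the example; this is not the author's actual tooling.

```python
# Minimal sketch of the 20% headroom rule: flag any volume whose free space
# has dropped below 20% of its size. Names and numbers are hypothetical.
volumes = {
    "C (vmdk, OS)":      {"size_gb": 10, "free_gb": 1.75},
    "D (vmdk, app)":     {"size_gb": 20, "free_gb": 6.0},
    "R (RDM, database)": {"size_gb": 50, "free_gb": 12.0},
}

HEADROOM = 0.20  # keep at least 20% free on every layer (guest, VMFS, LUN)

for name, v in volumes.items():
    free_ratio = v["free_gb"] / v["size_gb"]
    status = "OK" if free_ratio >= HEADROOM else "EXPAND OR CLEAN UP"
    print(f"{name:20s} {free_ratio:6.1%} free -> {status}")
```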


2. START TRACKING DATA ‘GROWTH’.
Tracking data growth is important because if you don't know who owns data, you don't know who is responsible for it, and you simply don't know who is going to manage it. Once administrators understand the data, know who owns it, and see how the data is being used, they can make smart capacity planning decisions about enterprise storage and data protection. Who owns data? Finance, marketing, sales, the service department? Don't be fooled into thinking this is easy: when it comes to deciding who actually owns the data, suddenly no one seems to be responsible.

According to Philip Howard[4], the business can't decide who owns the data. And assuming that you do need someone to be responsible for the data, you are left with a puzzle: in most organizations there aren't any structures that support ownership of data. You are left with only one solution to this issue: create one! There needs to be data ownership.

The business is responsible for process governance. IT is responsible for systems governance. A structure in the middle is needed for data governance.

However, this isn't the end of the story because there are a couple of additional points. IT owns its own data. IT understands and owns SLA metrics, database performance management details, and so on. It also understands and owns its own processes. Therefore, IT governance encapsulates both process and data governance as they pertain to IT itself. I don't feel that any separate structures are needed within IT unless the IT department is very large.

The second additional point is with respect to the owner of the application. Here, the position is different. It is usually quite clear who owns content, because it is originated by a particular department. In this case, it’s obvious who owns the data.

Life in storage IT can be easier when you know who owns data. This action should be on your action list.

3. SOLVE THE DATA GROWTH PROBLEM AT THE SOURCE - BY MANAGING YOUR ENTERPRISE APPLICATION DATA.

Solving the data growth problem at the source follows from the first two considerations above. My recommendation is to ‘know your products’: know what each one does and how to deal with it. Make sure there is someone responsible for the application and its data. Who may request new software? Discuss the roadmap, and discuss how to move forward from one point in time to the next.

It is therefore important that there is, and will continue to be, good management. Many IT departments behave reactively, responding to events as they occur - for example, suddenly needing more space. Because unscheduled work tasks take priority over planned work, staff find that they are always behind. By implementing ITIL, an organization can overcome this cycle and get clear insight into the total cost of ownership (TCO) and activities in the IT department.

ITIL is not just a technical thing. Actually it isn't really technical at all - not like Java programming anyway! ITIL is ultimately concerned with aligning IT with the business - that means that the business is ultimately the driver, not IT.

ITIL is a good driver and motivator to challenge the business to follow procedures. Prevention is better than cure. I repeat; solve the data growth problem at the source.

4. CARRY OUT AS MUCH DATA DELETION/PURGING ACTIVITY AS POSSIBLE.
Optimizing information flow in your business begins with a strategy for creating, organizing, and disposing of your information assets. Data storage requirements continue to increase, driven by many new applications, professional services (e.g. diversions), and organizational development. The increase should, however, be tempered by rigorous and effective policies.

One possibility is to impose quotas on databases, file servers, mail servers, and Microsoft SharePoint servers. While these servers store large amounts of data and information, in most cases not ALL users are responsible for data growth; usually, the "heavy users" are those responsible for the growth of data and information. In a healthy business, growth is normal behavior, a necessary evil. In my opinion, imposing quotas would be an obstacle for your business anno 2010. Quotas do not prevent users from storing data elsewhere, such as locally or wherever possible. If you decide to use quotas, apply them organization-wide. I believe that you have to discourage users and make them aware of the impact on your organization when there is no justification for continued, unlimited data growth. Make the owner aware of the data growth, and that costs are charged per megabyte or gigabyte. Believe me, once they have seen the bill, they will be aware.

Along with ever-increasing data growth, there is one more problem to consider. The most urgent will be the backup window: as data grows, so do backup windows. As a result, the main backup process will still be running during office hours, with a range of effects. But what if there is a strong need for information? Would you have to impose quotas? These considerations lead down different roads. You can think of deleting ‘old’ data, or redesigning your backup by using snapshots or deduplication. What about bringing it to ‘the Cloud’? Let us briefly discuss these three.


When we speak of deleting ‘old’ data, what do we actually mean by old data? When does it become old data? Should we delete it? Often there are legal rules you must obey, and deleting data is not an option. In a way, we are speaking of lifecycle management.

A storage snapshot is a set of markers, or pointers, to data stored on a disk drive, on a tape, or in a storage area network (SAN). A snapshot is something like a detailed table of contents, but it is treated by the computer as a complete data backup. Snapshots streamline access to stored data and can speed up the process of data recovery. Still, snapshots come at a cost, as additional disk space is required. Keep this in mind!

Data deduplication is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy. This could save a lot of money. But the deduplication intelligence you buy is still more expensive than adding regular disks; the break-even point is around five terabytes of data.

Is “Cloud Services” just the latest hot-air word, or a technology? Cloud[6] computing is Internet ("Cloud")-based use of computer technology. In concept, it is a paradigm shift whereby details are abstracted from the users, who no longer need knowledge of, expertise in, or control over the technology infrastructure "in the Cloud" that supports them. Cloud computing describes a new supplement, consumption, and delivery model for IT services based on the Internet, and typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet. The Cloud may be an option. I will do a deep dive on this topic in the next chapters.
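To make the deduplication idea concrete, here is a small, purely illustrative sketch: each unique chunk is stored once, keyed by its hash, and duplicate chunks are reduced to a pointer. This is not a product implementation.

```python
import hashlib

# Purely illustrative sketch of deduplication: each unique chunk is stored
# once, keyed by its hash; duplicates are reduced to a pointer (the hash).
store = {}  # hash -> the single retained copy of the chunk

def write_chunks(data, chunk_size=8):
    pointers = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:          # store only the first occurrence
            store[key] = chunk
        pointers.append(key)          # a duplicate costs only a pointer
    return pointers

backup = write_chunks(b"AAAAAAAA" * 3 + b"BBBBBBBB")  # three identical chunks
print(f"Logical chunks: {len(backup)}, unique chunks stored: {len(store)}")
```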

5. IF GROWTH IS UNAVOIDABLE, USE TIERING TO LOWER YOUR COSTS.

Using tiering to accommodate growth and lower your costs sounds reasonable. First, you must classify your data types, based on how often the data changes and how important it is. The relevance of understanding data in this context is in establishing a protection category for the data. Data classification is an ongoing process: the class of a particular piece of data may change during its lifecycle as new types are introduced or formerly classified data changes state over time. I prefer to use the term active to describe frequency of access; it is a measure of the level of activity, not a measure of change. Data that is subject to change and able to be changed is referred to as changeable data. Data that is not likely to change or cannot be changed is fixed content data. Fixed content data is called an archive.

CREATE – PROTECT – ACCESS – MIGRATE – ARCHIVE – DISPOSE

Many organizations mistakenly treat backups as archives. Backups are not designed to serve the needs of an archive and are managed differently than active archives. Active archive data only needs to be replicated at ingestion into the archive. This process is easier and less costly than achieving the same result with ongoing backup methods. You could shift this data to another tier such as network-attached storage (NAS) or content-addressed storage (CAS); EMC Centera® or Celerra® simplify archiving and lower storage costs. Nevertheless, this kind of data could also be a candidate for a Cloud service.
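As an illustration of access-based tiering, a placement rule could look like the sketch below. The 30-day and 365-day thresholds and the tier names are assumptions made for the example, not recommendations from this article.

```python
from datetime import date, timedelta

# Sketch of access-based tiering: "active" data stays on primary storage,
# data untouched for a long time becomes a candidate for an archive tier
# (NAS/CAS) or a Cloud service. Thresholds and tier names are assumptions.
def tier_for(last_access, today=None):
    today = today or date.today()
    age_days = (today - last_access).days
    if age_days <= 30:
        return "Tier 1 - primary storage (active data)"
    if age_days <= 365:
        return "Tier 2 - NAS / lower-cost disk"
    return "Tier 3 - archive (CAS or Cloud)"

print(tier_for(date.today() - timedelta(days=5)))     # active data
print(tier_for(date.today() - timedelta(days=400)))   # fixed content / archive
```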


6. CLOUD SERVICES?

The trending terms are Cloud and Storage Cloud. Today, everyone talks about the Cloud. Cloud computing is a style of computing whose foundation is the delivery of services, software, and processing capacity over private or public networks. The focus of Cloud computing is the user experience, and the essence is to decouple the delivery of computing services from the underlying technology. Cloud is hot!

I will not explain in this chapter what Cloud computing is and does, or that there are two kinds of services, public and local Clouds. I will go through some benefits here and discuss them in other chapters.

Some benefits of Clouds:

Reduced Cost: You pay for only what you use. No sky-high energy consumption and costs for power and cooling data centers.

Highly Automated: IT personnel no longer need to worry about maintaining knowledge and keeping software up to date.

Flexibility: Cloud computing offers much more flexibility than past computing methods.

Growing on Demand: Organizations can store more data than on private computer systems. Since you pay for only what you use, you do not need to be concerned about how much storage is left over. The strength here is that if you need less, you pay less.

Mobility: Employees can access information wherever they are, rather than having to remain at their desks. However, this requires strong security.

Is Cloud computing right for you? It could be. But it depends. How are these related to the key storage requirements shown below and benefits above?

Availability

Capacity

Performance

Security

Scalability

Data Integrity

Manageability


2. Key requirements

Although everyone talks about Cloud computing, everyone seems to forget why we should or should not move our data to the Cloud. Uninterrupted operations of data centers are critical. It is necessary to have a reliable infrastructure that ensures data is accessible at all times, whether it is local or public.

Good storage management eliminates single points of failure, arranges backup and recovery, and uses local or remote replication. Another topic, security management, is responsible for preventing unauthorized activities or access by applications or users. You should also configure or design an optimal operational infrastructure by analyzing and identifying bottlenecks and recommending changes to improve performance.

These are just a few of many considerations that should help you understand the different choices you face. Making wise decisions when dealing with data growth in relation to storage management will make it easier to determine if Cloud computing is right for you!

2.1. Availability

Availability is the ability of an IT service or component to perform its required function at a stated instant or over a stated period of time. Organizations always strive to optimize the capability of the IT infrastructure, services, and supporting organization to deliver a cost-effective level of availability. There is only a small chance of being a victim of terrorism or natural disaster; statistically, these threats account for less than 1% of IT unavailability. While it is important to plan contingencies against threats posed by disasters, organizations are most vulnerable to events that fall under the categories of planned and unplanned outages. To know where your focus should lie, these statistics are important. Where, geographically, is your building located? Is there a risk of being struck by a natural disaster, such as flooding, earthquakes, or forest fires? Data center vendors consciously look at these possibilities when building data centers. This is something you do not have to worry about when putting your data in a Cloud data center.


What about remote replication? Organizations use remote replication technology to create disaster recovery sites. These sites often replicate whole data centers. While replication technologies work very well for disaster recovery, they share one important characteristic: since remote replication copies data faithfully from one place to another (production to failover site), any corrupted file will be replicated just as faithfully as the good files. This makes them valuable for disaster recovery, but not so good for operational backup. Keep this in mind, because it will not be solved in a Cloud.

Something else to consider is RAID technology. RAID stands for Redundant Array of Inexpensive (or sometimes "Independent") Disks. RAID is a technique of combining several disks into one logical unit: two or more disks grouped together to appear as a single device to the host system. RAID technology was developed to address the fault-tolerance and performance limitations of conventional disk storage. It can offer fault tolerance and higher throughput than a single hard drive or group of independent disks. While arrays were once considered complex and relatively specialized storage solutions, today they are easy to use and essential for a broad spectrum of storage devices. I will discuss more about RAID in chapter 2.3.

Assume that the array is 99.999% available; what does that really mean to you? Probably not a lot in practical terms. Does it mean that individual components are 99.999% available? Or does it mean that the array itself is available? It does not mean your whole IT department, including all the systems, has to be available should an outage occur. Probably it affects just a small area or specific data or systems. What I'm saying is, your core application will be more important than Microsoft Word or Excel, unless these are your core applications. Of course, this depends on which data will stay local or move to a public Cloud.

Table 1-1: Availability Percentage

UPTIME (%)   DOWNTIME (%)   DOWNTIME PER YEAR    DOWNTIME PER WEEK
98           2              7.3 days             3 hr 22 minutes
99           1              3.65 days            1 hr 41 minutes
99.8         0.2            17 hr 31 minutes     20 minutes 10 sec
99.9         0.1            8 hr 45 minutes      10 minutes 5 sec
99.99        0.01           52.5 minutes         1 minute
99.999       0.001          5.25 minutes         6 sec
99.9999      0.0001         31.5 sec             0.6 sec
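The downtime figures in Table 1-1 follow directly from the uptime percentage. A short sketch of that arithmetic, assuming a 365-day year, is:

```python
# Reproduce the downtime figures in Table 1-1 from the uptime percentage:
# downtime fraction = 1 - uptime, applied to a 365-day year or a 7-day week.
def downtime(uptime_percent):
    down = 1 - uptime_percent / 100.0
    per_year_hours = down * 365 * 24
    per_week_minutes = down * 7 * 24 * 60
    return per_year_hours, per_week_minutes

for pct in (98, 99, 99.8, 99.9, 99.99, 99.999, 99.9999):
    year_h, week_m = downtime(pct)
    print(f"{pct:>8}% uptime -> {year_h:8.2f} h/year, {week_m:7.2f} min/week")
```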

2.2. Capacity
Data center operations require adequate resources to store and process large amounts of data. When capacity requirements increase, the data center must be able to provide the additional capacity. Cloud is a good solution for this. Expanding your own resources also adds technical complexity; therefore, you need advanced storage. Cheap local disks will not do the trick.


Capacity management of storage devices has been given a high priority in many IT departments, frequently because of money. You do not want to waste budget dollars by over-provisioning storage capacity (typical storage is only 50%-60% utilized, at best).

In the past, capacity management of storage devices had been given a low priority. In strong economic times the budget to purchase new storage was readily available. Risking unavailability of end user data or applications due to a full volume was, and still is, unacceptable. In most cases, the answer has been to buy extra storage and over-provision capacity for a given application or user. However, even in weak economic times, storage hardware vendors have aggressively priced storage. This could make the difference. If nothing else matters, bringing your data to the Cloud would be a perfect solution for solving the eternal hunger for more data. Finally, you pay only for what you use.

If that were the case, I could explain in one page why you should buy extra storage or bring your data to the Cloud. Unfortunately, this is not the case. But, this is why storage management is so special.

A new term that has been added to the world of storage technology is “dark storage”. When storage is added to servers and not all of the claimed capacity is presented as file systems and/or volumes, this wasted, unused storage capacity is often referred to as "dark storage". It typically goes undetected by storage administrators because of the organizational separation between storage and server administrators. Another trouble spot crops up when a server, virtual server, or application is over-provisioned. Over-provisioning occurs when the application administrator allocates far more capacity than will actually be used.

I feel it is important to say something about log files. For some, it seems to be a sport to log everything there is without ever doing anything with it. In my opinion, never log what you don't plan to use; it is a waste of space, since log files can grow enormously.

2.3. Performance

All core data center elements should be able to provide optimal performance. People often say: take a whole enclosure and place everything in one RAID 5 group, instead of laying disks out per application, and look for the best performance. So what is best?

Such a simple question; unfortunately, there is not a simple answer. It all depends on how the different applications access the data. Large sequential operations will tend to starve small-block I/O trying to operate on the same set of disks at the same time. You need to determine minimum performance thresholds for each application on the array and make sure it can meet them under worst-case scenarios (rebuilding a failed disk while processing I/O). RAID is a protection scheme; some view it as a way to get bigger LUNs. Achievable disk I/O depends on the speed of the disks, how many disks there are, the type of RAID, and its overhead. How you lay out RAID – on a shelf of disks or across multiple shelves – will depend on many things. How much influence do you have on this in the Cloud?


SCSI vs. ATA[7]
There are two main forms of disks: SCSI and ATA. The main differences between the architecture of SCSI and ATA disks are the protocol and the rotation speed. To start with the protocol: the SCSI protocol is highly efficient with multiple devices on the same bus, and it also supports command queuing. ATA devices have to wait on each other, making them slower when grouped together. The higher rotation speed means that when the head needs to move to a different location, it does not need to wait as long for the data to pass beneath it. So a SCSI disk can produce more IOPS than an ATA disk. The faster a disk rotates, the less time the head needs to wait before data passes beneath it and the sooner it can move to the next position, ergo the more I/Os it can handle per second. To give some idea of the numbers involved: a 15,000 RPM disk can handle about 180 random IOPS, a 5,400 RPM disk about 50. These are gross figures, and the number of IOPS available to the hosts depends very much on the way the disks are configured together and on the overhead of the storage system. In an average SAN, the net IOPS from 15,000 RPM disks is 30 percent less than the gross IOPS.

RAID 5 with enough drives in a RAID group can satisfy quite a few performance needs[8]. It really depends on your applications and their performance requirements. It is usually better to have LUNs span as many drives as possible; however, you should not put applications with different I/O profiles together. Also remember that with RAID 5 your MTBF (Mean Time Between Failures) can be quite large, even more so with RAID 6, but your MTBDF (Mean Time Before Drive Failure) can be very short (HDD MTBF / number of drives in use). It would be wise to have a supply of spares at all times. Today, vendors are coming out with 500 GB, 1 TB, or even 2 TB disks, and with those, rebuilds take a long time. It is also important to note that RAID 6 has 2 parity stripes. This is great for recovering data if a drive is lost, but the trade-off is even longer rebuild times. With drives this large in a 15-bay RAID enclosure running in a 7x24 environment, rebuild times can take weeks, not days. From this perspective, it is not advisable to put all the disks in one large RAID 5 or RAID 6 group. Your choices for RAID 5 vs. RAID 6, SATA vs. SAS, and 500 GB, 1 TB, or even 2 TB disks, or some mix, then become clearer. These are also impacted by your overall data protection strategy: backups, snaps, disaster recovery methods, and so on, along with budget, all determine what you really need to do. So here we are… money and performance needs. I would try to find a sweet spot and decide from there. If money is not the real problem, take a closer look at what your applications need. Are your applications generating sequential I/Os or just random I/Os? And what about the read/write ratio? A RAID 5 configuration has a write penalty of 4 (or more) when writing data, where in a RAID 10 setup this is only 2; but then again, RAID 10 costs twice as much, and RAID 5 only costs one extra drive. I'd suggest talking to the financial division as well as to a performance guru. So what is the best advice for RAID groups? There is none! As simple as that. But keep in mind that there are good default choices for RAID groups, like 3+1 or 4+1 for RAID 5, or 6+2, 8+2, and 10+2 for RAID 6. Anyway, two 4+1 RAID 5 groups have the same overhead as one 8+2 RAID 6 group but will have better performance. It would be nice for disk vendors to bring out low capacity drives again so you could design your capacity and performance a little more granularly.
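A hedged sketch of the spindle-count arithmetic implied above: it uses the per-disk IOPS and SAN overhead quoted earlier (about 180 gross IOPS for a 15,000 RPM disk, roughly 30% less net) and the write penalties mentioned for RAID 5 and RAID 10. The workload numbers are illustrative assumptions.

```python
# Sketch of the spindle-count arithmetic implied above. Per-disk figures follow
# the text (15,000 RPM disk ~180 gross IOPS, ~30% less net in an average SAN);
# write penalties of 4 (RAID 5) and 2 (RAID 10) are used. Workload is assumed.
def disks_needed(host_iops, read_ratio, write_penalty,
                 disk_iops_gross=180, san_overhead=0.30):
    disk_iops_net = disk_iops_gross * (1 - san_overhead)   # ~126 IOPS per disk
    backend_iops = (host_iops * read_ratio
                    + host_iops * (1 - read_ratio) * write_penalty)
    return backend_iops / disk_iops_net

host_iops, reads = 2000, 0.70        # assumed: 2,000 host IOPS, 70% reads
print(f"RAID 5  (penalty 4): {disks_needed(host_iops, reads, 4):.1f} disks")
print(f"RAID 10 (penalty 2): {disks_needed(host_iops, reads, 2):.1f} disks")
```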

This is quite a discussion, and you need storage specialists to deal with it. Back to the benefits: do you need these specialists, or do you leave it to others by putting your data in the Cloud? It's true that for low-end data, such as archival data or backup data, you don't need to worry; performance is not a big deal there. It is something else when you put your production data – database indexes, search indexes, or other high-end data – in the Cloud. The infrastructure must support the performance requirements. Whether being accessed by one user or hundreds of users simultaneously, your applications should be able to provide optimal performance and service all processing requests at high speed. Try to find disks that run at 15,000 RPM in your local store. In essence, the user experience counts for more than measured values. Can this be guaranteed by a Cloud? And where do you start if it's not? I say with some certainty that the Cloud vendor will point their finger at you first.

2.4. Security
We all panic when it comes to security. But stay calm! There are different approaches to security. Try to define security in terms of risk, and risk in terms of threats and vulnerabilities. To manage risk, focus on those threats and vulnerabilities. As an organization, you can eliminate threats by taking countermeasures to reduce the vulnerabilities. There are some very obvious ways to secure your data center. Let's enumerate some and discuss them.

- Site Location
- Surveillance
- Access Points
- Infrastructure

I will summarize the above-mentioned topics; I will not go deeply into all options available to protect you from violence, disasters, and so on. Use your imagination and common sense!

Site Location: The site should be located where the risk of natural disasters is very small. When you think of natural disasters, think of forest fires, lightning storms, tornadoes, hurricanes, earthquakes, and floods. ComputerSite Engineering, Inc. has compiled a Natural Disaster Risk Profiles for Data Centers[9] sheet. It is dated, but still useful as a guide. While, statistically, these threats account for less than 1% of IT unavailability, it is still important to take them into consideration.

Surveillance: There should be CCTV cameras outside and inside the building, monitoring parking lots and neighboring property. You could consider guards patrolling the perimeter of the property. Avoid outside windows: computer rooms should not have windows to the outside, as such windows could provide access to confidential information or systems.


Access Points: Loading docks and all doors on the outside of the building should have some automatic authentication method (such as a badge reader). Each entrance should have some sort of security or physical barrier. Use CCTV surveillance cameras to ensure each person entering the facility is identified. Engineers and others require badges to enter the building. Think of signs at the door(s) marking the room as restricted access and prohibiting food, drink, and smoking in the computer room. Access should be restricted to those who need to maintain the servers or the infrastructure of the room.

Infrastructure: Each computer room must have redundant access to power, cooling, and networks. There should be at least good airflow and cable management. Computer rooms should have air filtration and high ceilings to allow for heat dispersal. Make sure the environment in the room stays at the temperature and humidity specified by the hardware vendors. Environmental sensors should log the temperature and humidity of the room and report them to the responsible parties. There must be battery backup power onsite with sufficient duration to switch over to diesel power generation. If there is no diesel backup, then there should be 24 hours of battery power. If there is a diesel generator on site, make sure there is also 24 hours of fuel on site; if this is not the case, sign a contract.

Nearly all of the largest data centers are prepared for cases like the four examples given above. When you place your data in a Cloud, you presumably don't have to worry about these security issues. We have just discussed physical security, but what about logical security, such as policies, procedures, and proper integration of the data center core elements to prevent unauthorized access to information? Can this be guaranteed locally, or by a Cloud? And where should you start if it's not? It has to be secure at all times, that's for sure. The level of security also depends on the kind of data or information. In some countries there are exceedingly high security requirements from the government. You certainly don't want important information on the street. Do you want this in the Cloud, and can this be guaranteed by the Cloud? Of course there are more examples. Some data requires exceedingly high security by government mandate, and other data requires high security by the company. So it depends which data or information you want in the Cloud, together with the associated security. Don't take this consideration lightly!

2.5. Scalability
Data center operations should be able to allocate additional capabilities or storage on demand, without interrupting business. The storage solution should be able to grow with the business.

These days, using hypervisors such as VMware, Microsoft, or Citrix is the best way to achieve scalability and performance in your data center. The hypervisor plays a key part in delivering scalable data centers. Most x86 or x64 servers today operate at a mere 10-15% of their total capacity. You can increase utilization to as much as 85% by running multiple operating systems on a single server. This provides a great advantage: virtualization technology lets customers redefine the way they look at their IT[10].

The main objective of a hypervisor is to abstract the physical hardware into virtual machines (VMs). The strength is that you can assign just enough resources, such as CPU, memory, networking, and storage, as necessary. Do you need more, or maybe less? Just scale up or down. Storage is a critical building block in delivering the flexibility, scalability, speed, and efficiency of today's virtual data center. Let's zoom in on VMware, which has unveiled new vStorage technologies to deliver efficiency and manageability for the Virtual Datacenter operating system. vStorage is a set of infrastructure services such as:

Thin provisioning: A mechanism that applies to large-scale centralized disk storage systems. Thin provisioning allows space to be easily allocated to servers, on a just-enough and just-in-time basis (a minimal illustration follows after this list).

‘Linked Clones’ / ‘Gold Masters’: A Linked Clone is a copy of an existing virtual machine; the existing virtual machine is called the Parent or Master of the clone. When the cloning operation is complete, the clone is a separate virtual machine. VMware works with ‘Linked Clones’ as Citrix does with ‘Gold Masters’. Hyper-V virtualization doesn't offer an out-of-the-box option to clone an existing virtual machine (VM) into a new one; however, that doesn't mean it is not possible.


Storage Virtualization: The SNIA[11] storage virtualization taxonomy provides a systematic classification of storage virtualization, with three levels defining what, where, and how storage can be virtualized. You should read the SNIA Technical Tutorial Storage Virtualization for better understanding.
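As promised above, here is a minimal, purely illustrative sketch of the thin provisioning idea: the volume presents a large logical size, but backing capacity is consumed only for blocks that are actually written. The class, block granularity, and sizes are invented for the example.

```python
# Purely illustrative sketch of thin provisioning: the volume presents a large
# logical size, but backing capacity is consumed only for blocks actually
# written ("just enough, just in time"). Names and sizes are invented.
class ThinVolume:
    def __init__(self, logical_gb):
        self.logical_gb = logical_gb
        self.written_blocks = set()      # track 1 GB "blocks" for simplicity

    def write(self, block):
        if not 0 <= block < self.logical_gb:
            raise ValueError("write beyond logical size")
        self.written_blocks.add(block)   # capacity is allocated on first write

    @property
    def allocated_gb(self):
        return len(self.written_blocks)

vol = ThinVolume(logical_gb=500)         # the server sees 500 GB
for b in range(40):                      # but only 40 GB is ever written
    vol.write(b)
print(f"Presented: {vol.logical_gb} GB, consumed: {vol.allocated_gb} GB")
```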


2.6. Data Integrity

What does data integrity really mean, and why is it important to understand? Most IT managers, CIOs, and business owners would agree that a company's most valuable asset is its information. In fact, losing information can have catastrophic results. For instance, online transaction processing databases typically log thousands to millions of transactions a day, and a single corrupted block of data can render an entire file unreadable. Protecting data is crucial to a company's survival in today's competitive business environment. Storage administrators may feel that because they store their data on a redundant disk array and maintain a well-designed tape-backup regimen, their data is adequately protected. However, undetected data corruption can occur between backup periods – backing up corrupted data yields corrupted data when restored. Storage systems should protect against data corruption; good systems offer a comprehensive set of data integrity features that protect data before, during, and after it is stored on disk. This requires some technical complexity, and therefore you need highly advanced storage. Cache plays a leading role here: cache is very fast memory used on storage systems to speed up the servicing of I/O requests. Of course, if you go from internal to external data storage, you presume that your vendor meets all the requirements to prevent data corruption.
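To illustrate the principle of catching silent corruption, the sketch below stores a checksum with each block and verifies it on read. Real arrays use stronger mechanisms (ECC, parity, T10 DIF); this only shows the end-to-end idea.

```python
import zlib

# End-to-end integrity sketch: store a checksum with each block on write and
# verify it on every read, so silent corruption is detected instead of being
# backed up unnoticed. Real arrays use stronger codes (ECC, parity, T10 DIF).
def write_block(data):
    return {"data": data, "crc": zlib.crc32(data)}

def read_block(block):
    if zlib.crc32(block["data"]) != block["crc"]:
        raise IOError("data integrity error: checksum mismatch")
    return block["data"]

blk = write_block(b"patient record 4711")
blk["data"] = b"patient record 4712"     # simulate silent corruption on disk
try:
    read_block(blk)
except IOError as err:
    print(err)
```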

2.7. Manageability

Managing a complex data center involves many tasks. Besides the seven requirements you have to deal with, there are also monitoring, reporting, optimization, and other tasks. It is a misconception to think that storage is only about a bunch of disks that simply store data. Manageability is an ongoing process which continues forever, at least as long as your company does not go bankrupt. In large companies you see one person or department per task doing the job; in small companies it is often one person charged with doing all or some of these tasks. Manageability can be difficult and very complex or, the opposite, easy and simple. Whether your data lives inside or outside, there is always something to manage!


3. What To Do Next?

What to do next? A good question at the end of this article. My goal was to help you better understand the choices you have to make when dealing with data growth in relation to storage management.

Still difficult to decide? No wonder; it remains complex, and often there is a lot of money involved, which blurs the choices. What is a good approach? I would certainly use the arguments in this article when it comes to decision-making. Also be aware that sometimes it is just playing roulette, betting on black or red. Nothing is certain or 100 percent predictable, but we can at least try to get as close as possible. Doing nothing is not an option. Data will continue to grow, that's for sure.

You could ask yourself again: what are some of the challenges facing IT departments today regarding storage? One way to answer that question is to look at what IT has to deliver: mission-critical applications. Examine these applications in depth. We've talked at length about storage, data growth, and key requirements. In order to find a solution, it's first important to have a better understanding of where the volumes of data are coming from and why they are increasing. The first step is to find out which applications are running inside your organization. Perform a data classification and find out which ones are important. You probably ask yourself: what is important and what is not? Determine the business areas, determine the processes for each business area, and identify which are essential to the operation of the business. It is most likely that the IT department alone cannot answer that question. The next step is to classify these applications into categories. A data classification based on values, perspectives, schemes, classes, and rules will help you define which applications are important and which are not.

Mission critical

Any application that is critical to the proper running of the business, and whose failure would lead to significant revenue loss, can be marked as mission critical.

Business critical
Business critical applications increase productivity. These are the applications that support mission critical applications and whose failure also contributes to significant revenue loss.

Business important
Business important applications are not critical and contribute to a limited revenue loss.

Productivity important
These applications only affect the productivity of some departments and contribute to a limited revenue loss.

Non-critical
Non-critical applications have a slight impact on productivity and are often too personal to be taken into account in times of crisis. They contribute to a minimal revenue loss.

Once you have placed your applications in these boxes, you can start looking at the technical requirements of each application. Does it need high availability, or does it need high performance? What about high scalability or rapid restore? Yes, I'm talking about tiering. Not all applications need to be on a highly redundant EMC Symmetrix DMX system. Some applications can be put on NAS instead of SAN. Maybe you don't need redundancy or scalability. It should be clear that a mission-critical application needs more availability than a non-critical application. Here is where you will have to balance between solutions. Understandably, this may not always be easy within the practicalities of day-to-day business, or simply because of costs.


4. Conclusion

Cloud or not? Is storage a Cloud storm or not? Well, it depends. In the 'early' years, and still now, all that basically matters is capacity. From that point of view, I think Cloud initiatives can make the difference, because in a Cloud you pay only for the capacity you actually use. But growth can also make your environment unmanageably large. Costs will rise to a point where you wonder whether it is still profitable. Maybe you then want to go back and host it yourself again. How will that go? No one seems to be talking about that. OK, fair enough, this was not the approach at the start, but it is still something to consider. A big plus is that you do not need to concern yourself with cooling the equipment. You can also save space per square meter in the data center and spend less on training.

Tight IT budgets can be a powerful driver toward considering Cloud computing, but you still have to deal with performance. In just the past year and a half, I have seen a shift from capacity to performance. Applications have become heavier than in the past. It is true that hardware and servers are faster as well, but there is also a growth in complexity and dependency regarding these applications. Many businesses will realize that having well-defined SLAs will be a key point when buying Cloud services. Businesses will understand the consequences of their budget cuts when they can no longer go next door to yell at the IT guys until they fix the problem, and profits start to disappear during an outage with no person to point fingers at. I haven't heard of any real Cloud service yet that hasn't had major issues, and those issues won't magically disappear in a Cloud; they will just have larger consequences. So what is suitable for a public Cloud?

Let's go even one step further. Why buy? Just use what is available for free when possible. Today there are a lot of ‘free apps’ worth your consideration. Applications such as Google Docs (http://docs.google.com), Google Apps (http://www.google.com/a), and Zoho Office (http://www.zoho.com) are beginning to change the way we think about software and files. Web services like Flickr (http://www.flickr.com) and YouTube (http://www.youtube.com), as well as other browser-based applications, comprise a set of increasingly powerful applications that run from, and store data on, remote servers instead of local servers. Who saw it coming that Microsoft is ready to move the next version of its Office suite online? The battle has only just begun!

The next couple of years will be marked and dominated by Cloud storage. Companies like VMware, IBM, EMC, HP, Symantec, HDS, and NetApp are working on it. There are technologies such as ‘virtual disks’, ‘thin provisioning’, ‘reservation-less snapshots’, ‘automated quality of service’, support for Fibre Channel and iSCSI on the same platform, and intelligent power management. Almost every major storage vendor will have these technologies in-house and offer them to customers or IT providers. New technologies used will be 4 TB (terabyte) SATA hard drives, automated storage tiering with solid state drives (SSD), and encryption of data on separate storage devices. An offsite backup can be very interesting.


References

[1] http://www.emc.com/digital_universe
[2] http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf
[3] EMC Education Services, Information Storage and Management
[4] Philip Howard, Research Director - Data Management, Bloor Research, 31 July 2009, © Bloor Research 2009
[5] http://en.wikipedia.org
[6] http://en.wikipedia.org
[7] Herco van Brug, VDI & Storage: Deep Impact
[8] Roy Mikes, A bunch of disks in RAID 5
[9] ComputerSite Engineering, Inc., Natural Disaster Risk Profiles for Data Centers
[10] Virtualization Guide, VMware vSphere 4: The best platform for building Cloud Infrastructures
[11] Frank Bunn, Nik Simpson, Robert Peglar, and Gene Nagle, SNIA (Storage Networking Industry Association), SNIA Technical Tutorial: Storage Virtualization