Wednesday, 5 March 2014

Cache on top of cache on top of cache

Hi everyone,

For the second post, I'll dive into caching in its many forms, and explore how this critical piece of the puzzle helps a storage environment keep up with the workloads demanded of it.

I'll simplify things a little to make them easier to follow. There are many factors that can affect the examples given, but for the purposes of illustration I'll keep the scenarios deliberately simple.

What is it?


Caching is a critical part of any storage infrastructure, allowing traditionally cumbersome spinning media to keep up with the performance demands of a workload. Once something like server virtualization comes along, this need is amplified by the increasingly random patterns in which data is read.

A cache is typically a small amount of very fast storage attached to a large amount of relatively slow storage. At the level of a single disk, chunks of data are read from the much slower mechanical or flash-based media into that small amount of memory, where a request from a server can be satisfied by picking out the bits it needs now, and likely the next bits it will need in a moment.
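To make the idea concrete, here is a minimal read-through cache sketch in Python. It is purely illustrative: the block count, the LRU eviction policy, and the read_from_disk stand-in are my own assumptions for the example, not how any particular disk or controller implements its cache.

```python
from collections import OrderedDict

CACHE_BLOCKS = 1024            # the "small amount of very fast storage"

cache = OrderedDict()          # block_number -> data, kept in LRU order

def read_from_disk(block_number):
    """Stand-in for a slow read from the mechanical or flash-based disk."""
    return b"..."              # placeholder payload

def read_block(block_number):
    if block_number in cache:                # cache hit: served from fast memory
        cache.move_to_end(block_number)      # mark as most recently used
        return cache[block_number]
    data = read_from_disk(block_number)      # cache miss: go to the slow disk
    cache[block_number] = data               # keep it for the next request
    if len(cache) > CACHE_BLOCKS:            # evict the least recently used block
        cache.popitem(last=False)
    return data
```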

All of this happens incredibly quickly, at the level of milliseconds or microseconds; even so, shaving just one millisecond off the time it takes to satisfy a data request makes a real difference.

For example, a storage device that responds to each request in 12 milliseconds could take up to 4 times as long to answer a series of requests as one that responds in 3 milliseconds, which in turn might take 3 times longer again than something that can respond to each request in 1 millisecond. (This is highly simplified.)
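For a purely serial stream of requests, the arithmetic behind that example looks something like the sketch below. Real workloads overlap many requests at once, so treat the numbers as illustrative only.

```python
# Total time for a serial request stream is simply requests x latency.
requests = 10_000

for latency_ms in (12, 3, 1):
    total_seconds = requests * latency_ms / 1000
    print(f"{latency_ms:>2} ms per request -> {total_seconds:>5.0f} s for {requests} requests")

# 12 ms per request ->   120 s
#  3 ms per request ->    30 s   (4x quicker than 12 ms)
#  1 ms per request ->    10 s   (3x quicker again)
```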

Once you scale the example up to an environment with hundreds of disks and thousands of users performing hundreds of thousands of requests every second, the benefits of reduced latency compound, as more "work" can be completed in any given time.

Cache exists within practically all types of storage and storage controllers. Each SAS or SATA disk comes with a small amount of cache, RAID controllers have memory integrated into them to provide an additional layer of performance and intelligence, and some controllers can even utilise a small amount of solid-state storage to give a comparatively large caching area.

When it comes to caching, adding an SSD or flash unit directly to a RAID card, or installing the technology in the host server itself, can provide some benefit, and faster technologies such as DRAM can further enhance the cache built into the storage devices themselves.

Practical application


Most highly-available, virtualized environments rely on a shared storage architecture, and this presents a management headache: some servers may be under-utilising the high-performance resources in some areas while being bottlenecked in others.

The solution to many of these challenges is to add fast disk technologies to your existing shared storage infrastructure, and to utilise DRAM (lower latency and higher speed than even the fastest flash devices on the market) in the shared storage layer to further enhance the tier-1 flash devices.

With ~300 flash and SSD vendors on the market right now, the number of options is staggering, and being anchored to a single vendor or disk technology could well be a massive drawback given the rate of innovation around SSD and flash technologies.

My ideal environment would be heterogeneous, with multiple tiers of storage: best-of-breed flash at the top, SAS for mid-range data, and near-line SAS/SATA for bulk capacity, all complemented by DRAM caching to improve latency.

It would have to use the most cost-effective technologies for each tier, give the flexibility to utilise those technologies regardless of vendor, and ensure that routine tasks like maintenance, upgrades, and decommissioning equipment at end-of-life did not result in the entire staff sitting around twiddling their thumbs due to an outage.

Is this all possible using technologies easily accessible today? Yes, many times over, and the answer is a software layer such as DataCore that abstracts the features and functions away from the physical devices, giving you the ability to have a storage strategy rather than settling for a temporary solution every 3-5 years.


Sunday, 17 November 2013

Cost per TeraByte and the Spindle Count Trap

Just as a little reminder, I maintain this blog to jot down ideas and thoughts as I go. Any views expressed in this are mine alone, and not those of any past or current employer etc. I will endeavour to correct omissions or mistakes as soon as I learn a better way, so please feel free to contact me with constructive feedback!

Designing and scoping storage solutions:

For any growing business that is consuming more and more storage, many metrics get thrown around as the "rules" by which they should purchase storage.

Many administrators who have dealt mainly with smaller environments, focused primarily on file and print storage or archiving, will treat the cost per TB of storage as the primary metric on which they base their decision.

Whilst this is fine if the capacity increases are simply to hold more static data from backups, or to retain larger files for whatever regulatory period they are subject to, once an organisation starts to look down the road of Virtual Desktop Infrastructure (VDI), databases (Exchange or SQL etc.), or any application with a high number of users or a highly transactional workload, the cost per TB becomes secondary, and can more often be a distraction from creating an environment that will achieve what is expected of it.

VDI is a great example of cost per TB being a secondary factor, as it is very easy to understand once explained.

When deploying virtual desktops, you are creating an environment which users can connect to from whatever location is allowed, where all, or most, of the processing is handled by the VDI server rather than depending on the specifications of the user's machine. This allows even users with the most basic hardware to access a portion of an extremely powerful central server, delivering a consistent experience across devices.

These powerful central servers take over the role of being "the computer", performing the tasks requested and then displaying the output on the device the end-user is connecting with. Most, if not all, of these tasks rely on storage, and you may have tens to thousands of users all relying on the same storage.

Flawed logic:

To use only the cost-per-TB model: if each user needs ~30GB or so, logic might dictate that you could easily accommodate a thousand users on a single tray of RAID5- or RAID6-protected storage, especially with 4+ TB disks becoming more common. So for the sake of this example, let's say that we are working with 1000 users across 12x ~4TB spindles in RAID6.

Now, go back to the start of the example. If you handed each of the 1000 users their own workstation, with a single spinning disk, would they be able to be productive with that level of storage performance? What would happen if that disk was then reduced to 1/10th of its performance, would they still be productive? How about 1/100th? This is the risk you are going to be exposed to when you are trying to run 1000 users across the resources that might do for 10. If we wanted to provide each user with the equivalent performance of one desktop disk drive, we would need 1000 or so disks, plus those needed for parity. In practice, many environments will give each user 1/2 the performance of a dedicated desktop disk in anticipation of the environment not being fully utilised at any one time.
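As a back-of-envelope version of that spindle maths, the sketch below uses assumed per-user and per-disk IOPS figures chosen purely for illustration; substitute numbers measured in your own environment.

```python
# Rough spindle sizing for the VDI example (all figures are assumptions).
users = 1000
iops_per_user = 20           # assumed steady-state IOPS per virtual desktop
iops_per_spindle = 150       # assumed IOPS for a single spinning disk

required_iops = users * iops_per_user
spindles_needed = -(-required_iops // iops_per_spindle)   # ceiling division

print(f"Required IOPS: {required_iops}")
print(f"Spindles needed (before parity/overhead): {spindles_needed}")
# 1000 users x 20 IOPS = 20,000 IOPS -> ~134 spindles at 150 IOPS each,
# versus the 12 spindles the capacity-only sizing arrived at.
```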

The number of spinning disks, or spindles, per user becomes vital, as the above example illustrates. Having tonnes and tonnes of space is of little value if the space is unusable for the purpose it has been deployed for.

Key factors:

When looking at VDI or other high-transaction workloads in virtualised environments, there are some key factors to consider. The number of IOPS per spindle and the number of spindles per user (or per workload) are vital to how the environment performs: the throughput, IOPS, and latency it is capable of delivering.
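These metrics are linked: throughput is roughly IOPS multiplied by block size, and the IOPS a single queue can sustain is bounded by latency. The block size and latency in the sketch below are assumptions chosen purely to show the arithmetic.

```python
# Illustrative relationship between IOPS, block size, throughput and latency.
iops = 5000
block_size_kb = 8
latency_ms = 2

throughput_mb_s = iops * block_size_kb / 1024
serial_iops_per_queue = 1000 / latency_ms    # one outstanding IO at a time

print(f"~{throughput_mb_s:.0f} MB/s at {iops} IOPS x {block_size_kb} KB blocks")
print(f"~{serial_iops_per_queue:.0f} IOPS ceiling per queue at {latency_ms} ms latency")
```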

RAID levels:

How the disks within the trays are configured is a significant factor in performance. Whilst RAID5 or RAID6 are quite effective in delivering good space efficiency and read performance for bulk storage of items that are written once and then accessed many times, they typically suffer significant disadvantages in write-heavy environments, such as development or record generation, due to their relatively poor performance on write operations.

In a simple environment where you had 12 disks in a single RAID6 array, you would have 10 disks worth of capacity that should provide great read speeds, acceptable levels of redundancy, and good space efficiency (low RAID overhead).

In a simple environment where you had 12 disks in a single RAID10 array, you would have 6 disks worth of available capacity that should provide very good read and write speeds and plenty of redundancy, however with less space efficiency (50% of the capacity used for redundancy).
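Here is a quick comparison of those two 12-disk layouts, using the commonly quoted RAID write penalties (roughly 6 back-end IOs per random write for RAID6, 2 for RAID10). The per-disk IOPS and capacity figures are assumptions for illustration.

```python
# Usable capacity and rough random IO estimates for 12 disks (assumed figures).
disks = 12
disk_tb = 4
disk_iops = 150

layouts = {
    "RAID6":  {"capacity_tb": (disks - 2) * disk_tb, "write_penalty": 6},
    "RAID10": {"capacity_tb": (disks // 2) * disk_tb, "write_penalty": 2},
}

raw_iops = disks * disk_iops
for name, cfg in layouts.items():
    write_iops = raw_iops / cfg["write_penalty"]
    print(f"{name}: {cfg['capacity_tb']} TB usable, "
          f"~{raw_iops} random-read IOPS, ~{write_iops:.0f} random-write IOPS")

# RAID6:  40 TB usable, ~1800 random-read IOPS, ~300 random-write IOPS
# RAID10: 24 TB usable, ~1800 random-read IOPS, ~900 random-write IOPS
```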

Applied to virtualization

The two previous examples are common in both un-virtualized and virtualized environments alike. The architecture of putting a large number of spindles in a single array was very common before virtualization and, as it provides acceptable performance, remains common now.

When virtualization is introduced, the demands on the disk change significantly. Instead of a single server, application or user making demands of the storage, there are now many (from a couple to thousands) of users or applications requesting that the disks deliver or receive data at once.

This leads to a great increase in contention on the resources, and a shift in the way that efficient storage is laid out. Now, rather than one very fast block of disks with a single lane of access (IO queue), it can be much more effective to have many blocks of disk with many lanes of access. The end effect of this is that the queues for any request to wait upon are much shorter, and can be distributed evenly amongst all of the storage resources.
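A toy way to picture the "many lanes" idea: the same burst of outstanding requests spread across several smaller arrays leaves each queue far shorter than piling everything onto one large array. The request count below is arbitrary.

```python
# Same burst of requests, spread evenly across a varying number of arrays.
outstanding_requests = 240

for arrays in (1, 4, 8):
    per_queue = outstanding_requests / arrays
    print(f"{arrays} array(s): ~{per_queue:.0f} requests waiting in each queue")
```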

Combining all of this information, it becomes sensible to first analyse the way data is used in the environment. Are there a large number of users or applications dependent on the storage? What does each application require to perform acceptably? What ratio of read to write traffic will there be? What RAID level suits the IO characteristics of the environment? What fabric can deliver the speed and latency that I need?

Once you have this information, you can then architect the basics of your solution. Once you have determined what is needed, then you can approach your suppliers and ask what prices they will offer for the storage that suits your needs.

In practice:

In practice, I have seen environments crippled, or taken completely offline by a lack of awareness of these principles. A storage array is a tool, and if not used correctly, cannot deliver what anyone might have promised. The same array configured differently can be the difference between your solution being effective, or worthless.

Many arrays, and now even Windows itself, allow users to pool storage. Whilst various arrays or technologies might operate differently, DataCore, the product I've spent the most time with thus far, allows a user to pool many RAID LUNs from virtually any device and to spread the data amongst them, providing more IO queues and reducing latency, often by a factor of 3x-5x when combined with the built-in DRAM caching.

For read-heavy environments with many concurrent demands, multiple, smaller, RAID5 or RAID6 configurations (many smaller arrays versus one large array) can deliver better performance through additional IO queues, with a fairly low RAID overhead, giving a balance between performance and the cost per TB.

For write-heavy environments, many RAID1 or RAID10 packs pooled can provide immediate and significant improvements to the usability of an environment, even before considering changing the underlying disk technology (from SATA to SAS or SSD for example). This comes down to being able to use the tools provided in the most effective way for the situation.

Final thoughts:

Everyone has a budget to work within. This is a fact of doing business. When tasked with delivering a solution, always be clear on exactly what is expected now, how the environment will be expected to perform, and how it should scale.

If an environment is optimised toward cost per TB, expect it to perform well at storing large quantities of data, but do not expect it to do what an environment optimised for low latency or high IOPS can do.

Be rational when building environments, as it's likely that a little research now could save much more than a few dollars in unnecessary equipment purchases and labour costs later on.

This is a massive topic, and this article just starts to scratch at the surface. I'm planning on running some tests as proof points in a lab environment once I have enough disks available to illustrate and clarify.

Tuesday, 19 March 2013

A primer on availability in virtualized storage environments


Why is storage so important to any infrastructure, and why should it be highly available and virtualized?

Everyone has a way of managing data, and each of those ways is related to the others, but often very different in terms of recovery times, the effects of maintenance, the effects of failures, and how the environment will stand the test of time with growing workloads and data quantities.

Availability

In servers

In a business that relies on its data for operations, payroll, and in many cases their offerings, availability is king.

A great example: if the server(s) go down, a business can lose tens of thousands of dollars an hour as employees sit there twiddling their thumbs, unable to access their calendars, the contact details of clients, or in some cases even their own machines, and this issue is only escalating.

With VOIP, virtual desktops, Exchange, SQL, Dynamics, all running on virtualized servers and performing the majority of the day-to-day processing, clustering for high availability has become the standard. Citrix, Microsoft and VMware all supply a hypervisor capable of keeping a business running, even if one of the servers completely fails.

The beauty of these systems is that most businesses can now use fairly standard hardware that fits their budget, allowing them to grow based on their needs, and budget, at the time.

However...

It all relies on storage

Consolidating all of these resources has created a unique set of storage challenges. As these clustered servers all rely on shared storage to be able to quickly bring a resource online on a completely separate physical machine, the storage simultaneously became the most useful tool in the infrastructure, as well as the boat anchor holding the infrastructure down.

Specialised, highly redundant devices that could tolerate a disk or two failing, maybe a power supply, or even a controller, quickly became the "must have". SAN technology was fast, and Fibre Channel could deliver high-speed volumes over great distances, but they took a bit of work. Others went down the NAS route, providing massive shared directories that could be browsed and accessed using existing copper infrastructure.

Both of these approaches suffer the same glaring issues: a single point of failure and a lack of availability during maintenance or failures, not to mention that, for all the virtualization and benefits of hardware independence, they are still highly proprietary devices.


To get around some of this, the hardware companies brought out highly specialised paired or clustered storage devices that could maintain exact copies of each other's internal disk arrays, and could be placed in separate locations so that a site outage would not take the entire infrastructure off-line. These tended to be extremely expensive and would not play well with existing equipment, often requiring a full rip-and-replace, even if the existing equipment was purchased from that same vendor. In many cases, not much has changed since.

The weakness in this is that the firmware is physically bound to the hardware, limiting the consumer to a "like it or lump it" choice of equipment at an inflexible price point. On top of this, there is the logical issue that although most have recognised the benefits of virtualization for servers, desktops and apps, there are still many who have not made the connection that storage is the final stage of this, and that all of the benefits of virtualization can apply to storage as well.

 

Virtualizing Storage

The arrays, in the broadest sense, are just commodity chassis with clever firmware embedded. This is where the cleverness of a decade and a half of software development comes in. If you could take the "smarts" of these arrays out of the box and couple them with whatever hardware you wanted, you could build the exact environment you need, right?


Vendors like IBM, HP, EMC, Dell, Hitachi and others have taken some steps toward this by releasing physical appliances that can utilise various underlying storage, as long as you purchase their hardware with the embedded feature-set. This half-step toward giving users back their purchasing power still, unfortunately, locks an environment down in regards to how elaborate the fabric can be and how many ports it can scale across, and overall appears to be a compromise.

Why not separate it to make it more accessible?

Some very smart people have done this, and there are software packages that run on top of Windows Server such as SANsymphony-V from DataCore (allowing you to use pretty much any industry-standard hardware to get started), along with various customised Linux platforms that offer more basic functionality (with certain restrictions on hardware based around what drivers you can get, often iSCSI only).

These software packages allow their users to pick and choose hardware based purely on capacity and performance, comparing "apples to apples" across hardware instead of trying to sift through the feature-sets of different arrays with slightly different hardware that may wind up being superseded in only a matter of months or years. Of course, a business would still pick hardware of appropriate quality for the capacity it intends to use with the software. There is no substitute for building on a solid foundation.

By taking the software (firmware) out of the physical hardware, and allowing all of the functionality over any device, someone who may only have access to basic equipment, now potentially has access to the same, or better, features formerly only available to the storage in large data-centres (high availability, auto-tiering, snapshots etc.). They can add technologies like SSD into their existing SATA / SAS mix without having to start over and migrate everything, and the benefits pile up year after year as new technologies from different vendors can be added easily, the useful life of equipment can be extended, and there can be full control over whether iSCSI or Fibre Channel or a mix of both is utilised.

So what does this mean for me?

Those who are running a lab at home, as many technologists do, can potentially have a more available and robust storage architecture than they ever imagined, all through parts scrounged off eBay or bought from the local PC store. You can start with a couple of white-boxes and a few NICs, and piece by piece take it up to a full-blown Fibre Channel switched SAN without ever taking the storage totally off-line from the hypervisor.


For businesses looking toward the era of "software defined storage", it means that the solution to the most common problems faced by anyone administrating storage (performance, flexibility, availability, manageability, scalability) is already here; you just have to look.


 
Find me on Twitter, Google+, and popping up where I'm needed at the time. Feel free to get in touch!