Disaster Recovery in the Cloud - SAN to SAN Replication

June 14, 2011 2:00 pm

Online

Mike Flaherty discusses SAN platforms, including the EqualLogic iSCSI SAN; where to place the SAN and why some locations are better than others; RTO and RPO concepts; the SAN technology itself; and various replication methods.

Tuesday 6.14.11 @ 2pm

Good afternoon and welcome to the SAN-to-SAN Replication and IT Disaster Recovery in the Cloud webinar. In this webinar we will explore SAN-to-SAN replication and the disaster recovery landscape in the cloud. My name is Mike Flaherty, I am a Systems Engineer with Online Tech, and I will be hosting this webinar. In addition to today’s webinar, our next event will be held on June 21st, in downtown Detroit at TechTown and Next Energy Auditorium. More information is available on our website and I will have a slide at the end of the presentation describing it. Let’s get started.

So here’s what I would like to cover today. We will discuss our chosen SAN solution platform, the EqualLogic iSCSI SAN, and where to place the SAN and why some locations are better than others. After that we will touch on some disasters and on RTO and RPO concepts. And finally we will dive into some of the SAN technology itself and some of its applications.

The EqualLogic SAN was chosen by Online Tech first for our internal systems and servers. Based on an evaluation of products in the marketplace, EqualLogic was chosen for its ease of use, scalability and performance. Over the past several years, Online Tech has used EqualLogic SANs in production, and today we do SAN-to-SAN replication between our Mid-Michigan data center in Flint and our Ann Arbor data center. After this solid performance internally, Online Tech turned this into a managed service, and we use EqualLogic SAN systems for some of our customers, powering some of their managed cloud solutions. The Dell EqualLogic platform continues to win awards for performance, scalability, and cost effectiveness for storage.

When you get this SAN, you can place it virtually anywhere, but keep in mind that if any critical system is down (for instance electrical power, network or air conditioning) your end users can’t access the SAN data. So placing an EqualLogic SAN in a data center like Online Tech’s gives you the ability to offer higher application availability. And that’s why you are here today, for application availability and access to your data at all times. Online Tech’s data centers are located across two different power grids, with redundant electrical power systems, battery banks, generators, redundant networks, and expert staff ready to assist 24/7. If your data is sensitive in nature, or your customers demand that your information be stored in a data center compliant with the SAS 70 or SSAE 16 auditing standards, Online Tech is ready and willing to assist. From e-medical records to credit card data to sensitive personal and corporate data, your auditors will rest assured that the data is safe and secure.

Now, let’s look at keeping your data safe in a complex world as your data continues to grow in size and importance. Years ago, local tape backup was the norm, and it is still used today to some extent. So to the lower left of the slide you will see local tape backup, but as demand from the business grows and end users’ demand for 24/7 access to their data grows, something needs to change. Something needs to make the data more available. So here in what I call the feasible region, the yellow section in the middle of the slide, we start moving up that continuum with offsite tapeless backup. This is the first major step in getting serious about backing up your data and making it accessible. Online Tech uses the same offsite backup systems and software you see here to back up our company data. We also use this for our customers, and we regularly conduct full data restore exercises to ensure that the data protection methods we are using actually work. And they do.

As we prepare to discuss SAN-to-SAN replication in the next few slides, keep in mind that to allow for a warm site, hot site, or even an active-active disaster recovery or production site, we slide up that feasible region towards active-active. Depending on the final configuration, the costs rise accordingly. So where is a safe place to store your data? What is a good region to store your data in?

Michigan offers a sweet spot when it comes to location security. For example, this chart shows seismic hazard. You can see Online Tech’s data centers in Michigan sit in the safe zone to ensure your data is safe and can be successfully recovered in the event of a disaster, either major or minor. Let’s look at the next slide to begin to explore the concept of SAN-to-SAN replication. In these next two slides I describe SAN-to-SAN replication before we get into the details. First, the inherent ability to replicate is included at no additional cost with EqualLogic SANs. Unlike other SAN solutions on the market, there are no hidden fees and there are no surprises. EqualLogic auto-replication allows you to replicate volumes to remote sites over any distance by leveraging existing IP network infrastructure. EqualLogic SANs are also the foundation of a solid disaster recovery plan to keep the data safe and the data replicated. Storage is really everything today. Storage, the network, the cloud are all tied together. We feel that the foundational building blocks for your IT strategy reside in the SAN today.

How do you get the data into the SAN? And how do you replicate the data to another SAN? If you have a large amount of data, one option is to use a manual transfer utility. This allows you to load the data from one SAN onto an external hard drive, load that onto the secondary SAN at a remote site, and then turn on replication. You can also place the SANs side-by-side and replicate them locally on the same network segment. From that point, you have the choice to move the secondary SAN to an offsite data center or location.

Once the data is on the SAN and the SANs are synced up, asynchronous replication occurs on the SAN, capturing the incremental data changes over time and keeping the data in sync. Multiple SAN replication methods exist to transfer data, which we will discuss on the next slide. In the event of an outage at your primary SAN site, your secondary SAN site becomes the primary. Once that outage is over, say it’s a power outage and you are down for eight hours, you need to fail back to the primary SAN, re-sync all of the data changes that happened during that time, and then make that SAN your production SAN again.

While any TCP/IP network works fine for replication, depending on the amount of data change, a higher speed private line may be needed to replicate data in an acceptable time frame. The amount of changed data and the speed of the network ultimately must be balanced to ensure that there is enough time for the replication to complete. In practice, the actual bandwidth and latency characteristics of the network connection between replication groups must be able to support the amount of data that needs to be replicated in the time window in which the replication needs to occur.
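As a rough illustration of that balance, here is a minimal sketch that estimates the link speed needed to move a given amount of changed data within a replication window. The function name, the 80% efficiency figure, and the example numbers are assumptions made for illustration; they do not come from EqualLogic documentation or from the talk.

```python
def required_mbps(changed_gb: float, window_hours: float, efficiency: float = 0.8) -> float:
    """Estimate the link speed (megabits per second) needed to replicate
    changed_gb decimal gigabytes within window_hours.

    efficiency is an ASSUMED fraction of the raw line rate that is usable
    after protocol overhead; 0.8 is an illustrative guess, not a measurement.
    """
    if window_hours <= 0 or not 0 < efficiency <= 1:
        raise ValueError("window_hours must be > 0 and efficiency must be in (0, 1]")
    bits = changed_gb * 1e9 * 8        # decimal gigabytes -> bits
    seconds = window_hours * 3600      # hours -> seconds
    return bits / seconds / 1e6 / efficiency


if __name__ == "__main__":
    # Hypothetical workload: 45 GB of changed data and an 8-hour overnight window.
    print(f"{required_mbps(45, 8):.1f} Mb/s needed")
```

With those hypothetical inputs the estimate comes to roughly 15.6 Mb/s, about ten times what a T1 provides. Measuring your own daily change rate is the only way to get a number you can plan around.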

Here is the first of four scenarios that show us how two SANs can replicate.

With the exception of failback events, EqualLogic replication is always a one-way process where data moves from the primary group on the left to the secondary group on the right. The two SANs pictured in the slide can be located 1) in the same building, replicating side-by-side, or 2) with one SAN in the customer building or data center and the second SAN offsite in another data center. The second option could also place the SANs in two different data centers, ideally connected by high-speed private fiber. No doubt two SANs are better than one. Implementing the replication process that works for your business and your strategy is the place to start. Ideally, the second SAN will be placed over 50 miles away to provide a solid disaster recovery foundation.

A second replication method is called reciprocal replication.

In contrast to basic replication, reciprocal replication keeps a primary or production data volume on each SAN. For example, think of a business using two data centers, or having two offices. Users in location #1 use their primary data and replicate it to location #2. And users in location #2 use their local SAN data and replicate it back to location #1. All data is safe and secure. Now, with the exception of failback events, EqualLogic replication is always a one-way process where data moves from the primary group volume to the secondary group replica set. In this case, replication allows for reciprocal partners, which means you can use two operational sites as recovery sites for each other. It’s a pretty clever trick.

But what if you have more than two sites?

In this slide we have three volumes, A, B and C, all residing on the same SAN in the same location, Fort Wayne. These volumes replicate over the network to three different locations: Chicago, Cleveland and Ann Arbor. This is a “one to many” replication scheme. You can also replicate the other way. In this solution we have three remote sites again (Chicago, Cleveland and Ann Arbor), all replicating their data volumes back to headquarters in Fort Wayne. This is a “many to one” replication scheme. It is pretty common to see this kind of replication with franchises and companies with multiple remote offices.

We have a couple of customers at Online Tech that do this today. They have several offices around the country. They replicate to a single point and then all the data is centralized and backed up as a solid disaster recovery plan.

Question: How does this replication impact the performance of my servers or VMs?

Auto-Replication is a background process on the SAN and is designed to have very little impact on the performance of the servers connected to the SAN. On the other hand, that means that server I/O can affect the performance of replication. If there is a heavier workload from the attached hosts, the SAN arrays may devote resources to that workload rather than to replication, and that could cause replication times to be longer than when there is a lighter load. That is why you typically see a replication window, similar to a backup window, for SAN-type solutions.

Server I/O is another factor that you need to take into account when you are considering Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). Using the built-in SAN management software, called SANHQ, you can monitor the I/O load and replication times, and ensure there is enough headroom built into the SAN to accommodate a difference in workload from day to day, or during different parts of the day if multiple replicas are created.

So, RTO and RPO are important considerations when looking at SAN-to-SAN replication. Recovery Time Objective (RTO) is how long a business can get by without a particular system or application in the event of a disaster. An RTO of 24 hours implies that after a disaster the system or data needs to be online and available again within 24 hours.

The term Recovery Point Objective (RPO) is generally used to describe the amount of data, measured as a gap in time, that it is acceptable to lose in the event of a disaster. A business must decide if it is acceptable to lose any data in the event of a disaster, and if so, how much data can be lost without a significant impact to the business. For some applications or systems, this may be as long as 24 hours, while for others it may be as little as 15 minutes, or even zero.
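To make the RPO idea concrete, here is a small sketch that checks whether a replication schedule fits within a target RPO. It relies on a simplified model that is an assumption of this example, not something stated in the talk: a replica starts every fixed interval, and the newest usable copy is the last replica that finished, so the worst-case gap is one interval plus the time a replica takes to complete. The function names are hypothetical.

```python
def worst_case_data_loss_minutes(interval_min: float, replication_min: float) -> float:
    """Worst-case age of the newest completed replica at the moment of failure.

    ASSUMED model: replicas start every interval_min minutes and each one
    takes replication_min minutes to finish copying.
    """
    return interval_min + replication_min


def meets_rpo(interval_min: float, replication_min: float, rpo_min: float) -> bool:
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss_minutes(interval_min, replication_min) <= rpo_min


if __name__ == "__main__":
    # Hypothetical schedules measured against a 15-minute RPO.
    print(meets_rpo(interval_min=10, replication_min=4, rpo_min=15))
    print(meets_rpo(interval_min=15, replication_min=4, rpo_min=15))
```

Under this model, replicating every 15 minutes is not enough on its own to meet a 15-minute RPO, because the time the replica takes to complete counts against the objective too.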

To give you an example of how long it takes to replicate data, say we replicate 100 GB of data. So you’ve got a SAN at your office and a SAN at a remote site, let’s say a data center. With a T1, it would take about 6.5 days to replicate that 100 GB from one site to the other. So for 100 GB of data, a T1 is not going to cut it. If you move up to an OC3, or 155 Mb per second, that would take you about 95 minutes. Using that same 100 GB of data with a 1 Gb per second connection, and we have several customers that do have 1 Gb per second connections at Online Tech, it will take 16 minutes to replicate. If you are lucky enough to have a 10 Gb per second connection, it will take you about 8 minutes. So you can see from these times that a T1 is probably not going to cut it; you are going to need to build in some network costs to replicate that data offsite in an acceptable timeframe.

SAN-to-SAN replication can lower the cost of disaster recovery through storage automation. EqualLogic SANs are designed for 99.999% availability and are proven in enterprises and data centers. They are very simple to set up, simple to recover, and utilize your existing IP network resources. The Dell EqualLogic SAN solution is a complete solution with all the tools you need for a very robust SAN infrastructure. An EqualLogic SAN array includes all the software that you need to get the full value out of your investment, provides consistent performance over the life of the SAN, and gives you the flexibility to reconfigure your SAN non-disruptively when the business dictates.

The EqualLogic SAN you put in place today can be integrated with future SAN array purchases and take advantage of new software features, protecting your investment. Dell EqualLogic firmware automatically manages and optimizes system resources, including disks, controllers and network connections, allowing your data to run in an optimized fashion. With EqualLogic embedded software, you control the infrastructure and we make it work. Arrays can be removed from service with very little impact to your application availability. The main point in all of this is: using EqualLogic SANs, you don’t need to choose between data protection and performance. All the features required for developing a sound data protection strategy are included with the EqualLogic firmware. You have space-efficient snapshots, clones, replication, instant restore from snapshot and fast failback.

This base functionality can be extended with EqualLogic Host Software to provide application-consistent data protection in Microsoft and VMware environments. In summary, whether your business needs 1 terabyte of SAN storage or even 1 petabyte, EqualLogic SAN solutions can deliver the next generation solution you need as your data continues to grow. Coupled with a robust data center environment, your cloud and SAN solution can replicate data offsite and set the stage for a solid disaster recovery solution.
