Implement disaster recovery with Amazon Redshift


Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This lets you use your data to gain new insights for your business and customers.

The objective of a disaster recovery plan is to reduce disruption by enabling quick recovery in the event of a disaster that leads to system failure. Disaster recovery plans also allow organizations to make sure they meet all compliance requirements for regulatory purposes, providing a clear roadmap to recovery.

This post outlines proactive steps you can take to mitigate the risks associated with unexpected disruptions and make sure your organization is better prepared to respond and recover Amazon Redshift in the event of a disaster. With built-in features such as automated snapshots and cross-Region replication, you can enhance your disaster resilience with Amazon Redshift.

Disaster recovery planning

Any kind of disaster recovery planning has two key components:

  • Recovery Point Objective (RPO) – RPO is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service.
  • Recovery Time Objective (RTO) – RTO is the maximum acceptable delay between the interruption of service and restoration of service. This determines what is considered an acceptable time window when service is unavailable.
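As a rough illustration of how these two objectives can be checked after an incident, the following Python sketch compares timestamps against the agreed targets. The function names and the example timestamps are hypothetical and are not part of any AWS API.

```python
from datetime import datetime, timedelta


def meets_rpo(last_recovery_point: datetime, outage_start: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last recovery point fits within the RPO."""
    return outage_start - last_recovery_point <= rpo


def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if the service was unavailable for no longer than the RTO."""
    return service_restored - outage_start <= rto


# Hypothetical incident: last snapshot at 09:00, outage at 11:30, service back at 12:10.
snapshot_at = datetime(2024, 1, 1, 9, 0)
outage_at = datetime(2024, 1, 1, 11, 30)
restored_at = datetime(2024, 1, 1, 12, 10)

rpo_ok = meets_rpo(snapshot_at, outage_at, rpo=timedelta(hours=8))     # 2.5 hours of data at risk
rto_ok = meets_rto(outage_at, restored_at, rto=timedelta(minutes=30))  # 40 minutes of downtime
```

In this made-up incident the RPO target is met but the RTO target is missed, which is the kind of result a failover drill is meant to surface.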

To develop your disaster recovery plan, you should complete the following tasks:

  • Define your recovery objectives for downtime and data loss (RTO and RPO) for data and metadata. Make sure your business stakeholders are engaged in deciding appropriate goals.
  • Identify recovery strategies to meet the recovery objectives.
  • Define a fallback plan to return production to the original setup.
  • Test the disaster recovery plan by simulating a failover event in a non-production environment.
  • Develop a communication plan to notify stakeholders of downtime and its impact on the business.
  • Develop a communication plan for progress updates, and recovery and availability.
  • Document the entire disaster recovery process.

Disaster recovery strategies

Amazon Redshift is a cloud-based data warehouse that supports many recovery capabilities out of the box to address unforeseen outages and minimize downtime.

Amazon Redshift RA3 instance types and Redshift Serverless store their data in Redshift Managed Storage (RMS), which is backed by Amazon Simple Storage Service (Amazon S3), which is highly available and durable by default.

In the following sections, we discuss the various failure modes and associated recovery strategies.

Using backups

Backing up data is an important part of data management. Backups protect against human error, hardware failure, virus attacks, power outages, and natural disasters.

Amazon Redshift supports two kinds of snapshots: automatic and manual, which can be used to recover data. Snapshots are point-in-time backups of the Redshift data warehouse. Amazon Redshift stores these snapshots internally with RMS by using an encrypted Secure Sockets Layer (SSL) connection.

Redshift provisioned clusters offer automated snapshots that are taken automatically with a default retention of 1 day, which can be extended for up to 35 days. These snapshots are taken every 5 GB data change per node or every 8 hours, and the minimum time interval between two snapshots is 15 minutes. The data change must be greater than the total data ingested by the cluster (5 GB times the number of nodes). You can also set a custom snapshot schedule with frequencies between 1–24 hours. You can use the AWS Management Console or ModifyCluster API to manage the period of time your automated backups are retained by modifying the RetentionPeriod parameter. If you want to turn off automated backups altogether, you can set the retention period to 0 (not recommended). For more details, refer to Automated snapshots.
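The retention change described above can be sketched with the AWS SDK for Python (boto3), where the ModifyCluster parameter is named AutomatedSnapshotRetentionPeriod. This is a minimal sketch, not a definitive implementation: the cluster name and Region are hypothetical, and the helper functions are illustrative only.

```python
def retention_request(cluster_id: str, days: int) -> dict:
    """Build ModifyCluster arguments for the automated snapshot retention period."""
    if not 0 <= days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    return {"ClusterIdentifier": cluster_id, "AutomatedSnapshotRetentionPeriod": days}


def set_retention(cluster_id: str, days: int, region: str) -> dict:
    """Apply the retention period (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("redshift", region_name=region)
    return client.modify_cluster(**retention_request(cluster_id, days))


# Hypothetical cluster name; 7 days of automated snapshot retention.
request = retention_request("example-cluster", 7)
```

Calling `set_retention("example-cluster", 7, "us-east-1")` would send the request. The range check mirrors the 0–35 day limits stated above.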

Amazon Redshift Serverless automatically creates recovery points approximately every 30 minutes. These recovery points have a default retention of 24 hours, after which they are automatically deleted. You have the option to convert a recovery point into a snapshot if you want to retain it longer than 24 hours.
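Converting a recovery point into a snapshot can be sketched with the boto3 `redshift-serverless` client and its `convert_recovery_point_to_snapshot` call. The recovery point ID, snapshot name, and retention value below are hypothetical placeholders, and the helper functions are illustrative only.

```python
def convert_request(recovery_point_id: str, snapshot_name: str, retention_days: int) -> dict:
    """Build arguments for ConvertRecoveryPointToSnapshot."""
    return {
        "recoveryPointId": recovery_point_id,
        "snapshotName": snapshot_name,
        "retentionPeriod": retention_days,  # how long to keep the resulting snapshot, in days
    }


def convert_recovery_point(recovery_point_id: str, snapshot_name: str,
                           retention_days: int, region: str) -> dict:
    """Send the conversion request (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("redshift-serverless", region_name=region)
    return client.convert_recovery_point_to_snapshot(
        **convert_request(recovery_point_id, snapshot_name, retention_days)
    )


# Hypothetical identifiers for illustration.
request = convert_request("example-recovery-point-id", "pre-release-snapshot", 30)
```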

Both Amazon Redshift provisioned and serverless clusters offer manual snapshots that can be taken on demand and retained indefinitely. Manual snapshots allow you to retain your snapshots longer than automated snapshots to meet your compliance needs. Manual snapshots accrue storage charges, so it’s important that you delete them when you no longer need them. For more details, refer to Manual snapshots.

Amazon Redshift integrates with AWS Backup to help you centralize and automate data protection across all your AWS services, in the cloud, and on premises. With AWS Backup for Amazon Redshift, you can configure data protection policies and monitor activity for different Redshift provisioned clusters in one place. You can create and store manual snapshots for Redshift provisioned clusters. This lets you automate and consolidate backup tasks that you had to do separately before, without any manual processes. To learn more about setting up AWS Backup for Amazon Redshift, refer to Amazon Redshift backups. As of this writing, AWS Backup does not integrate with Redshift Serverless.

Node failure

A Redshift data warehouse is a collection of computing resources called nodes. Amazon Redshift automatically detects and replaces a failed node in your data warehouse cluster. Amazon Redshift makes your replacement node available immediately and loads your most frequently accessed data from Amazon S3 first to allow you to resume querying your data as quickly as possible.

If this is a single-node cluster (which is not recommended for customer production use), there is only one copy of the data in the cluster. When it’s down, AWS needs to restore the cluster from the most recent snapshot on Amazon S3, and that becomes your RPO.

We recommend using at least two nodes for production.

Cluster failure

Each cluster has a leader node and one or more compute nodes. In the event of a cluster failure, you must restore the cluster from a snapshot. Snapshots are point-in-time backups of a cluster. A snapshot contains data from all databases that are running on your cluster. It also contains information about your cluster, including the number of nodes, node type, and admin user name. If you restore your cluster from a snapshot, Amazon Redshift uses the cluster information to create a new cluster. Then it restores all the databases from the snapshot data. Note that the new cluster is available before all of the data is loaded, so you can begin querying the new cluster in minutes. The cluster is restored in the same AWS Region and a random, system-chosen Availability Zone, unless you specify another Availability Zone in your request.
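A restore along these lines can be sketched with the boto3 `restore_from_cluster_snapshot` call. The identifiers and Availability Zone below are hypothetical, and the helper functions are illustrative only. The optional Availability Zone corresponds to the override mentioned above; when it is omitted, the system chooses one.

```python
from typing import Optional


def restore_request(new_cluster_id: str, snapshot_id: str,
                    availability_zone: Optional[str] = None) -> dict:
    """Build RestoreFromClusterSnapshot arguments, optionally pinning an Availability Zone."""
    request = {"ClusterIdentifier": new_cluster_id, "SnapshotIdentifier": snapshot_id}
    if availability_zone is not None:
        request["AvailabilityZone"] = availability_zone
    return request


def restore_cluster(new_cluster_id: str, snapshot_id: str, region: str,
                    availability_zone: Optional[str] = None) -> dict:
    """Start the restore (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("redshift", region_name=region)
    return client.restore_from_cluster_snapshot(
        **restore_request(new_cluster_id, snapshot_id, availability_zone)
    )


# Hypothetical names: let the service pick the zone, or pin one explicitly.
default_zone = restore_request("restored-cluster", "example-snapshot")
pinned_zone = restore_request("restored-cluster", "example-snapshot", "us-east-1b")
```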

Availability Zone failure

A Region is a physical location around the world where data centers are located. An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in a Region. Availability Zones enable you to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. All Availability Zones in a Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber.

To recover from Availability Zone failures, you can use one of the following approaches:

  • Relocation capabilities (active-passive) – If your Redshift data warehouse is a single-AZ deployment and the cluster’s Availability Zone becomes unavailable, Amazon Redshift automatically moves your cluster to another Availability Zone without any data loss or application changes. To activate this, you must enable cluster relocation for your provisioned cluster through configuration settings; it is automatically enabled for Redshift Serverless. Cluster relocation is free of cost, but it is a best-effort approach subject to resource availability in the Availability Zone being recovered in, and RTO can be impacted by other issues related to starting up a new cluster. This can result in recovery times between 10–60 minutes. To learn more about configuring Amazon Redshift relocation capabilities, refer to Build a resilient Amazon Redshift architecture with automatic recovery enabled.
  • Amazon Redshift Multi-AZ (active-active) – A Multi-AZ deployment allows you to run your data warehouse in multiple Availability Zones simultaneously and continue operating in unforeseen failure scenarios. No application changes are required to maintain business continuity because the Multi-AZ deployment is managed as a single data warehouse with one endpoint. Multi-AZ deployments reduce recovery time by guaranteeing capacity to automatically recover and are intended for customers with mission-critical analytics applications that require the highest levels of availability and resiliency to Availability Zone failures. This also allows you to implement a solution that is more compliant with the recommendations of the Reliability Pillar of the AWS Well-Architected Framework. Our pre-launch tests found that the RTO with Amazon Redshift Multi-AZ deployments is 60 seconds or less in the unlikely case of an Availability Zone failure. To learn more about configuring Multi-AZ, refer to Enable Multi-AZ deployments for your Amazon Redshift data warehouse. As of this writing, Redshift Serverless does not support Multi-AZ.
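For the relocation option, one way to turn the setting on for a provisioned cluster is the `AvailabilityZoneRelocation` flag of the boto3 `modify_cluster` call. The sketch below assumes a hypothetical cluster name, and the helper functions are illustrative only.

```python
def relocation_request(cluster_id: str, enabled: bool = True) -> dict:
    """Build ModifyCluster arguments that switch Availability Zone relocation on or off."""
    return {"ClusterIdentifier": cluster_id, "AvailabilityZoneRelocation": enabled}


def set_relocation(cluster_id: str, region: str, enabled: bool = True) -> dict:
    """Apply the relocation setting (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("redshift", region_name=region)
    return client.modify_cluster(**relocation_request(cluster_id, enabled))


# Hypothetical cluster name.
request = relocation_request("example-cluster")
```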

Region failure

Amazon Redshift currently supports single-Region deployments for clusters. However, you have several options to help with disaster recovery or accessing data across multi-Region scenarios.

Use a cross-Region snapshot

You can configure Amazon Redshift to copy snapshots for a cluster to another Region. To configure cross-Region snapshot copy, you need to enable this copy feature for each data warehouse (serverless and provisioned) and configure where to copy snapshots and how long to keep copied automated or manual snapshots in the destination Region. When cross-Region copy is enabled for a data warehouse, all new manual and automated snapshots are copied to the specified Region. In the event of a Region failure, you can restore your Redshift data warehouse in a new Region using the latest cross-Region snapshot.
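For a provisioned cluster, cross-Region copy can be sketched with the boto3 `enable_snapshot_copy` call, which takes the destination Region and separate retention periods for copied automated and manual snapshots. The cluster name, Regions, and retention values below are hypothetical, the helper functions are illustrative only, and the sketch does not cover Redshift Serverless.

```python
from typing import Optional


def snapshot_copy_request(cluster_id: str, destination_region: str,
                          automated_retention_days: int = 7,
                          manual_retention_days: Optional[int] = None) -> dict:
    """Build EnableSnapshotCopy arguments for a provisioned cluster."""
    request = {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": automated_retention_days,  # copied automated snapshots, in days
    }
    if manual_retention_days is not None:
        request["ManualSnapshotRetentionPeriod"] = manual_retention_days
    return request


def enable_copy(cluster_id: str, source_region: str, destination_region: str) -> dict:
    """Enable cross-Region copy (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("redshift", region_name=source_region)
    return client.enable_snapshot_copy(**snapshot_copy_request(cluster_id, destination_region))


# Hypothetical cluster copying snapshots to a second Region.
request = snapshot_copy_request("example-cluster", "us-west-2", manual_retention_days=30)
```

The call is made against the source Region. Only snapshots created after the copy is enabled are copied, as described above.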

The following diagram illustrates this architecture.

For more information about how to enable cross-Region snapshots, refer to the following:

Use a custom domain name

A custom domain name is easier to remember and use than the default endpoint URL provided by Amazon Redshift. With CNAME, you can quickly route traffic to a new cluster or workgroup created from a snapshot in a failover situation. When a disaster happens, connections can be rerouted centrally with minimal disruption, without clients having to change their configuration.

For high availability, you should have a warm-standby cluster or workgroup available that regularly receives restored data from the primary cluster. This backup data warehouse could be in another Availability Zone or in a separate Region. You can redirect clients to the secondary Redshift cluster by setting up a custom domain name in the unlikely scenario of an entire Region failure.

In the following sections, we discuss how to use a custom domain name to handle Region failure in Amazon Redshift. Make sure the following prerequisites are met:

  • You need a registered domain name. You can use Amazon Route 53 or a third-party domain registrar to register a domain.
  • You need to configure cross-Region snapshots for your Redshift cluster or workgroup.
  • Turn on cluster relocation for your Redshift cluster. Use the AWS Command Line Interface (AWS CLI) to turn on relocation for a Redshift provisioned cluster. For Redshift Serverless, this is automatically enabled. For more information, see Relocating your cluster.
  • Take note of your Redshift endpoint. You can locate the endpoint by navigating to your Redshift workgroup or provisioned cluster name on the Amazon Redshift console.

Set up a custom domain with Amazon Redshift in the primary Region

In the hosted zone that Route 53 created when you registered the domain, create records to tell Route 53 how you want to route traffic to the Redshift endpoint by completing the following steps:

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Choose your hosted zone.
  3. On the Records tab, choose Create record.
  4. For Record name, enter your preferred subdomain name.
  5. For Record type, choose CNAME.
  6. For Value, enter the Redshift endpoint name. Make sure to provide the value by removing the colon (:), port, and database. For example,
  7. Choose Create records.

  8. Use the CNAME record name to create a custom domain in Amazon Redshift. For instructions, see Use custom domains with Amazon Redshift.
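The value entered for the CNAME record is the endpoint with the colon, port, and database removed. A small Python sketch of that trimming follows. The endpoint string is a made-up example in the usual host:port/database shape, and the function name is hypothetical.

```python
def endpoint_host(endpoint: str) -> str:
    """Return only the host part of a Redshift endpoint such as 'host:5439/dev'."""
    without_database = endpoint.split("/", 1)[0]  # drop '/dev'
    return without_database.split(":", 1)[0]      # drop ':5439'


# Made-up endpoint for illustration only.
example = "example-cluster.abc123xyz.us-east-1.redshift.amazonaws.com:5439/dev"
cname_value = endpoint_host(example)
```

Here `cname_value` holds the bare host name, which is the form the record's Value field expects.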

You can now connect to your cluster using the custom domain name. The JDBC URL will be similar to jdbc:redshift://, where is your custom domain name and dev is the default database. Use your preferred editor to connect to this URL using your user name and password.

Steps to handle a Regional failure

In the unlikely scenario of a Regional failure, complete the following steps:

  1. Use a cross-Region snapshot to restore a Redshift cluster or workgroup in your secondary Region.
  2. Turn on cluster relocation for your Redshift cluster in the secondary Region. Use the AWS CLI to turn on relocation for a Redshift provisioned cluster.
  3. Use the CNAME record name from the Route 53 hosted zone setup to create a custom domain in the newly created Redshift cluster or workgroup.
  4. Take note of the endpoint of the newly created Redshift cluster or workgroup.

Next, you need to update the Redshift endpoint in Route 53 to achieve seamless connectivity.

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Choose your hosted zone.
  3. On the Records tab, select the CNAME record you created.
  4. Under Record details, choose Edit record.
  5. Change the value to the newly created Redshift endpoint. Make sure to provide the value by removing the colon (:), port, and database. For example,
  6. Choose Save.

Now when you connect to your custom domain name using the same JDBC URL from your application, you should be connected to your new cluster in your secondary Region.
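The same record update can be scripted rather than done on the console. The following sketch builds an UPSERT change for the Route 53 `change_resource_record_sets` call in boto3. The hosted zone ID, record name, endpoint host, and TTL are hypothetical placeholders, and the helper functions are illustrative only.

```python
def cname_upsert(record_name: str, target_host: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that points a CNAME record at a new endpoint host."""
    return {
        "Comment": "Fail over Redshift custom domain to the secondary Region",
        "Changes": [
            {
                "Action": "UPSERT",  # creates the record or overwrites the existing value
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_host}],
                },
            }
        ],
    }


def repoint_cname(hosted_zone_id: str, record_name: str, target_host: str) -> dict:
    """Submit the change (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the change builder works without the SDK

    client = boto3.client("route53")
    return client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cname_upsert(record_name, target_host),
    )


# Hypothetical record and secondary-Region endpoint host (port and database already removed).
batch = cname_upsert(
    "redshift.example.com",
    "restored-cluster.abc123xyz.us-west-2.redshift.amazonaws.com",
)
```

A short TTL is assumed here so that clients pick up the new target quickly. The right value depends on your RTO and DNS caching behavior.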

Use active-active configuration

For business-critical applications that require high availability, you can set up an active-active configuration at the Region level. There are many ways to make sure all writes occur to all clusters; one way is to keep the data in sync between the two clusters by ingesting data concurrently into the primary and secondary cluster. You can also use Amazon Kinesis to sync the data between two clusters. For more details, see Building Multi-AZ or Multi-Region Amazon Redshift Clusters.

Additional considerations

In this section, we discuss additional considerations for your disaster recovery strategy.

Amazon Redshift Spectrum

Amazon Redshift Spectrum is a feature of Amazon Redshift that allows you to run SQL queries against exabytes of data stored in Amazon S3. With Redshift Spectrum, you don’t have to load or extract the data from Amazon S3 into Amazon Redshift before querying.

If you’re using external tables with Redshift Spectrum, you need to make sure it’s configured and accessible on your secondary failover cluster.

You can set this up with the following steps:

  1. Replicate existing S3 objects between the primary and secondary Region.
  2. Replicate data catalog objects between the primary and secondary Region.
  3. Set up AWS Identity and Access Management (IAM) policies for accessing the S3 bucket residing in the secondary Region.

Cross-Region data sharing

With Amazon Redshift data sharing, you can securely share read access to live data across Redshift clusters, workgroups, AWS accounts, and Regions without manually moving or copying the data.

If you’re using cross-Region data sharing and one of the Regions has an outage, you need to have a business continuity plan to fail over your producer and consumer clusters to minimize the disruption.

In the event of an outage affecting the Region where the producer cluster is deployed, you can take the following steps to create a new producer cluster in another Region using a cross-Region snapshot and by reconfiguring data sharing, allowing your system to continue operating:

  1. Create a new Redshift cluster using the cross-Region snapshot. Make sure you have the correct node type, node count, and security settings.
  2. Identify the Redshift data shares that were previously configured for the original producer cluster.
  3. Recreate these data shares on the new producer cluster in the target Region.
  4. Update the data share configurations in the consumer cluster to point to the newly created producer cluster.
  5. Confirm that the required permissions and access controls are in place for the data shares in the consumer cluster.
  6. Verify that the new producer cluster is operational and the consumer cluster is able to access the shared data.

In the event of an outage in the Region where the consumer cluster is deployed, you will need to create a new consumer cluster in a different Region. This makes sure all applications that are connecting to the consumer cluster continue to function as expected, with proper access.

The steps to accomplish this are as follows:

  1. Identify an alternate Region that is not affected by the outage.
  2. Provision a new consumer cluster in the alternate Region.
  3. Provide necessary access to data sharing objects.
  4. Update the application configurations to point to the new consumer cluster.
  5. Validate that all the applications are able to connect to the new consumer cluster and are functioning as expected.

For more information on how to configure data sharing, refer to Sharing datashares.

Federated queries

With federated queries in Amazon Redshift, you can query and analyze data across operational databases, data warehouses, and data lakes. If you’re using federated queries, you need to set up federated queries from the failover cluster as well to prevent any application failure.


In this post, we discussed various failure scenarios and recovery strategies associated with Amazon Redshift. Disaster recovery solutions make restoring your data and workloads seamless so you can get business operations back online quickly after a catastrophic event.

As an administrator, you can now work on defining your Amazon Redshift disaster recovery strategy and implement it to minimize business disruptions. You should develop a comprehensive plan that includes:

  • Identifying critical Redshift resources and data
  • Establishing backup and recovery procedures
  • Defining failover and failback processes
  • Enforcing data integrity and consistency
  • Implementing disaster recovery testing and drills

Try out these strategies for yourself, and leave any questions and feedback in the comments section.

About the authors

Nita Shah is a Senior Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.

Poulomi Dasgupta is a Senior Analytics Solutions Architect with AWS. She is passionate about helping customers build cloud-based analytics solutions to solve their business problems. Outside of work, she likes travelling and spending time with her family.

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 16 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.

Jason Pedreza is a Senior Redshift Specialist Solutions Architect at AWS with data warehousing experience handling petabytes of data. Prior to AWS, he built data warehouse solutions at and Amazon Devices. He specializes in Amazon Redshift and helps customers build scalable analytic solutions.

Agasthi Kothurkar is an AWS Solutions Architect, and is based in Boston. Agasthi works with enterprise customers as they transform their business by adopting the Cloud. Prior to joining AWS, he worked with leading IT consulting organizations on customer engagements spanning Cloud Architecture, Enterprise Architecture, IT Strategy, and Transformation. He is passionate about applying Cloud technologies to solve complex real-world business problems.