Highly available virtual machines in RHEL OpenStack Platform 7

OpenStack provides scale and redundancy at the infrastructure layer to deliver high availability for applications built to operate in a horizontally scaling cloud computing environment. It was designed for applications that are “designed for failure” and deliberately excluded features that would enable traditional enterprise applications, for fear of limiting its scalability and compromising its initial goals. These traditional enterprise applications demand continuous operation and fast, automatic recovery in the event of an infrastructure-level failure. While an increasing number of enterprises look to OpenStack to provide the infrastructure platform for their forward-looking applications, they are also looking to simplify operations by consolidating their legacy application workloads on it as well.

As part of the On-Ramp to Enterprise OpenStack program, Red Hat, in collaboration with Intel, Cisco and Dell, has been working on delivering a high availability solution for such enterprise workloads running on top of OpenStack. This work provides an initial implementation of the instance high availability proposal that we put forward in the past and is included in the recently released Red Hat Enterprise Linux OpenStack Platform 7.

In putting forward the original proposal, it was posited that any solution endeavoring to provide workload high availability in a cloud or virtualization environment needs three key capabilities:

  • A monitoring capability to detect when a given compute node has failed and trigger handling of the failure.
  • A fencing capability to remove the relevant compute node from the environment.
  • A recovery capability to orchestrate the rescuing of instances from the failed compute node.

Rather than reinventing the wheel inside the OpenStack projects themselves, it is possible to deploy and manage an OpenStack environment with these capabilities using traditional high availability tools such as Pacemaker, without compromising the scalability of the overall platform. This is the approach used to deliver instance-level high availability in RHEL OpenStack Platform 7. A demonstration of the solution in action was previously shown at Red Hat Summit in partnership with Dell and Intel.

In this implementation, monitoring is performed using the NovaCompute Pacemaker resource agent, while fencing and recovery are handled by the fence_compute Pacemaker fence agent and the NovaEvacuate resource agent. These three new components were all co-engineered by the High Availability and OpenStack Compute teams at Red Hat and are provided in updated resource-agents and fence-agents packages for Red Hat Enterprise Linux 7.1.

Monitoring

In a traditional Pacemaker deployment, each node in a cluster runs the full stack of services for ensuring high availability, including Pacemaker and Corosync. The traditional HA setup, as delivered via the RHEL High Availability Add-On, supports up to 16 nodes. In contrast, a typical OpenStack deployment has many hundreds, or even thousands, of compute nodes that need to be monitored. To close the scalability gap, the Red Hat HA team designed and developed pacemaker_remote from the ground up.

By using pacemaker_remote, it is possible to continue adding compute nodes and connecting them to the Pacemaker cluster running on the OpenStack controller nodes without running into the 16-node limit, thus keeping all of the nodes in a single administrative domain. The compute nodes do not become full members of the cluster and do not need to run the full Pacemaker or Corosync stacks; instead they run only pacemaker_remote and integrate with the cluster as remote nodes.

This eases the process of scaling out the compute cluster while still allowing us to provide some neat high availability functions, including monitoring compute nodes for failures and automating recovery of the virtual machines running on them when failures occur. To do this, the Pacemaker cluster running on the controller nodes monitors pacemaker_remoted on each compute node to confirm it is “alive”. In turn, on the compute node itself, pacemaker_remoted monitors the state of a number of services, including the Neutron and Ceilometer agents, Libvirt, and of course the nova-compute service itself. If an issue is detected in one of these services, pacemaker_remote will endeavor to recover it independently. If this fails, however, or if pacemaker_remote stops responding entirely, fencing and recovery operations are triggered.
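The escalation described above can be sketched as a small decision function. This is only an illustration of the logic, not the code of the NovaCompute resource agent or of pacemaker_remoted: the ComputeNode class, the check_node function, and the restart_service callback are hypothetical names introduced here, and the service names are just examples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ComputeNode:
    """Hypothetical view of one compute node as seen by the cluster."""
    name: str
    remote_alive: bool = True  # is pacemaker_remoted answering the cluster?
    services: Dict[str, bool] = field(default_factory=dict)  # service -> healthy?


def check_node(node: ComputeNode,
               restart_service: Callable[[ComputeNode, str], bool]) -> str:
    """Run one monitoring pass and return "ok", "recovered", or "fence"."""
    # If the remote daemon itself is unreachable, nothing can be repaired
    # locally, so escalate straight to fencing.
    if not node.remote_alive:
        return "fence"

    outcome = "ok"
    for service, healthy in list(node.services.items()):
        if healthy:
            continue
        # First try to recover the failed service on the node itself.
        if restart_service(node, service):
            node.services[service] = True
            outcome = "recovered"
        else:
            # Local recovery failed: hand over to fencing and recovery.
            return "fence"
    return outcome
```

A caller would invoke check_node periodically for each compute node and start fencing only when it returns "fence", so a single misbehaving service does not take down a node that can still be repaired in place.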

Fencing

In the event that a compute node fails, Pacemaker powers it off using fence_ipmilan (other fencing mechanisms will be supported in the future). While the node is powering down, the fence_compute fence agent loops, waiting for Nova to also recognize that the failed host is down. This is necessary because OpenStack Compute (Nova) will not let an evacuation be initiated until it recognizes that the node being evacuated is down. In the near future, it will be possible for the fence agent to use the force-down API call (formerly referred to as “mark host down”), introduced in OpenStack “Liberty”, to proactively tell Nova that the node is down and speed up this part of the process.
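The waiting loop described above might look roughly like the following. This is a sketch of the polling pattern only, not the actual fence_compute code: wait_for_nova_down and the get_state callback are hypothetical, the "down" state string and the timeout and interval defaults are assumptions, and the sleep and clock parameters exist only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_nova_down(host: str,
                       get_state: Callable[[str], str],
                       timeout: float = 180.0,
                       interval: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until Nova reports `host` as down, or give up after `timeout` seconds.

    `get_state` is a hypothetical callback that asks Nova for the state of
    the compute service on `host` and returns a string such as "up" or "down".
    """
    deadline = clock() + timeout
    while True:
        if get_state(host) == "down":
            return True   # Nova agrees the host is gone; evacuation may start
        if clock() >= deadline:
            return False  # Nova never noticed the failure in time
        sleep(interval)
```

The force-down call mentioned above would remove the need for this loop, because the fence agent could tell Nova the state directly instead of waiting for Nova to notice it.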

Recovery

Once Nova has recognized that the node is down, in response to either the original failure or Pacemaker explicitly powering the node off, the fence agent initiates a call to Nova host-evacuate. This triggers Nova to restart, on a new compute node, all of the virtual machines that were running on the failed one. In the future it may be desirable to have an image property or flavor extra specification that can be used to explicitly “opt in” to this functionality only for traditional application workloads that need it.

In this implementation, we assume that impacted virtual machines are either using shared ephemeral storage, for example Ceph, or were booted from volumes. These characteristics make it possible to recover the instances, including their on-disk state, even when the host on which they were originally running has gone down permanently. An out-of-the-box RHEL OpenStack Platform 7 deployment uses Ceph for this purpose.
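To make the storage assumption concrete, here is a minimal sketch of how an evacuation plan could separate instances whose disks survive the loss of a host from those whose disks do not. It is not how Nova or the NovaEvacuate agent is implemented: the Instance class and plan_evacuation function are hypothetical, and the round-robin target selection is a simple stand-in for the Nova scheduler, which makes the real placement decision.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical record of a virtual machine and where its disk lives."""
    name: str
    host: str
    shared_ephemeral: bool = False   # ephemeral disk on shared storage, e.g. Ceph
    boot_from_volume: bool = False   # root disk is a volume

    @property
    def disk_survives_host_loss(self) -> bool:
        return self.shared_ephemeral or self.boot_from_volume


def plan_evacuation(instances: Sequence[Instance],
                    failed_host: str,
                    healthy_hosts: Sequence[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (instance -> new host, instances outside the storage assumption)."""
    if not healthy_hosts:
        raise ValueError("no healthy compute hosts available")

    plan: Dict[str, str] = {}
    unsupported: List[str] = []
    placed = 0
    for inst in instances:
        if inst.host != failed_host:
            continue  # only instances on the failed node are affected
        if not inst.disk_survives_host_loss:
            unsupported.append(inst.name)  # on-disk state was local to the dead host
            continue
        # Round-robin placement: an assumption standing in for the scheduler.
        plan[inst.name] = healthy_hosts[placed % len(healthy_hosts)]
        placed += 1
    return plan, unsupported
```

Instances reported in the second list are the ones that fall outside the assumption stated above, since their disk contents existed only on the failed node.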

If Pacemaker is also successful in powering the node back on, the node is returned to the pool of available compute resources when the Nova heartbeat process discovers its return to operation.

The combination of these monitoring, fencing, and recovery capabilities provides a solution that makes it easier than ever to migrate traditional, business-critical applications that require high availability to OpenStack.

Want to try it out for yourself? Sign up for an evaluation of Red Hat Enterprise Linux OpenStack Platform today! Existing users can find instructions on manually enabling high availability for their compute nodes in the Red Hat Knowledgebase. We would love to get more feedback on this feature as we work on integrating these capabilities and more into the RHEL OpenStack Platform director (based on the “TripleO” project) to provide full automation.

Want to learn more about moving instances around an OpenStack environment? Don’t know the difference between cold migration, live migration, and evacuation? Catch my presentation, “Dude, this isn’t where I parked my instance!?”, at OpenStack Summit Tokyo!