DevOps in a Bi-Modal World (Part 2 of 4)

In Part 1 of this series, we explored how IT now faces the challenges of what Gartner calls a bi-modal world, in which the business must keep working with its existing infrastructure and processes (Mode 1) while developing new processes and building new infrastructure to become more agile (Mode 2). The challenges are complex, and we concluded that most organizations are trying to address four key problems across their emerging bi-modal world.

In Mode 1, they are looking to increase relevance and reduce complexity.

In Mode 2, they are looking to improve agility and increase scalability.

Here, we will discuss in more detail how organizations can address the challenges of Mode 1.

Mode 1: Increasing Relevance by Accelerating Service Delivery

In many enterprises, delivering development and test environments to developers generally starts with either a request to a service management system or a tap on the shoulder of a system administrator, depending on the size of the organization and the maturity of the IT department. Either way, once a request lands in a service management system, many teams often need to perform tasks to deliver the environment to the developer. These might include virtual infrastructure administrators, system administrators, and security operations. In larger organizations you could expect to see disaster recovery teams, networking teams, and many others involved in this process too. Again, depending on the maturity of the organization, the coordination of all this could range from taps on shoulders to passing tickets around in a service management system.

At best, each team takes minutes or hours to respond and perform some manual tasks, and often the person who requested the service must be asked follow-up questions (“Are you sure you need 16GB of RAM?”, “What version of Java do you need for this?”). The result is many highly skilled people spending a lot of time, and very slow delivery of the environment to the developer. Multiply this by the number of developers in an organization and the number of requests for environments, and you can understand why traditional IT processes and systems are struggling to maintain relevance.

A solution to this problem is to introduce a service designer into the process (you may be familiar with this role from ITIL) who can enable self-service consumption of everything developers need. The designer works with all stakeholders, including virtual infrastructure administrators, system administrators, and security operations, to obtain requirements. Then, the designer builds the necessary configuration management content and couples it with a service catalog item. By invoking this catalog item, the environment can be deployed automatically across any number of providers, including virtualization providers and private or public clouds.
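To make the idea of a catalog item more concrete, here is a minimal sketch in Python. It is not the API of any particular product: the `CatalogItem` class, the `build_request` function, the provider names, and the example package names are all assumptions made for illustration. The point is only that the answers to the usual follow-up questions (memory, Java version) are captured once by the designer, so an order can be turned into a provisioning request without a human in the loop.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """A self-service catalog entry (hypothetical structure)."""
    name: str
    cpu: int
    memory_gb: int
    packages: list = field(default_factory=list)
    # Providers the stakeholders have approved for this item (assumed names).
    allowed_providers: tuple = ("virtualization", "private_cloud", "public_cloud")


def build_request(item, provider, requester):
    """Turn a catalog order into a provisioning request for one provider."""
    if provider not in item.allowed_providers:
        raise ValueError(f"{item.name} cannot be deployed to {provider}")
    return {
        "requester": requester,
        "provider": provider,
        "cpu": item.cpu,
        "memory_gb": item.memory_gb,
        "packages": list(item.packages),
    }


# Illustrative item: sizing and packages were agreed with all teams up front.
java_dev = CatalogItem(
    "java-dev-env", cpu=4, memory_gb=16, packages=["java-1.8.0-openjdk", "maven"]
)
request = build_request(java_dev, "private_cloud", requester="dev-team-a")
print(request)
```

In a real deployment the returned request would be handed to whatever automation actually provisions the environment; that step is deliberately left out here.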

The result is that all the teams responsible for delivering an environment are now free to do more valuable work (like working with development teams to design operations processes that are part of development instead of being bolted on afterward). It also reduces the opportunity for human error and, most importantly, delivers the environment in significantly less time. We have seen upwards of a 95 percent improvement in delivery times for many of our customers [1].

Mode 1: Reducing Complexity by Optimizing IT

Speeding up delivery of environments to developers or end users is a great way to make IT more relevant, but much of IT's time goes to the day-to-day management of those environments. If IT is spending so much time on day-to-day tasks, how can we expect them to deploy the next generation of scalable and programmable infrastructure, or to have time to work with development teams during the early stages of development to increase agility?

I have found that many virtual infrastructure administrators spend time on several common tasks that should be largely automated through policy.

First are policies around workload placement. Often one virtual infrastructure cluster will be running hot while another is completely cold. This leads to operations teams being inundated with calls from the owners of applications running on the hot cluster, asking why response times are poor. Automating this balancing through control policies can alleviate the problem and keep virtual infrastructure administrators free to do other things.
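As a rough illustration of what such a balancing policy might compute, the sketch below plans moves from hot clusters to cold ones. It is a simple greedy approach chosen for readability, not the algorithm of any specific product. The function name `plan_rebalance`, the 80 percent threshold, the cluster names, and the load figures are all assumptions, and the sketch assumes every cluster has the same capacity so that loads can be expressed as percentages.

```python
def plan_rebalance(clusters, threshold=80):
    """Plan moves that bring hot clusters at or below `threshold` percent load.

    `clusters` maps a cluster name to {workload name: percent of capacity used}.
    Returns a list of (workload, source, target) tuples.
    """
    load = {name: sum(workloads.values()) for name, workloads in clusters.items()}
    moves = []
    # Visit the hottest clusters first.
    for source in sorted(load, key=load.get, reverse=True):
        # Try the smallest workloads first to keep each move cheap.
        for workload, size in sorted(clusters[source].items(), key=lambda kv: kv[1]):
            if load[source] <= threshold:
                break
            target = min(load, key=load.get)  # coldest cluster right now
            if target == source or load[target] + size > threshold:
                continue
            moves.append((workload, source, target))
            load[source] -= size
            load[target] += size
    return moves


# Illustrative numbers only.
clusters = {
    "cluster-a": {"web": 40, "db": 35, "batch": 20},  # 95% used: running hot
    "cluster-b": {"cache": 10},                       # 10% used: nearly idle
}
moves = plan_rebalance(clusters)
print(moves)
```

With these numbers the plan is a single move of the `batch` workload from `cluster-a` to `cluster-b`, which brings the hot cluster down to 75 percent.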

Next is the ability to quickly move workloads between different infrastructures. This has become increasingly important as organizations look to adopt scale-out IaaS clouds. Operations leadership realizes that if they can identify workloads that do not need to run on (typically) more expensive virtual infrastructure, they could save money by moving those workloads to their IaaS private cloud. This migration is typically a manual process, and it is difficult even to understand which workloads can be moved. With a systematic and automated way of identifying and migrating workloads, enterprises can save time and move workloads quickly to reduce costs.
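The identification step can be sketched as a filter over a workload inventory. The criteria used here (a workload is stateless and does not depend on a virtualization-only feature) are assumptions chosen for illustration; real criteria would come from the organization's own requirements. The `Workload` fields and the example fleet are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    platform: str               # where the workload runs today
    stateless: bool             # no local state to carry across
    needs_vendor_feature: bool  # depends on a virtualization-only feature


def migration_candidates(workloads):
    """Return names of workloads on virtual infrastructure that could move to IaaS."""
    return [
        w.name
        for w in workloads
        if w.platform == "virtualization" and w.stateless and not w.needs_vendor_feature
    ]


# Illustrative inventory.
fleet = [
    Workload("web-frontend", "virtualization", stateless=True, needs_vendor_feature=False),
    Workload("erp-database", "virtualization", stateless=False, needs_vendor_feature=True),
    Workload("ci-runner", "private_cloud", stateless=True, needs_vendor_feature=False),
]
candidates = migration_candidates(fleet)
print(candidates)
```

Here only `web-frontend` qualifies: `erp-database` fails the criteria, and `ci-runner` already runs on the private cloud.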

Yet another issue is ensuring that compliance and governance requirements are met, particularly for workloads running on new infrastructure such as an OpenStack-based private cloud. Not knowing which users, groups, data, applications, and packages reside on systems spread across a heterogeneous mix of infrastructure presents a large risk, and operations teams are often responsible for minimizing it. By introspecting workloads across platforms, operations teams can see exactly which users, data, and packages are present on each system, and they can use the migration capabilities mentioned previously to make sure systems run on appropriate providers.
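Once introspection has produced an inventory, checking it against policy is mostly set arithmetic. The sketch below assumes the inventory has already been collected into a plain dictionary; the dictionary layout, the `find_violations` function, and the example policy (a forbidden package list and an approved user list) are assumptions for illustration rather than the data model of any specific tool.

```python
def find_violations(inventory, policy):
    """Compare introspected system contents against a compliance policy.

    Returns a sorted list of (system, finding) tuples.
    """
    findings = []
    for system, facts in inventory.items():
        # Packages that are installed but forbidden by policy.
        for package in facts["packages"] & policy["forbidden_packages"]:
            findings.append((system, f"forbidden package: {package}"))
        # Users that exist on the system but are not on the approved list.
        for user in facts["users"] - policy["allowed_users"]:
            findings.append((system, f"unapproved user: {user}"))
    return sorted(findings)


# Illustrative inventory spanning two kinds of provider.
inventory = {
    "vm-101": {
        "provider": "virtualization",
        "packages": {"openssl", "telnet-server"},
        "users": {"root", "appsvc"},
    },
    "vm-202": {
        "provider": "openstack",
        "packages": {"openssl"},
        "users": {"root", "contractor1"},
    },
}
policy = {
    "forbidden_packages": {"telnet-server"},
    "allowed_users": {"root", "appsvc"},
}
violations = find_violations(inventory, policy)
for system, finding in violations:
    print(system, finding)
```

The same check runs unchanged regardless of which provider hosts the system, which is the property that matters in a heterogeneous environment.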

Finally, since IT has often become a broker of public cloud services, it is important that they can account for costs and place workloads in appropriate public cloud regions to control costs while maintaining service levels for end users. If developers are based in Singapore, then we should use public cloud infrastructure in that location instead of deploying to more expensive, higher-latency public cloud infrastructure in Tokyo.
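A placement policy of this kind can be sketched as "pick the cheapest region that still meets the latency target." The `Region` class, the `pick_region` function, the 50 ms target, and the latency and cost figures below are made-up values for illustration, not real measurements or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    latency_ms: float   # measured from where the users sit
    hourly_cost: float  # cost per instance hour, in any consistent currency


def pick_region(regions, max_latency_ms):
    """Return the cheapest region whose latency meets the service level."""
    eligible = [r for r in regions if r.latency_ms <= max_latency_ms]
    if not eligible:
        raise ValueError("no region meets the latency target")
    # Cheapest first; break cost ties by lower latency.
    return min(eligible, key=lambda r: (r.hourly_cost, r.latency_ms))


# Illustrative figures as seen by developers based in Singapore.
regions = [
    Region("singapore", latency_ms=5.0, hourly_cost=0.10),
    Region("tokyo", latency_ms=70.0, hourly_cost=0.12),
]
choice = pick_region(regions, max_latency_ms=50)
print(choice.name)
```

Filtering on latency before comparing cost keeps the service level as a hard constraint, so a cheaper but slower region can never win.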

By implementing policy-based automation, our customers have seen large improvements in their resource utilization and a reduction in CapEx and OpEx per workload managed [2].

Coming next in Part 3: We will discuss Mode 2 in more detail with concrete examples and solutions.

Interested in discussing more with us? Be sure to visit us at the OpenStack Summit in Tokyo, October 27-30, booth P7 to meet James and the OpenStack team from Red Hat.

References
[1] http://www.redhat.com/en/resources/union-bank-migrates-unix-and-websphere-red-hat-and-jboss-solutions
[1] http://www.redhat.com/en/resources/g-able-improves-resource-allocation-red-hat-solutions
[2] http://www.redhat.com/en/resources/cbts-enhances-customer-service-red-hat-cloudforms