The future is bright, and the future is cloudy. Why Amazon’s AWS outage will accelerate the increase in cloud usage and cloud quality.

Depending on who you ask, the emergence of utility cloud computing is either the best thing since sliced silicon, a revolution in how we do business that will precipitate a new cycle of innovation and opportunity, or a fad built on hot air, with little substance, that is marked out for inevitable collapse under the weight of the collective expectation heaped upon it.

Given the volume of chatter, prediction and speculation about what the cloud might come to mean, you would have to try very hard to remain blissfully ignorant of what cloud is and what utility computing can mean for your business. Most people who work in a management capacity at an organisation that owns or rents servers have some understanding of how utility computing will benefit them in their role and benefit their organisation as a whole. These benefits centre around:

– The amount of hardware you need to buy or rent to provide enough computing power for your organisation (both number-crunching capability and the ability to store information)

– The number of people needed to ensure an adequate computing service for all of the things you do with computers

– The financial and operational implications of the above

May 2011 seems a good time to talk about this, in the aftermath of the recent catastrophic failure of many components and services integral to Amazon’s industry-leading public cloud. Amazon’s cloud has been available for production use since October 2008 and has run largely without incident since then. This failure has given a new, taller soapbox to those who see greater risks than benefits, either in the cloud computing paradigm itself or in how it has been and is being executed.

Amazon’s is the most advanced public cloud available, offering more features, more capacity and a greater economy of scale than any other at this time. In next week’s article I will talk about how this is changing, and about some of the alternative clouds, both available now and on the horizon, from Rackspace, Tata Communications and others who are aggressively seeking a foothold in an established and increasingly lucrative business.

For me, there are four points to take away and consider as a result of this event. They all point to cloud computing having passed a critical mass, in terms of both how much it is used and how important it is to its users, from which there is no going back.

1. Cloud is already everywhere; it has already happened.

The April 2011 outage affected a huge slice of the internet, taking down websites run by high-profile brands for unprecedented periods of time. It received a huge volume of news coverage outside the usual sphere of sites catering to technologists like me; for example, it was reported on the front page of the New York Times and on BBC News at Ten. Members of the public who do not use, make or sell cloud technology learnt for the first time that the famous bookshop was also an IT company, and one with a big impact on the services and content they access online.

2. The cloud jungle is massive, and everywhere.

Many of the sites running on AWS belong to businesses with a critical reliance on their web services. Some of them are online retailers; others are media sites, ranging from games companies to newspapers, that push most of their content out over the net. We discovered not only that they were using cloud computing, but that they had been doing so in such a way that they had become totally reliant on it to keep offering a quality service. Such are the economies of scale offered by public clouds that, when the cloud failed, there were no viable alternatives. Companies have used more and more of the cloud to offer faster and better services to consumers, to the point that they were running thousands of virtual servers in AWS at commodity prices so low that they could never afford to build an equivalent physical infrastructure comparable in quality to Amazon’s. Even if they could afford it, they could never do so quickly or efficiently enough to replicate or replace their Amazon-resident infrastructure. Companies that used to run their own large-scale datacentres have done away with them, and with them the cost and complexity of running them. They have created in the cloud and seen that it is good.

3. Cloud encourages competition

Rather than seeing the Amazon outage as proof of the vulnerability of relying on cloud infrastructure, many companies that use it see it as proof of the vulnerability of relying on a single source. No organisation that has run its own datacentre believes there is such a thing as a 100% reliable service, and everyone who buys Amazon’s cloud service knows that the availability level they agree with Amazon is just over 99.5%, not 100%. The outage itself is not what is making them nervous.
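To put an availability figure like this in perspective, the short Python sketch below converts an availability percentage into the downtime it permits over a year. It takes the 99.5% figure quoted above at face value, assumes a 365-day year, and does not model how any particular provider actually measures availability; real SLAs define their own measurement periods and exclusions. At 99.5%, a year of 8,760 hours still allows about 43.8 hours of unavailability.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the given availability level."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Compare a few availability levels (assumed values, for illustration only).
for level in (99.5, 99.95, 100.0):
    print(f"{level}% -> {allowed_downtime_hours(level):.1f} hours/year")
```

Even a small-looking gap below 100% adds up to the better part of two days a year, which is why experienced buyers do not expect a single provider never to fail.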

Customers who are pining for a credible competitor to Amazon’s solution are doing so because they know there is always a chance that one provider will be knocked over for a short time, but that the chance of two providers being knocked over at the same time is orders of magnitude smaller, provided their failures are independent of each other. The more clouds you can run your services across, the less likely it is that the failure of any single cloud will affect you. By the same reasoning, making sure your service and business can run on as many clouds as possible makes it less and less likely that you will be affected even when several clouds fail at once. High redundancy breeds greater availability.
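The redundancy argument can be made concrete with a little arithmetic. If each provider is independently unavailable 0.5% of the time, the chance that two are down together is 0.005 × 0.005, or 0.0025% of the time. The Python sketch below generalises this to any number of providers. It assumes failures are fully independent, which real outages often are not (shared software, shared regions or a common cause can make failures coincide), so treat the result as a best case rather than a prediction.

```python
def combined_availability(availabilities: list[float]) -> float:
    """Probability that at least one provider is up, assuming independent failures.

    Each entry is a provider's availability as a fraction, e.g. 0.995 for 99.5%.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a  # multiply in the chance that this provider is also down
    return 1.0 - all_down


# Assumed 99.5% availability per provider, for illustration only.
for n in range(1, 4):
    a = combined_availability([0.995] * n)
    print(f"{n} provider(s): unavailable {1 - a:.2e} of the time")
```

Each extra independent provider multiplies the "everything is down" probability by another small factor, which is the effect described above; correlated failures erode that benefit.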

As the world marvels at how scarce and shallow Amazon’s customer support was during the outage, competitors will be realising that they can win new customers by communicating more with their own customers during events like these than Amazon did, while also offering an important source of redundancy to customers already using Amazon. Amazon will most likely be thinking exactly the same thing, and rethinking whatever strategy led it not to discuss the outage on its blogs or offer proper advice to customers during and after the incident, so that it can keep competing with these other providers. The outcome is good for users: all cloud service providers will place greater importance on customer service and support.

4. Variation is a redundancy

Finally, the fact is that the AWS outage occurred because of the way Amazon has chosen to architect its cloud. Amazon builds its cloud using a cathedral methodology, in which the insides are kept secret from the public and from users. It provides an interface that lets you do what you want with the cloud, but not see how it works.

There are other services that offer the same type of rented storage and compute power but are built in an entirely different way from Amazon’s, using different underlying technology and a different methodology, and so are less likely to fail in the same way. Some are open source, letting you see much more of how they work and operate them at a more granular level. Having a multi-cloud strategy gives you a way to insure yourself against any one service failing in a particular way, as well as at a particular time.

The problem of the future

A multi-cloud platform presents its own challenge, which is probably part of the reason so many reputable, established and technology-savvy organisations did not have one. Imagine you decided five years ago to deploy on AWS: you trained your staff and tried, tested and refined your Amazon deployments. You have probably become quite good at using Amazon; good enough to use it, and its unique topology and infrastructure, in specific ways that aren’t easily ported to another cloud. Designing and deploying for a cloud built differently from Amazon’s then becomes an additional and significant cost.

Luckily, there are services and software that abstract the differences between these clouds and let you deploy on multiple clouds in a consistent way. As I work for a company that does just this, I’ll leave that point there for the sake of impartiality.
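The general idea of an abstraction layer can be sketched in a few lines of Python: code against one common interface, give each cloud its own adapter behind it, and fail over from one to the next. This is not a description of any real product or SDK; `CloudProvider`, `FakeProvider` and `launch_with_failover` are hypothetical names, and the fake providers simply simulate an outage instead of calling a real cloud API.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical common interface; real clouds expose very different APIs."""

    name: str

    @abstractmethod
    def launch_server(self, image: str) -> str:
        """Start a server from an image and return its identifier."""


class FakeProvider(CloudProvider):
    """Stand-in adapter for illustration; healthy=False simulates an outage."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def launch_server(self, image: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name}:{image}"


def launch_with_failover(providers: list[CloudProvider], image: str) -> str:
    """Try each provider in order and return the first server that starts."""
    errors = []
    for provider in providers:
        try:
            return provider.launch_server(image)
        except ConnectionError as exc:
            errors.append(str(exc))  # remember why this provider failed, then move on
    raise RuntimeError("all providers failed: " + "; ".join(errors))


providers = [FakeProvider("cloud-a", healthy=False), FakeProvider("cloud-b")]
print(launch_with_failover(providers, "web-frontend"))  # prints "cloud-b:web-frontend"
```

The hard part in practice is hidden inside each adapter: mapping one cloud’s topology, images and networking onto the shared interface, which is exactly the porting cost described above.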

I’ll finish by returning to my original point, which is not only that the cloud is here to stay, but that other clouds are coming. The question may not be which cloud will reign over the others, but which clouds will be seated at the round table.


4 thoughts on “The future is bright, and the future is cloudy. Why Amazon’s AWS outage will accelerate the increase in cloud usage and cloud quality.”

  1. This failure tells me how important it is for companies to build clouds privately first, and then move into public clouds second (or in parallel). This way:

    – you don’t get locked into a single way of running your service in a public cloud
    – you can revert back to private services if something bad goes wrong
    – you can keep private data safe in the knowledge that it is on your turf

    Venture capital firms telling companies to start their businesses in public clouds from day one are potentially causing problems for those companies further down the line.

  2. Thanks Matt,

    I certainly agree that a private cloud is a great failover mechanism, especially where you have corresponding API sets, such as the way the API of Amazon’s US East region corresponds to Eucalyptus and the Ubuntu Enterprise Cloud (UEC) stack.

    For many companies, cloud usage is so high that building a private cloud datacentre able to handle a similar load negates the benefits of going into the cloud in the first place: the available scale and the resulting low cost of usage.

    • I was commenting from the point of view of actually building your service (and making sure you do that in private), not running it.

      When running a service, I would say it is worth having just enough scale inside your private cloud to keep your business running adequately, and then using public clouds to handle the ‘Christmas rushes’ that companies of every type face every now and again. That way you take advantage of both types of cloud.

  3. Sure, I see now. Public cloud then becomes a spillover platform for your private implementation, rather than the other way round. That sounds particularly useful for e-commerce workloads like the ones you mention, which typically have much more predictable volume spikes.
