Let’s begin with a story. The first day of the new week started quite ordinarily, and nothing indicated that it was going to be a very long and tiring day for Sarah, CIO of a large HR agency, “Jobs Are Us”. After finishing her breakfast, she headed to the office to attend the CEO’s staff meeting at 9am. Such meetings were usually stifling, almost bordering on boring, but today was going to be different.
At 9:13am she received a call from her operations manager about an issue with the job candidates database. Apparently, the system was not responding to users trying to access it through their web browsers. The problem had allegedly started before 8am that day, but nobody knew exactly when, as there was no monitoring of its availability.
The CEO was very vocal about the issue: in his words, “We are losing hundreds of thousands of pounds per day as our clients cannot post new jobs and review candidates. Fix it!” The head of marketing added fuel to the fire: “I have just launched a new campaign to promote our system, and now you are not delivering the portal availability we promised!!!” Sarah wanted to say something but let it go this time. She excused herself from the staff meeting and called her team to work on the issue.
Over the next 2 hours it became clear that the issue was not with the company’s internal systems but with the cloud service provider that had been hosting their HR servers for the past 2 years. Whilst Sarah’s company was responsible for developing the application, the cloud provider hosted the servers, network connectivity and databases the application needed to work properly.
To make matters worse, the cloud provider had gone into administration late the previous week, and the newly appointed administrators had dismissed all of its staff. Naturally, nobody had bothered to inform “Jobs Are Us”.
A quick brainstorming session with the operations manager, chief technical architect and security manager revealed that:
1. There were no contingency plans detailing what to do if the cloud provider became unavailable.
2. The backups of all system data were hosted by the very same cloud provider, and the last offsite copy was some 6 months old.
3. Nobody was answering the phone at the cloud provider’s offices.
Suddenly, Sarah realized that this was probably a good time to freshen up her CV.
In the end, “Jobs Are Us” had to find another hosting company, restore data from 6-month-old backup media and spend a considerable sum of money on data retrieval, laboriously going through individual recruiters’ mailboxes. The company’s reputation was damaged and a few big clients walked away. And Sarah? She is now managing IT teams at another company…
While this story may sound far-fetched, we all know that something like it is happening to someone right now. Companies choose cloud providers to run their critical business systems without proper due diligence or a plan for exiting the contract. Whether the exit is as abrupt as in this case or planned well in advance, it always presents a serious challenge to IT and business teams.
A well-formulated exit strategy, agreed during cloud service negotiations, is key to keeping one’s job. Let’s have a look at some fundamental principles that should be observed when selecting a cloud provider and negotiating the contract terms.
The Cloud Security Alliance Guide (v3.0) dedicates a whole chapter to the topic of Interoperability and Portability. I would like to highlight its most important aspects and add my own perspective, based on personal experience with cloud providers:
Firstly, selecting a cloud provider is no different from choosing a 3rd-party vendor for any other service. Yet, for some unexplained reason, company directors often place more trust in a cloud provider than in a “standard” 3rd-party vendor. This can be a fatal mistake, especially if the organization hands over key business services to the cloud provider.
It is vital to undertake a risk assessment for each business process to establish what the impact would be if its data and systems were compromised, altered or simply unavailable. In my experience, management is often overly optimistic in these assessments, especially when estimating the likelihood of a disastrous event.
Perhaps this is part of human nature, as we are generally very poor at assessing the likelihood of threats. Management blindly sees a way of reducing cost by outsourcing these services to a cloud provider, mistakenly believing that the provider will do a better job than an in-house IT function! Negotiate as hard as you would with any other 3rd-party vendor.
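To make the point about optimistic likelihood estimates concrete, here is a minimal sketch of a qualitative risk register, assuming the common 5×5 risk matrix where the score is impact multiplied by likelihood. The process names, scores and rating thresholds are hypothetical, chosen only to show how sensitive the outcome is to a single-step change in the likelihood estimate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProcessRisk:
    """One row of a hypothetical risk register for a cloud-hosted business process."""
    process: str
    impact: int      # 1 (negligible) .. 5 (severe) if data or systems are unavailable
    likelihood: int  # 1 (rare) .. 5 (almost certain)

    def score(self) -> int:
        # Qualitative risk matrix: score ranges from 1 to 25
        return self.impact * self.likelihood


def rating(score: int) -> str:
    # Illustrative thresholds only; each organization sets its own risk appetite
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


def pessimistic(risk: ProcessRisk, step: int = 1) -> ProcessRisk:
    # Counter optimism bias by raising the likelihood estimate, capped at 5
    return replace(risk, likelihood=min(risk.likelihood + step, 5))


# Hypothetical register entries, not data from any real organization
register = [
    ProcessRisk("Candidate database", impact=5, likelihood=2),
    ProcessRisk("Job posting portal", impact=5, likelihood=2),
    ProcessRisk("Internal newsletter", impact=1, likelihood=3),
]

for risk in register:
    worse = pessimistic(risk)
    print(f"{risk.process}: {rating(risk.score())} -> {rating(worse.score())}")
```

With these made-up numbers, raising the likelihood of a provider outage by one step moves both customer-facing processes from "medium" to "high", while the low-impact newsletter stays "low". That is exactly the kind of shift an overly optimistic assessment can hide.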
Secondly, it is key to formulate an exit plan, whether for a planned exit or for an abrupt halt to services (as in the story above). Different cloud service models (IaaS, PaaS, SaaS, SecaaS) have distinct characteristics at a technological level, which ultimately affect how a company transfers these services from one cloud provider to another, or to its own datacentre.
Infrastructure as a Service (IaaS) is generally seen as the easiest to migrate, since the customer controls the operating system and application stack, while Software as a Service (SaaS) can prove extremely difficult, costly or even impossible, since the provider owns the application and often the data format. This aspect should always be factored into the business case before deciding whether to use a cloud service. Sadly, this does not happen often, and in many cases future CIOs pick up the bill for their predecessors’ lax approach. Plan for migration away from the cloud provider and be pessimistic about how difficult it will be.
Finally, creating and testing incident response plans is critical to mitigating abrupt service interruptions. I strongly suggest testing these plans at least annually, and more often for very critical operational services. Effort spent running test scenarios will significantly reduce the time to recover during an actual incident, as well as the stress levels of everyone involved.
Testing will uncover issues and significantly reduce risk. One of the key components of the plan is a list of contact numbers for the cloud provider’s key personnel involved in incident resolution. Relying on email or a web-based ticketing system will add to the time needed to resolve the issue, and email can be useless if the mail service itself has gone down. I don’t know about you, but I do not like waiting for an email response in a stressful situation. Test the incident response plan, retest it and test it again.
In this fictitious story I have presented a scenario that could happen to anyone. Disasters happen, often without notice or forewarning. We can, however, be prepared and minimize the impact of a cloud provider’s unavailability or poor service, of migrating between providers, or of moving out of the cloud altogether. Review your existing cloud service contract now, and do take the time to read the Cloud Security Alliance Guide.