By: Greg Pierce
In August, I blogged about my experience with the Delta outage and provided some lessons learned from those calamitous few summer days. With over 650 cancelled flights (and thousands of frustrated customers), I speculated that their disruption in services must have cost them many millions of dollars in damages. And all caused by what was reported as a “small fire” in Delta’s datacenter. Painful, but invaluable, lessons learned, right?
Well, maybe not.
Delta experienced another outage this January. And I, once again, got caught up in it. My flight, scheduled for a 6 pm departure from Minneapolis on January 31, had already boarded. I had stowed my stuff, buckled up and was finishing my email. We were just waiting to taxi out when the pilot came over the intercom to announce that the flight had to stop because Delta systems worldwide had gone down. Uh-oh…
15 hours later
My flight finally departed the next morning, and along the way, I witnessed first-hand a textbook example of how not to handle an outage – or your customers. Now, don’t get me wrong. I am and remain a fan of Delta. I recognize that they compete in a ferocious marketplace and that some passengers are never happy. But I do think they deserve some unflinching criticism.
Five lessons, revisited
For easy reference, here’s a quick summary of my five lessons:
- Introduce physical security, safety, and personnel security measures, including having appropriate background checks and security clearance for employees, partners, and vendors.
- Establish rigorously tested, proven failover protocols. If you work with a cloud provider, review their failover offerings. (At Concerto, we include automatic failover.)
- Compare your own organization’s SLA with that of a proven cloud provider. Too many companies that manage their own datacenters do so with an undefined SLA.
- Assess and balance your uptime requirements and risk across a myriad of applications. Conduct a comprehensive audit of your datacenter security and disaster protocols.
- Put a communication plan in place for after something bad happens. If everyone knows what to do, what to say and how to make it up to customers, you can minimize the impact.
At the very least
Reviewing this list today, I’m almost certain that between their first major outage and this most recent one, Delta must have ignored at least three of these lessons. But you could argue that the worst lesson to ignore is the fifth one on my list. At the very least, you need to have a communication plan for an outage.
Epic Delta fail
Delta had no such plan. I watched Delta employees dealing with us with no information whatsoever. All they could do was apologize and ask passengers to wait another half hour. Hours passed. At a certain point, I could see no end in sight and left. Other passengers, including those with children, waited it out and were still there the next morning, red-eyed and exasperated. Imagine that: Waiting for 15 hours in an airport with kids. Yikes!
Naturally, we were all upset, including the poor Delta employees who somehow managed to stay relatively calm. Delta.com provided no information because the website was down. No distribution of load meant no failover and no ability to book other flights.
But what about social media? You might assume Twitter would be the logical place for Delta to communicate what’s going on in real time. Instead, they cut and pasted the same statement in response, over and over. Here’s a screen grab:
No communication = Bad customer experience
As I watched people at the gates look like they were about to cry, it became obvious that the Delta employees were not empowered to “make it right” with hotel vouchers and the like. They had no way to remedy, or even ameliorate, the frustration. After the first outage, they eventually apologized, gave a full flight refund and a $200 voucher to make it right. This time, no flight refund, just the voucher. Are they normalizing poor customer service?
What Delta needs to do
Work with a top-tier cloud provider. At Concerto, we can ensure 99.99% uptime for your mission-critical applications with state-of-the-art compliance and security measures, including automatic failover and disaster prevention. You can have total reliability. And, among other things, empower your people on the frontlines to turn an unfortunate situation into a trust-building experience.
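To put a figure like 99.99% in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes a 365-day year, and it treats the 15-hour delay I sat through as if it were 15 hours of downtime. That second part is purely an illustrative assumption: I don't know how long Delta's systems were actually down. The function names are mine, made up for this example.

```python
# Rough uptime arithmetic. Assumes a 365-day year (no leap day).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that an uptime target still permits."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def annual_uptime_percent(outage_minutes: float) -> float:
    """Best possible annual uptime if this were the only downtime all year."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


budget = downtime_budget_minutes(99.99)

# Illustrative assumption: count the 15-hour passenger delay as downtime.
delay_minutes = 15 * 60
uptime_after_delay = annual_uptime_percent(delay_minutes)

print(f"99.99% uptime allows about {budget:.1f} minutes of downtime per year")
print(f"A {delay_minutes}-minute disruption caps annual uptime at {uptime_after_delay:.2f}%")
```

In other words, a 99.99% target leaves less than an hour of downtime for the entire year, so a single disruption on the scale of what I experienced would use up that budget many times over.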