We’re F****D, It’s Over: Coming Back from the Brink

In 1997, about a year after launch, Hotmail was growing exponentially, adding thousands of new users every day. We were on fire. And then one night, it all seemed to unravel. We had a program called the “janitor” that ran as an overnight batch process and it erased all of the email that users put in the “trash” folder. Except this night, a bug spawned an army of other janitors that cleaned out everyone’s inboxes, too. That’s right, deep-sixed their email. Here is what went through all our spinning heads: “We’re fucked, it’s over.”

WFIO, short for “We’re fucked, it’s over,” is pronounced whiff-eee-o. It is that horrible, terrifying moment that nearly every entrepreneur goes through when they are certain that their company is dead. I’ve seen it happen in so many different ways: a legal ruling goes against you; Apple refuses to approve your app unless you change the feature that makes it special; Google launches a competitive product and aims right at you; a critical technology partner decides not to renew their contract. It’s that moment before you gather yourself to do battle, when all seems lost. I have struggled through that moment, first at Hotmail and again at IronPort. Coming out the other side, scars and all, there are a few critical things you take with you:

The Hotmail WFIO

Once we had pulled the plug on the extra “janitors”, it turned out that about 25% of our total users were affected. Not everyone, but holy shit, a quarter of our customers had lost everything. Understandably, they were pissed. CNET and ZDnet were both on the horn wanting to know what happened. Customer care was inundated with angry calls and (ironically) emails. We figured out how to restore a few thousand customers, but millions were completely unrecoverable. While the calls rolled in, we were trying to figure out how to fix things.

I remember Hotmail’s CEO Sabeer calling us all into a room. “Hey, Rex (the COO), how long will it take to restore the email from the tape backups?” And I’ll never forget his answer: “Um, those got really expensive, so we stopped doing them about a month ago.” Gulp. Long … painful … silence.

The higher-level problem was that people were just getting comfortable trusting us with hosting their email and now we had completely let them down. Their email was just gone. We did a lot of communicating to the users and promised them we would “grandfather” them in for some planned paid services for free. We apologized profusely and explained how the sun, the moon and the stars lined up against us for it to happen. We also clearly explained what steps we were going to take so that it would never happen again. Over time, they started receiving more email, their inboxes filled up, and we just rode it out.

The IronPort WFIO

At IronPort, we developed a super fast and scalable email gateway that ran roughly 10 times faster than any alternative. We were definitely in the right place at the right time when the spam flood came. Our gateway was the only one that could handle the load. Just as at Hotmail, we were adding customers as fast as we could get the company names jotted down, but we needed to add an anti-spam component to our offering.

We struck up a partnership with Brightmail, the leading anti-spam software company, and the joint product – an IronPort gateway with Brightmail – was unbeatable in the marketplace. The only problem was that we were heavily dependent on each other, with both companies scrambling to build what the other had. We both knew that eventually there was going to be a day of reckoning. We attempted to merge the two companies to solve the problem, but the VCs couldn’t agree on terms. And then Symantec bought Brightmail.

We knew the clock was ticking and had most of the engineering team working on IronPort anti-spam. But it was way late and wasn’t working. Shortly after the Symantec acquisition, we started hearing reports from channel partners that Symantec was planning to cancel our contract. Although we knew that day would eventually come, we were totally unprepared! WFIO!

As detailed in a prior post, my VP of engineering, Nawaf, “cracked the egg with a sledgehammer” and got our anti-spam product working just in time. Instead of waiting for Symantec to cancel the contract, we went on the offensive and faxed a letter to all of our customers announcing that we were canceling the contract with THEM, which put us in a position of strength. We managed through the madness and got to the other side.

After the IronPort WFIO had receded in my rear-view, I realized that you can almost always get to the other side. You just need to keep in mind a few things, and have some emergency tools ready to pull out.

It’s never really as bad as it seems.

Companies are damn resilient. Although it certainly feels like death at the time, it rarely is and companies just keep on moving forward. Ingenuity and guts usually help you find your way out of the jam. In fact, much of a company’s value is usually created by figuring out a solution to the big obstacle. Certainly, that was the case at IronPort where our entire business was built on the back of our anti-spam product.

Get all the brains around the table.

Whenever we went through a WFIO, we’d get all of the smartest people in the room and work through every angle. This is a little counterintuitive because most leaders tend to share very little with the extended team, worried about freaking them out. This is a mistake on a number of fronts – trusting your team in a crisis brings the company together in amazing ways, and their contributions may very well save your company. All of our serious issues resulted in an “Apollo 13” atmosphere where we’d bring together top engineers, architects, VPs – anybody who could materially contribute – and hash it out. Fighter pilots, who are constantly in pursuit of perfection, have what they call a “rank-less” debrief after every mission where everyone involved, regardless of rank, speaks up to criticize what went wrong.

Lead from the front.

This is the time for leadership – you cannot punk out. I think about a ship captain sailing the Atlantic in the 1700s and rolling into a huge storm: regardless of how fearful, doubtful, or just scared shitless you may be, there’s only one way to play it with your team – you are in total control. Think of how much that could affect the outcome! The team needs you to lead them through the problem. Back to the ship captain: what if he grabbed a bottle of whiskey and holed up under his bed? The ship would certainly be lost. However, if he calmly makes a pot of coffee, ties himself to the bridge, and starts shouting orders, then I believe the chances of the ship making it through go up dramatically. The leader needs to be the first one there, the last one to leave, and be willing to do anything it takes – like answer customer care calls or personally drive a replacement part to an irate customer. Nothing is beneath a leader in times of crisis.

In between Hotmail and IronPort, I started an ecommerce company that got up to about six people. I funded it myself. When I couldn’t get outside funding for it and I had burned through all of my Hotmail money, I shut it down. Sometimes it really is over, and I believe that one of the clearest signs is when you are completely out of cash. But right up until that point, when the music is still playing, when your team is still driving hard, you have a shot – and usually a decent one.

  1. Thanks for the honest sharing especially the tiny bit about your eCommerce company. I meet a lot of fellow entrepreneurs and founders and each of these little anecdotes is deeply meaningful and helpful.

  2. This article describes pretty much my current situation. I tied myself to the bridge and fight the waves. I stand at the same time in front of the abyss and take off just before.

    • Ari Thompson said:

      Give a man a fish and he will eat for a day; teach a man to fish and he will eat for a lifetime; give a man religion and he will die praying for a fish.

      • piersrr said:

        But he will still vote for the man that gave him the fish

  3. james3967 said:

    I remember helping Hotmail through several of those WFIO moments over the years at Exodus. One of Exodus’s own WFIO moments came when a city water main burst beneath the datacenter at 1605 Wyatt Dr, and began flooding the floor beneath Hotmail’s (and everyone else’s) equipment. As you may recall, the power tails were just tossed under the floor, so we all knew it was only a matter of time before a massive short fried every circuit in the building. Except it didn’t. We formed a bucket brigade, dipping a bucket into electrified water, and bailing water out of the floor until the city could shut off the main. Somehow, not a single circuit went down, despite being immersed in over a foot of water for hours. Hotmail survived. We survived. Miracles do happen.

    • I remember many crazy days at Exodus. The Cisco engineers were literally sleeping next to these new fangled “load balancers” at Hotmail, hoping they would scale…

  4. Syed said:

    This is very true, “Companies are damn resilient. Although it certainly feels like death at the time, it rarely is and companies just keep on moving forward. Ingenuity and guts usually help you find your way out of the jam. In fact, much of a company’s value is usually created by figuring out a solution to the big obstacle…”

  5. Great read. The Hotmail story was very touching; I can only imagine the atmosphere – which, let’s be honest, is probably one of the most profound experiences to remember.

    However, I disagree that the ship’s captain is going to just chill out and make a cup of coffee when a big storm is approaching. It’s a ship, at sea, and a storm is approaching; where did the coffee come from?

    • Cabral DiMarzio said:

      As a ship’s captain, at sea, in a storm, right the eff now, I can guarantee that coffee is essential. Also, once the situation is stable, you can go to your bunk with your coffee and surf the net… at least until something changes.

  6. OK – I clearly have never been a ship captain but you get the idea…

  7. Trust me, the ship has coffee and wide based coffee cups to limit spills in high seas

  8. One of my “we’re f****d” moments was in early 2001, when we switched platforms and software at HomeGain while we were signing up 5-10k regs a day for a free home valuation. Perfect storm: we were one of LoudCloud’s first customers and we were experimenting with new open source software. Nothing seemed to work. My most vivid memory is of Ben H’s tireless efforts with my COO John Baker to fix the problems. We survived and thrived. At the time, I thought it was the end of the world.

  9. Steve Kohr said:

    Thanks for sharing. Roll up your sleeves and figure it out. Involve everyone you can; sometimes the greatest ideas come from the most unlikely sources. The one thing that helped me is… I’m not alone, I have a team, let’s do this!

  10. Punter said:

    Great anecdotes on a couple of major technical hurdles, but you don’t mention finances – isn’t that the thing that sinks most companies? When the money runs out, there’s nowhere to hide.

    • Of course. Running out of money before hitting a milestone is what does it for sure but there are many scary moments in between.

  11. Scott, you have a far different memory of that event than I do. Especially since I was in the midst of another 80 hour ‘day’ trying to keep things together.

    It wasn’t 25% of all the users. It was 25% of the users on a particular server and it only affected those who had deleted mail in their accounts – our most active users. In fact, my own account was affected and I lost all my historical Hotmail email.

    We did have backups, but they were unreliable. Backups often took >24 hours which didn’t make for a reasonable backup window. We could have probably restored some of them, but I think given how slow the tape drives we used were, the restore would still be going. (BTW, I still have a few tapes with backups of our old code for the site and probably some ad logs.)

    Syed: The Cisco Load Balancer team was in Georgia. They never saw the Wyatt cage.

    It’s amazing we made it through those days.

    (Former Director of Site Operations for Hotmail)

  12. Thanks for the clarification, Josh – I’m sure your memory is better than mine… We all learned a ton going through that experience together for sure!

  13. Ankur said:

    Hi Scott,

    Thanks for an illuminating post.

    Just curious, though, as to how a decision to eliminate tapes at Hotmail could even have been approved without taking into account the potential consequences of exactly the kind of scenario you mentioned in your post? It seems that after the elimination of tapes, there was either no backup or a really slow backup recovery system, neither suitable nor useful in a real crisis scenario. Is this hindsight 20/20 or could it have been predicted?

    It seems that there might also have been some organizational/decision-making issues at work here, since the CEO was unaware of such a drastic decision? It also seems that financial reasons won out over prudent decision making, since tapes were eliminated for being expensive, leaving Hotmail open to a scenario like this.


    • I think it was just an honest fuck up we made at an early stage company. The service was scaling faster than anything we had ever seen and I’m sure we were just making tradeoffs… I’d love to tell you that it was a calculated risk but we were just trying to keep the thing up with chewing gum and dental floss!

      • Ankur said:

        Thank you for the answer!

  14. Scott, that’s a heck of a helpful post. Your insights about there always being a way to get to the other side, and the resilience of good teams and companies, are well put and strike to my core. They’ll stay with me.
