‘Success disasters’ have become a big problem in tech for companies and consumers

When tickets for Taylor Swift's Eras Tour first went on sale, the red-hot demand was enough to make Ticketmaster crash. Even competitor SeatGeek, which handled the sale of tickets for five of the tour's 52 concerts, experienced technical difficulties due to fans clamoring for early access.

These technology failures during a major scaling event, dubbed success disasters, were far from isolated cases.

Sony's PS5 launch and widespread AWS and Xfinity outages show the impact of technology failures on product and service scaling events, whether the failure is a slow burn, like a materials delay, or a hard-and-fast crash, like a cloud outage. Either way, the failure can break customers' trust.

More than 60% of companies using the public cloud reported losses from outages in 2022. "It's really important that you can remain resilient for those kinds of failures," said Spencer Kimball, CEO and co-founder of Cockroach Labs (a cloud-native database platform provider) and a former engineer at Square and Google.

Ticketmaster's failure during its scaling event was just one kind of success disaster: a system overwhelmed beyond its capacity. In other cases, customers simply lose access to their data center, for example through a power or networking failure. Or the technology itself cannot handle its intended use, and engineers must scramble to ship a subsequent version, but only after customers have already been disappointed.

Not only are companies dealing with losses in the moment, but they must also figure out what went wrong after the fact, leading to even more lost productivity. "People call them post-mortems," said Kimball. "Their teams are madly fielding a recovery effort."

Success disasters may be common, but that doesn't make them acceptable, especially if a company is offering products or services considered essential.

"There are others, big banks, financial services, where the regulations are starting to stipulate that you have to survive even when a cloud fails, and at the very least you need to have portability where you can resuscitate your service in some small amount of time, which is not really well defined right now," Kimball said.

If survivability is nonnegotiable, what can organizations do to avoid success disasters?

Masa Kabayama, CEO and co-founder of Uplift Labs, an AI-powered 3D movement analysis firm serving MLB and the NBA, has made a career out of scaling technology. He previously helped Lego introduce its robotics platform Mindstorms, Apple implement the iPad for education and Tesla scale the Model S.

When adding new technologies as companies scale, Kabayama suggests focusing on three sequential steps: buy-in, onboarding, and usage.

He says buy-in requires a trifecta of executive endorsement, managerial advocates and ground-level evangelists. Onboarding, he explains, requires over-indexing on education, "ensuring that no stone is left unturned in terms of having the customer literacy, understanding your tech and how it's applicable to the problems they want to solve."

Tracking usage means looking at which features are being used and how, then iterating on the product as the company continues its dialogue with early adopters.

This process is not as formulaic as it seems, Kabayama said. Companies have to balance triaging existing features (through planned maintenance, for example) against implementing new ones.

Kimball emphasizes over-indexing as a key preventative step. "Model this adversarial behavior and plan for it by testing in a capacity that's way beyond normal so you get these outlier events as things you've tested for and feel you can survive," he said.
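Kimball does not describe a specific testing method. As a rough illustration of the idea, the sketch below uses a toy capacity model (an assumption, not a real load test) in which a service handles up to a fixed number of requests per second and drops the rest. It steps the offered load through multiples of the expected peak and reports the largest multiple the service survives within an error budget. The function names, multipliers, traffic figures and 1% threshold are all hypothetical; a real stress test would drive actual traffic at a staging environment.

```python
from __future__ import annotations


def rejection_rate(offered_rps: float, capacity_rps: float) -> float:
    """Fraction of requests dropped by a toy service with a hard capacity limit.

    Hypothetical model: the service serves up to capacity_rps requests per
    second and rejects everything above that.
    """
    if offered_rps <= 0:
        return 0.0
    return max(0.0, offered_rps - capacity_rps) / offered_rps


def stress_test(
    expected_peak_rps: float,
    capacity_rps: float,
    multipliers: tuple[float, ...] = (1, 2, 5, 10, 20),
    max_error_rate: float = 0.01,
) -> tuple[dict[float, float], float | None]:
    """Test at multiples of the expected peak, far beyond normal load.

    Returns the rejection rate at each multiplier and the largest multiplier
    whose rejection rate stays within the error budget (None if none do).
    """
    results = {m: rejection_rate(expected_peak_rps * m, capacity_rps) for m in multipliers}
    survived = [m for m, rate in results.items() if rate <= max_error_rate]
    return results, max(survived, default=None)


if __name__ == "__main__":
    # Made-up numbers: 10,000 requests/s expected at peak, 60,000 requests/s capacity.
    rates, headroom = stress_test(expected_peak_rps=10_000, capacity_rps=60_000)
    for m, rate in rates.items():
        print(f"{m:>3}x expected peak: {rate:.0%} of requests rejected")
    print(f"Survives up to {headroom}x the expected peak")
```

With these made-up numbers, the service looks fine at normal load and even at 5 times the expected peak, but drops 40% of requests at 10 times the peak. That is the kind of outlier event Kimball suggests testing for before customers run into it.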

Even if legacy technology is failing, rearchitecting the entire system may not be in the budget. Still, every organization eventually reaches a point where it must decide how to make changes. Kimball said a total overhaul can be worthwhile, but organizations should weigh whether they would rather change features incrementally or all in one fell swoop. The right answer depends on their starting point and the scaling events they anticipate.

For her part, Paula Urban, director of ticketing for the Houston Livestock Show and Rodeo, chose the incremental approach. With 2.4 million guests attending her events, she decided to stagger ticket sale launch times to ease traffic.

"We implemented a virtual waiting room, which we believe was a smart move to prevent scalpers and resellers from exploiting the ticketing system," said Urban. "By controlling the number of fans entering the store and strategically managing multiple waiting rooms for different events, we could ensure that real fans had a fair chance at securing the best seats."
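Urban does not describe how the rodeo's waiting room works internally. As a rough illustration, a virtual waiting room can be modeled as a first-in, first-out queue that admits fans into the store only while the store has room. The class and method names below are hypothetical, and a production system would also need session timeouts, bot detection and state shared across many servers.

```python
from __future__ import annotations

from collections import deque


class VirtualWaitingRoom:
    """Toy FIFO waiting room that caps how many fans are in the store at once."""

    def __init__(self, store_capacity: int) -> None:
        self.store_capacity = store_capacity
        self.queue: deque[str] = deque()  # fans waiting, in arrival order
        self.in_store: set[str] = set()  # fans currently allowed to buy

    def join(self, fan_id: str) -> None:
        # Ignore duplicate joins so one client cannot hold several places in line.
        if fan_id not in self.in_store and fan_id not in self.queue:
            self.queue.append(fan_id)
        self._admit()

    def leave_store(self, fan_id: str) -> None:
        # Called when a fan checks out (or, in a real system, times out).
        self.in_store.discard(fan_id)
        self._admit()

    def position(self, fan_id: str) -> int | None:
        # 1-based place in line, or None if the fan is not waiting.
        try:
            return self.queue.index(fan_id) + 1
        except ValueError:
            return None

    def _admit(self) -> None:
        # Admit fans strictly in arrival order while the store has room.
        while self.queue and len(self.in_store) < self.store_capacity:
            self.in_store.add(self.queue.popleft())


if __name__ == "__main__":
    room = VirtualWaitingRoom(store_capacity=2)
    for fan in ["a", "b", "c", "d"]:
        room.join(fan)
    print(sorted(room.in_store), room.position("c"), room.position("d"))
    room.leave_store("a")  # a checks out, so c moves from the line into the store
    print(sorted(room.in_store), room.position("d"))
```

Because the line is strictly first come, first served and duplicate joins are ignored, a single client cannot jump ahead by rejoining. Stopping bots that arrive under many different identities would need separate defenses.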

Scalper bots put enormous stress on Ticketmaster's systems when it launched Taylor Swift ticket sales, with some even targeting Verified Fan access code servers. That stress led to a pause in ticket sales. Urban keeps this in mind to avoid her own success disaster, even at a smaller (but still substantial) scale.

For companies planning to scale — which encompasses most organizations — there is no end game. "Nothing's ever static, and companies that have that iterative, innovative culture are ultimately able to produce a way better user experience," Kabayama said.