Klarna's AI Experiment Is Going Fine, Actually
or, how to be a visionary without breaking the bank

Whoops!
Turns out Klarna was wrong about their ability to replace hundreds (thousands?) of human customer service reps with AI.
Eighteen months ago, Klarna and CEO Sebastian Siemiatkowski were some of the most-cited AI bulls in the world. They had stopped hiring and decreased headcount by a thousand employees because AI could effectively replace workers.1 ChatGPT was doing the work of 700 customer service agents, they said. They had replaced their Salesforce instance with AI, they said, whatever that meant.2
Seb wanted to be Sam Altman’s “favorite guinea pig”! And the financiers sang hallelujah - Klarna’s valuation rebounded more than 2x, from $6.7B to $14.6B.3
Now, Siemiatkowski admits that the sprint towards AI-powered efficiency “has gone too far”. Cost had been “a too predominant evaluation factor”, and the quality of their customer service and customer experience had suffered.
Time to hire humans again. “Really investing in the quality of the human support is the way of the future for us,” Siemiatkowski told Bloomberg last week. Shares of call center provider Teleperformance SE soared 4.4% on the news.4
Oh also, the IPO is delayed. (Probably unrelated, but still funny.)
The golden AI bull has fallen and the luddites are thrilled. Clearly this case study is evidence that we should pump the brakes on all this AI stuff, maybe even come to a screeching halt. Blogs with names like “customerexperiencedive.com” are reminding us that humans are here to stay, on account of survey data that shows most consumers have had negative experiences with chatbots, or on account of other survey data that says respondents think human empathy and connection are more important than quick responses.5
Not to be outdone, the LinkedIn bots and thought-free thought leaders pushing regurgitated AI headlines into the ether don’t seem to care about the new news. A quick search of the latest LinkedIn posts mentioning Klarna surfaces claims that “many CEOs could learn from the all-in on AI approach of Klarna” and that “the time for AI pilots and experimentation is past - we must commit”. The future is now, today! etc.
Of course, the interesting takes always lie in the messy middle, and there are some actual findings of note being published when it comes to the overall success of AI initiatives so far.
Informatica’s CDO Insights 2025 report says that most companies see fewer than 50% of their GenAI pilots ever make it to production, citing lack of data literacy in their organizations, difficulty demonstrating value, or immaturity in the underlying AI tech itself.
The 2025 edition of the IBM CEO Study surveyed 2,000 CEOs, who reported that only 25% of AI initiatives have delivered their expected ROI, and that an even smaller 16% have been deployed at scale across the enterprise.
To the luddites, this does sound like quotable evidence to suggest that Klarna and Siemiatkowski were fools caught up in a hype cycle.
But to those of us who have put our ideas to the test in any area of business - marketing, product development, design, infrastructure and logistics - these low success rates don’t sound particularly surprising at all.
Clayton Christensen says that 95% of new products fail after their introduction to the market. (Ed. note: this quote is popular, but it is both misattributed and blatantly incorrect. See my correction here.) Al Greco, a marketing professor who studies the publishing industry, says that 7 out of 10 new books lose money.6 Even among drugs approved for human trials, only ~15% make it to market.7 And the expected value and impact of any given social program, evaluated via an experiment, is zero.8
These low success rates sound like exactly what the experimentation leaders and A/B testing users of the world have been reporting for decades. Most of our ideas drive no incremental impact to the measures of success that we care about (e.g. revenue or ROI), or are even actively harmful. Only a minority have actual positive impact.
(For readers joining us outside of the A/B testing domain, it’s worth highlighting that the median success rate of online A/B testing programs at major tech companies is reported as 10%. See this comment from Ronny Kohavi below.)
When you give the Klarna story a more generous reading - remove the personal brand play that postures Siemiatkowski as some kind of visionary, strip away the overly-confident veneer of the headline quotes - this is all that they did: they ran an experiment replacing customer service reps with AI.
Admirably, they got off their butts and actually learned what works in the real world while most companies are still sitting on the sidelines. Learning velocity is one of the most powerful competitive advantages in business. As Mark Zuckerberg is fond of saying about Meta: “If we can learn faster than everyone else, we’ll win.”9
That doesn’t mean that Klarna ran a good experiment here. Or an efficient one. But the outcome itself was fine, typical even.
What is the difference between a costly mistake and a good experiment?
I often think about this tweet from Sean Taylor (a maxim that has been passed around Eppo ever since):
“Everyone's running experiments, but only some of them have control groups and randomization.”
Obviously we can’t always use randomized controlled trials to answer our questions, but Klarna could have stuck the landing better with some experimentation tooling and an experimentation mindset. Here are three broad ideas I’d suggest they think about more intentionally next time (should Mr. Siemiatkowski or one of his deputies be interested in paying my exorbitantly-high consulting rate):
1. Dramatically speed up the feedback loop by setting clear success criteria in advance
While Klarna is likely learning what works and what doesn’t with today’s AI technology faster than their competitors, eighteen months is still a very long time to evaluate a concrete idea like “replace customer service reps with an LLM”. If they had pre-defined the metrics they cared about, set success criteria up front, and chosen guardrails to trigger a re-think, they could have moved in a fraction of the time.
Siemiatkowski acknowledges that this was one of the main points of failure in his interview with Bloomberg: they made cost their primary metric and failed to measure, monitor, or take action on important customer satisfaction metrics. Ronny Kohavi (former experimentation leader at Microsoft) hammers on this when he tells teams to define an Overall Evaluation Criterion - in short, make sure you have a quantitative measure to define an experiment’s success that aligns with long-term organizational goals.
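To make this concrete: here is a minimal sketch of what pre-registering that evaluation might have looked like. Every metric name and threshold below is hypothetical (I have no idea what Klarna actually tracks) - the point is the shape: a single OEC to win on, plus guardrails that force a roll-back, all decided before launch.

```python
# A minimal sketch of pre-registered success criteria. All metric names
# and thresholds are hypothetical, not Klarna's actual measures.

OEC = "cost_per_resolved_ticket"   # the one metric that defines success
OEC_TARGET_DROP = 0.20             # success = at least 20% cheaper than control

GUARDRAILS = {                     # metric: worst tolerable relative change
    "csat": -0.02,                 # satisfaction may not drop more than 2%
    "first_contact_resolution": -0.02,
    "repeat_contact_rate": +0.05,  # repeat contacts may not rise more than 5%
}

def verdict(control: dict, treatment: dict) -> str:
    """Compare the two arms against the pre-registered criteria."""
    def rel_change(metric: str) -> float:
        return (treatment[metric] - control[metric]) / control[metric]

    # Any guardrail breach means roll back, no matter how cheap the bot is.
    for metric, limit in GUARDRAILS.items():
        change = rel_change(metric)
        if (change < limit) if limit < 0 else (change > limit):
            return f"roll back: {metric} breached its guardrail ({change:+.1%})"

    if rel_change(OEC) <= -OEC_TARGET_DROP:
        return "ship: OEC target hit with guardrails intact"
    return "iterate: guardrails held, but cost savings fell short"

print(verdict(
    control={"cost_per_resolved_ticket": 8.00, "csat": 4.40,
             "first_contact_resolution": 0.72, "repeat_contact_rate": 0.10},
    treatment={"cost_per_resolved_ticket": 5.60, "csat": 4.10,
               "first_contact_resolution": 0.65, "repeat_contact_rate": 0.14},
))  # -> roll back: csat breached its guardrail (-6.8%)
```

Notice that under this framing, Klarna’s actual outcome - cheaper support, unhappier customers - resolves itself automatically: the guardrail breach wins regardless of the cost savings, and it does so at the first review, not eighteen months later.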
2. Limit risk and maximize optionality by optimizing for cost-per-experiment
If it weren’t for Klarna’s enormous appetite for PR, they could have easily run a much smaller-scale pilot and gotten the same learnings as their slow, expensive “all-in” approach. When we’re armed with the reality of low success rates across our ideas, we look to get leverage on a factor much more in our control: cost-per-experiment.
(Jeff Bezos has talked about this for a long time - including in a 2007 interview with HBR: “So the key, really, is reducing the cost of the experiments.”)
Klarna could have come at this any number of ways: pilot the approach in one country, or with a few teams, or run a true A/B test randomizing users between a human rep and the LLM chatbot to measure user metrics.
All of these would have been significantly cheaper approaches than what they did. Ask any HR team - firing and hiring hundreds of people carries enormous costs.
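To give a feel for how small the A/B test version could have been, here is a back-of-the-envelope power calculation using the standard two-proportion sample size formula. The 80% baseline resolution rate and 2-point minimum detectable effect are invented for illustration - Klarna’s real numbers are unknown to me:

```python
# Rough sizing for the A/B test option: randomize incoming chats between
# human reps and the chatbot, then compare resolution rates. The baseline
# and minimum detectable effect are invented for illustration.

from statistics import NormalDist

def per_arm_sample_size(p_baseline: float, min_detectable_diff: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Standard two-proportion sample size (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    p1, p2 = p_baseline, p_baseline + min_detectable_diff
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / min_detectable_diff ** 2) + 1

# Detecting a 2-point drop from an 80% baseline resolution rate:
print(per_arm_sample_size(0.80, -0.02))  # ~6,500 chats per arm
```

For a support operation that was supposedly doing the work of 700 agents, ~6,500 chats per arm is days of traffic, not eighteen months - and nobody has to be fired to get the answer.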
The tired refrain from the naysayers on this point is that running experiments vs. leaping to full-scale action “slows us down”. Surely we could go faster by jumping in feet first to the “all AI” thing instead of waiting for experiment results?
This is nonsense. First off, doing it their way, Klarna took eighteen months to reverse course. That’s slow. But even putting this particular case aside, moving quickly in the wrong direction is not a win. Speed alone does not excuse mistakes that could’ve been made at one-hundredth of the cost.
And as Jeff Bezos says, when experiments are expensive, only a few people will get to run only a few experiments. If a firm’s capacity to test potential innovations is too low, they’ll fail to find the few ideas that work - and will quickly be outpaced and outplaced by their competition.
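The arithmetic here is simple and brutal. Take the ~10% per-idea success rate cited earlier at face value and (simplifying heavily) treat ideas as independent bets:

```python
# Why experiment capacity matters, assuming the ~10% per-idea success
# rate cited above and (a simplification) independence between ideas.

def p_at_least_one_winner(n_experiments: int, success_rate: float = 0.10) -> float:
    return 1 - (1 - success_rate) ** n_experiments

print(f"{p_at_least_one_winner(3):.0%}")   # 27% - a few big, expensive bets
print(f"{p_at_least_one_winner(50):.0%}")  # 99% - many cheap experiments
```

The firm that can only afford three big bets will, more often than not, come up empty; the firm running fifty cheap experiments almost can’t.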
3. Publicly position your experiments as experiments (maybe)
Here is where I veer outside of the objectively optimal and into some subjective marketing advice. (I am not a public relations professional, but I am at least a marketer by trade, so I’m allowed to have half an opinion here.)
I think this whole saga would have landed with a far more favorable swing in opinion of Klarna and Siemiatkowski if they had just said “we’re running an experiment around replacing customer service reps with AI”. Not “we’re doing this, it’s already done, actually, and we are such geniuses for seeing it.”
The former is obviously far less sexy. There is a long lineage of “founder mode” CEO types who build notoriety by making these sorts of bold proclamations and blindly leading their companies toward them. But while “move fast and break things” was endearing coming from Mark Zuckerberg, I think it’s a far less palatable mantra in a post-ZIRP era for a banking company on the verge of an IPO.
I don’t know. Maybe the choir would be notably less jubilant if Klarna had preached about their AI efficiency “experiment” as opposed to their “big bold AI vision™”. But saying “we’re trying this” instead of “we’re doing this” is not a signal of lacking conviction. It’s an honest demonstration of a commitment to innovation and pushing forward, to not getting left behind. Maybe more CEOs could take cues from Bezos and his many Amazon shareholder letters extolling the virtues of experimentation.
At any rate, the whole point here is that Klarna’s misadventure is far less interesting of a news item than most observers apparently think. They ran an experiment and it didn’t work out. Congrats? I’ve done it ten thousand times.
1. This pull-quote was already an exaggeration in December 2023. “Klarna’s global press lead, John Craske, tells TechCrunch that Siemiatkowski’s comments about hiring are directionally true but says the CEO was ‘simplifying for brevity in a broadcast interview.’”
2. The outside analysis said “not really”. Siemiatkowski kinda sorta clarified further a year later on X.
3. https://sherwood.news/business/buy-now-pay-later-giant-klarna-is-finally-ready-to-file-for-ipo/
4. https://www.bloomberg.com/news/articles/2025-05-08/klarna-turns-from-ai-to-real-person-customer-service
5. https://www.customerexperiencedive.com/news/klarna-reinvests-human-talent-customer-service-AI-chatbot/747586/
6. https://www.shelf-awareness.com/issue.html?issue=332#m2122
7. https://pubmed.ncbi.nlm.nih.gov/39805539
8. https://gwern.net/doc/sociology/1987-rossi.pdf
9. Quoting Eric Ries, who writes in The Lean Startup that “the only way to win is to learn faster than anyone else”.
"If you are shipping features without running a *controlled* experiment, you are really running an *inefficient* experiment, where analysis is done by looking at a time-series graph"
-- https://www.linkedin.com/posts/ronnyk_abtest-featureflags-experimentguide-activity-6897828685114675201-fJyX