The first attempt was intuitive: show less repeated content. But when the ML team reduced repetition across the board, GMV dropped. Leadership got cautious, and the team was stuck — they couldn't ignore the complaints, but the only solution they'd tried made things worse.
I was brought in to figure out what was actually going on. Everyone was asking "how do we reduce repetition?" I had a feeling that wasn't the right question.
When someone says "I keep seeing the same stuff," they might mean the content is actually identical. Or they might mean something subtler: the content feels irrelevant, and irrelevant content is more noticeable when it recurs.
So, based on initial conversations with users, I reframed the question: does the experience of repetitiveness depend on how relevant the content is? If so, the solution isn't less repetition; it's better relevance.
I surveyed 5,000 TikTok Shop users, stratified by usage frequency. Each participant rated the repetitiveness of three videos that varied systematically in relevance: one from a category they'd actively searched for, one from a category they'd engaged with but never searched, and one from a category they'd never interacted with.
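To make the stratification concrete, here's a minimal sketch of how that draw might be done; the user table, column names, and bucket quotas are my placeholders, not the real pipeline:

```python
# Minimal sketch of the stratified draw, not the real pipeline.
# The user table and column names are invented for illustration.
import pandas as pd

users = pd.read_csv("shop_users.csv")  # one row per eligible user

# Bucket users by TikTok Shop usage frequency, then take a fixed quota
# from each bucket so light users aren't drowned out by heavy ones.
users["usage_bucket"] = pd.qcut(
    users["weekly_sessions"], q=4,
    labels=["light", "casual", "regular", "heavy"],
)
sample = (
    users.groupby("usage_bucket", observed=True)
         .sample(n=1250, random_state=7)  # 4 buckets x 1,250 = 5,000
)

# Each sampled user then rates one video per relevance tier
# (searched / engaged-but-never-searched / never-interacted),
# drawn from their own activity history.
```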
That within-subject design isolated the effect of relevance on perceived repetitiveness while controlling for actual exposure. The stakes warranted the heavier instrument: a large stratified sample and a proper regression model rather than a lighter, purely descriptive analysis.
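Here's a minimal sketch of the kind of model the design implies; the frame and column names (user_id, relevance_tier, exposure_count, repetitive_rating, usage_freq) are placeholders, not the real schema:

```python
# Minimal sketch, not the production analysis. Column names are
# placeholders for whatever the real response data used.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey_responses.csv")  # one row per (user, video) rating

# Each user rated three videos, so a per-user random intercept absorbs
# individual rating tendencies. The relevance_tier x exposure_count
# interaction is the core test: does exposure raise perceived
# repetitiveness more steeply for low-relevance content?
model = smf.mixedlm(
    "repetitive_rating ~ C(relevance_tier) * exposure_count + C(usage_freq)",
    data=df,
    groups=df["user_id"],
)
print(model.fit().summary())
```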
Content that matched a user's search intent was rated as significantly less repetitive — even when shown more often. Content from categories they'd never engaged with felt the most repetitive, even at lower exposure.
Repetition isn't really about seeing the same content. It's about seeing content that feels irrelevant — and irrelevant content becomes more annoying the more it shows up.
"Relevance matters" isn't something an ML engineer can implement. I ran a secondary analysis ranking which signals (identified through the qualitative work) best predict whether a user will experience content as repetitive.
This ranking became the weighting structure for new diversity controls. Instead of suppressing all repeated content, the algorithm would check whether a repeat matched the user's inferred intent and intervene only when it didn't.
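A toy version of that check is below; the weights and threshold are invented stand-ins for the ones derived from the signal ranking, and the real controls would live inside the ranking service, but the decision logic reduces to something like this:

```python
# Toy version of the intent-aware check; weights and threshold are
# invented stand-ins for the ones derived from the signal ranking.
from dataclasses import dataclass, field

SIGNAL_WEIGHTS = {
    "matches_search_intent": 0.45,
    "category_engagement":   0.30,
    "days_since_last_view":  0.15,
    "creator_overlap":       0.10,
}
INTENT_THRESHOLD = 0.5  # repeats scoring above this are left alone

@dataclass
class Candidate:
    video_id: str
    is_repeat: bool  # seen (or near-duplicate) before
    signals: dict[str, float] = field(default_factory=dict)  # scores in [0, 1]

def relevance_score(c: Candidate) -> float:
    """Weighted sum of intent signals; higher means more relevant."""
    return sum(w * c.signals.get(name, 0.0)
               for name, w in SIGNAL_WEIGHTS.items())

def should_suppress(c: Candidate) -> bool:
    # Intervene only on repeats that look irrelevant to inferred intent;
    # relevant repeats pass through the diversity filter untouched.
    return c.is_repeat and relevance_score(c) < INTENT_THRESHOLD
```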
An A/B test with tens of thousands of users validated the approach: a low single-digit GMV lift (where the previous attempt had decreased GMV), a measurable drop in repetition complaints, and the lowest complaint rates on high-relevance content, consistent with the survey findings. The recommendation algorithm was permanently updated.