Inside a “Model Zoo”: How Propensity Models Ship So Fast
One of the claims that we sometimes see in regard to modern personalization systems is that they offer a “Model Zoo” of ready-to-use models, and that they can rapidly build […]
Over the last several years, the evidence has become increasingly clear that personalization platforms that are able to offer high-quality recommendations will outperform ones that can’t, especially for returning customers and for large catalogs.
For example, a study published in Information Systems Research found that the presence of personalized recommendations increased consumers’ propensity to buy by 12.4% for an online retailer.1
In another study, a large e-commerce marketplace randomly varied its recommendation quality by turning off personalization for a subset of users. The measured "Gross Merchandise Value" was 5.3x higher for users who saw high-quality recommendations than for those who only saw popular or trending products.2
And a field experiment in the apparel and home goods sector found that relevant recommendations produced a statistically significant increase in purchase probability vs. the control group.3
So why then do some personalization engines fail to deliver this?
The answer is: Because it’s not so easy.
The difficulty that legacy personalization systems face begins with the wide array of approaches available for making a good recommendation. Some of those methods work better in some situations, while others work better under other conditions, so a strong system genuinely needs simultaneous access to a wide array of models.
Some of the better-known methods for recommending products or services include User-Based Collaborative Filtering (UBCF for short), which recommends items liked by similar users. A related approach is Item-Based Collaborative Filtering (IBCF for short), which recommends items similar to ones previously chosen by that user. (There are also more complex methods for doing all this, using vectors.)
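To make the distinction concrete, here is a minimal sketch of user-based collaborative filtering on a toy interaction matrix. The data, user names, and function names are illustrative assumptions, not any particular product's API; item-based CF would apply the same similarity logic to the transposed matrix (item columns instead of user rows).

```python
import math

# rows = users, columns = items; 1 = purchased/liked, 0 = no interaction
ratings = {
    "alice": {"shoes": 1, "hat": 1, "scarf": 0},
    "bob":   {"shoes": 1, "hat": 0, "scarf": 1},
    "carol": {"shoes": 1, "hat": 1, "scarf": 1},
}
items = ["shoes", "hat", "scarf"]

def cosine(u, v):
    # cosine similarity between two users' interaction vectors
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def ubcf_recommend(target, k=1):
    """User-based CF: weight each other user's likes by how similar
    they are to the target, and suggest unseen items with top scores."""
    others = {u: cosine(ratings[target], r)
              for u, r in ratings.items() if u != target}
    scores = {}
    for item in items:
        if ratings[target][item] == 0:  # only items the target hasn't seen
            scores[item] = sum(sim * ratings[u][item]
                               for u, sim in others.items())
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(ubcf_recommend("alice"))  # -> ['scarf']
```

Production systems replace the toy matrix with sparse matrices or learned embedding vectors, but the core idea — score unseen items by the preferences of similar users — is the same.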
Other popular methods include content-based filtering (matching item attributes to a user's profile), predictive propensity models, rules-driven recommenders, and simple popularity-based baselines.
Needless to say, there are many nuances to how each of these methods might be implemented from a technical standpoint. And there are other methods in addition to these. But even with just what we have described already, you can already imagine that this would be quite a big challenge for a legacy personalization system that was never designed for this type of task in the first place.
The key point is that no single recommender model is best for every situation, which is why a best-in-class recommender system will deploy ten or more different recommender models under the hood.
By the way, you might ask: If ten or more recommendation engines are all running at the same time, how does the overall system select which one to use for a particular customer at any given moment?
One method for this is rank aggregation. Basically, each recommender model outputs a scored list of recommended products. The idea is that, if many models keep including the same item among the Top 5 or Top 10 suggestions, for example, that’s likely a fairly robust suggestion to put forward to the user.
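The idea above can be sketched with a simple Borda-style count: items appearing high on many models' lists accumulate the most points. The model names and lists are made-up examples.

```python
from collections import Counter

# Each model's ranked top-5 list (best first)
model_outputs = {
    "ubcf":       ["hat", "scarf", "boots", "belt", "gloves"],
    "ibcf":       ["scarf", "hat", "belt", "boots", "socks"],
    "popularity": ["boots", "hat", "scarf", "socks", "belt"],
}

def aggregate(lists, top_n=5):
    """Borda-style rank aggregation: position 0 earns top_n points,
    position 1 earns top_n - 1, and so on; sum across all models."""
    scores = Counter()
    for ranked in lists.values():
        for pos, item in enumerate(ranked[:top_n]):
            scores[item] += top_n - pos
    return [item for item, _ in scores.most_common(top_n)]

print(aggregate(model_outputs))  # 'hat' wins: high on all three lists
```

An item that no single model ranked first ("hat" here is second on one list) can still win the aggregate, which is exactly the robustness the text describes.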
Alternatively, each model can assign a score to the confidence it has in its own suggestions, so the models themselves can be ranked, rather than the product suggestions. There can also be rules in some cases, based on context. For example: "If this is a new user, then use product popularity." Or there can be combinations of these methods.
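A minimal sketch of how a context rule and confidence-based ranking might combine, with the cold-start rule taking priority. All names and thresholds here are illustrative assumptions, not the method of any specific product.

```python
def choose_model(user, candidates):
    """candidates: list of (model_name, confidence, recommendations).
    Rule first: brand-new users fall back to popularity, since the
    personalized models have no behavioral signal to work with yet.
    Otherwise, pick the model that is most confident in its own output."""
    if user.get("n_purchases", 0) == 0:
        return "popularity"
    return max(candidates, key=lambda c: c[1])[0]

candidates = [
    ("ubcf", 0.82, ["hat", "scarf"]),
    ("ibcf", 0.74, ["scarf", "belt"]),
]
print(choose_model({"n_purchases": 0}, candidates))   # -> popularity
print(choose_model({"n_purchases": 12}, candidates))  # -> ubcf
```

In practice this selection layer is often itself learned (e.g. a bandit that tracks which model converts best per segment), but the rule-plus-confidence structure is the simplest version of the orchestration the text describes.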
Thus, for the top tier of personalization software (the tier that has enabled about a dozen or so recommender models), the performance difference between competing systems will come down to: (1) how many recommenders are enabled, (2) which ones, (3) how they were implemented and, above all, (4) how the top-level orchestration layer was designed.
Let’s bring this to life with an example.
A well-known player in the AI-first category of personalization software is SOLUS. As we would expect, this software runs a diverse portfolio of 10-15 different behavioral, content-based, predictive, and rules-driven recommenders in parallel, and it continuously learns which ones work best for each customer, in each context.
This software even takes all this one step further – dynamically factoring in data such as, in a restaurant context, for example: “It’s very hot today, so it’s good to suggest a cold dessert treat with the meal this time.”
And as we saw from the research mentioned at the outset, this level of relevance creates a much better customer experience, while also creating significant value for the brand.
One thing we can say for sure: in today's world, a personalization system with no recommender engine at all (or one that can only recommend best-selling or trending products) will be far behind the leaders in the race, delivering a less satisfying user experience and failing to capture the full lifetime customer value that would otherwise have been possible for the brand.
If you’d like to explore what a scaled implementation might look like in your context (or to pressure-test whether the results seen elsewhere are realistic for your business), a great next step can be a focused conversation grounded in your data, your goals, and what you’ve already learned from previous tests.
If you’d like to explore this topic in deeper detail, feel free to set up a meeting at this link.
Sources
1 https://pubsonline.informs.org/doi/10.1287/isre.2021.1074 (There were also basket-related effects.)
2 https://questromworld.bu.edu/platformstrategy/wp-content/uploads/sites/49/2023/06/PlatStrat2023_paper_15.pdf, Page 13, Note 9 (1.363÷0.257 = 5.304)
3 “How Do Product Recommendations Help Consumers Search Products? Evidence from a Field Experiment,” Wan, Kumar, Lee. https://site.warrington.ufl.edu/kumar/files/2023/07/How-do-recomemndations-help-consumer-search-products.pdf
Jim Griffin is a faculty member at the University of Texas, Austin, in the Masters of Business Analytics program. He’s also the founder of AI Master Group, which delivers high-impact consulting and resources related to AI. Jim has more than 15 years of project experience in North America, Europe, the Caribbean and Asia Pacific, with projects involving AI, analytics, machine learning and CRM. He also has a popular YouTube channel and podcast devoted to AI.
Jim can be reached at jim@aimast.org