Top categories: a probabilistic approach

Here is a common scenario. A sequence of article headlines is shown to a user. The articles have different categories. The user reads some of the articles. Which categories does he like? Which categories does he dislike? Probability to the rescue…

Consider the example of an app that shows 10 headlines. Each article has a category: news, sport or showbiz.

Consider a user, let’s call him Bob, who reads 3 articles per session (on average). For Bob, each article has a (3 / 10) = 30% chance of being read. This is the “global” average.

We can calculate the same percentage for each category. Let’s say that from the last 10 articles…

3 news articles were suggested, 1 was read. Success rate = (1 / 3) = 33%
3 sport articles were suggested, 2 were read. Success rate = (2 / 3) = 67%
4 showbiz articles were suggested, 0 were read. Success rate = (0 / 4) = 0%

We can infer that Bob likes sport and news but dislikes showbiz. Any category with a success rate greater than or equal to the “global” average (30%) counts as a like. (The threshold for dislike is best determined by experiment.)
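The calculation above can be sketched in a few lines of Python. This is only an illustration: the `history` list, the function names and the choice to derive the threshold from the same 10 articles are assumptions made for the example, not part of any real app.

```python
from collections import Counter

# Hypothetical history of Bob's last 10 headlines: (category, was_read).
history = [
    ("news", True), ("news", False), ("news", False),
    ("sport", True), ("sport", True), ("sport", False),
    ("showbiz", False), ("showbiz", False),
    ("showbiz", False), ("showbiz", False),
]

def success_rates(history):
    """Return reads / articles shown for each category."""
    shown = Counter(cat for cat, _ in history)
    read = Counter(cat for cat, was_read in history if was_read)
    return {cat: read[cat] / shown[cat] for cat in shown}

def global_rate(history):
    """Return the overall share of shown articles that were read."""
    return sum(was_read for _, was_read in history) / len(history)

rates = success_rates(history)
threshold = global_rate(history)  # assumption: the "global" average is the threshold

# A category is a "like" when its rate reaches the global average.
liked = sorted(cat for cat, rate in rates.items() if rate >= threshold)

for cat, rate in rates.items():
    print(f"{cat}: {rate:.0%}")
print("liked:", liked)
```

With this data the threshold comes out at 30%, and news (33%) and sport (67%) pass it while showbiz (0%) does not.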

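The results get more accurate as the history grows. With only three or four articles per category, a single extra read moves a category's success rate by 25 percentage points or more.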

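We could show Bob more news and sport, and show him less showbiz. Will this skew the results? No. The success rate is a ratio of articles read to articles shown, so showing a category less often does not by itself lower its rate. In that sense the probabilistic approach is robust with respect to self-fulfilling prophecies. It will self-correct quickly if Bob occasionally gets shown an article from a “cold” category.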
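As a rough illustration of that self-correction, here is a minimal Python sketch. It assumes, hypothetically, that Bob is shown two more showbiz articles and reads both; the `counts` dictionary and the helper names are made up for the example.

```python
GLOBAL_RATE = 0.3  # Bob's "global" average from the example above

# Hypothetical running totals per category: (articles shown, articles read).
counts = {"showbiz": (4, 0)}

def update(counts, category, was_read):
    """Record one more shown article and whether it was read."""
    shown, read = counts.get(category, (0, 0))
    counts[category] = (shown + 1, read + int(was_read))

def rate(counts, category):
    """Return reads / articles shown for one category."""
    shown, read = counts[category]
    return read / shown

rates_over_time = [rate(counts, "showbiz")]
for was_read in (True, True):  # assumption: Bob reads both new showbiz articles
    update(counts, "showbiz", was_read)
    rates_over_time.append(rate(counts, "showbiz"))

print([f"{r:.0%}" for r in rates_over_time])
print("liked now:", rates_over_time[-1] >= GLOBAL_RATE)
```

Under these assumptions the showbiz rate goes from 0 / 4 to 1 / 5 (20%) and then 2 / 6 (33%), which is back above the 30% threshold. How fast this happens depends on how long the stored history is.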
