Why Wine Scores Are Broken and What Chess Can Teach Us About Fixing Them
A deep dive into why traditional wine ratings fail, how a chess-inspired Elo system offers a more intuitive alternative, and the app I built to put the idea to the test.
Wine scores, whether they take the form of 100- or 20-point assessments from well-known wine critics or the 5-star system championed by platforms such as Vivino, are problematic. They seek to compress an incredibly multidimensional sensory experience into a single, supposedly authoritative number. However carefully considered the assessment, and however serious the expertise and qualifications behind it, any score compresses thoughts on the wine’s aroma, texture, acidity, tannins, its balance, its evolution in the glass, the context in which it is consumed and even the emotional state of the drinker into a single number. None of these variables maps neatly onto a single linear axis, yet that is exactly how these scoring systems treat them. The difference between a critic’s score of 93, 94 or 95 is more likely a reflection of mood, time pressure and unconscious bias than of any defined qualitative difference.
As discussed at some length in my last article on the topic, the actual value a wine critic offers the consumer often comes down to finding a critic whose personal tastes align with yours, rather than any objective assessment embodied in the score itself. Scores work mainly when the critic’s subjective bias matches your subjective taste. A high score from one critic may therefore be of no value at all to large swathes of the population, yet extremely helpful to the small minority who have made the necessary connection between critic, score and their own taste preferences.
Crowdsourced platforms promise a more democratic alternative, yet introduce a range of distortions of their own. Vivino’s 5-star averages, for example, blend wildly different contexts: bottles opened too young, wines consumed in poor condition, and celebratory bias, where the event, not the liquid, earns five stars. The result is a form of sensory populism that rewards comfort, familiarity and sweetness while punishing structural nuance, austerity and complexity; the very traits that define many of the world’s greatest wines. A 4.2 on Vivino or a 92 from a famed critic will inevitably be treated as hard data, when in reality it is a fragile numerical veneer laid over the highly subjective, and quite impossible, task of accurately quantifying the multifaceted qualities of wine on a linear scale.
It is a task that I suspect is too complex for anyone to perform with any accuracy. Of course, highly trained critics with long experience will be far more methodical in their assessments, evaluating the individual components of a wine’s profile rather than spending 2 seconds debating whether a wine is worthy of 3.5 stars or 4, but even then it is an imperfect process.
The cultural weight of these scores, and what they imply with regard to price and consumer expectations, only serves to magnify their inaccuracies. Joe Fattorini wrote a superb piece around a year ago (which regrettably I no longer have access to) that picked the efficacy of 5-star scoring systems apart with gusto, likening them to a psychological crutch that leads consumers to rely on ratings rather than their actual personal preferences. No wine scoring system, including the one I am about to discuss, can overcome this problem entirely. Yet reframing the question that underpins the scores in the first place may allow us to avoid many of the structural failings that plague current scoring systems.
Chess, Mate
If you’ve ever played chess online, you will likely have seen reference to, and probably been rated by, a scoring system known as Elo. Invented by the Hungarian-American physics professor and chess player Arpad Emmerich Elo (1903–1992), the rating system that came to bear his name was adopted by FIDE, the World Chess Federation, as the official ranking method for chess players by 1970.
The Elo system works by assigning each player a numerical rating, which changes based on the outcomes of games and the ratings of their opponents. When the result of a chess game is evaluated, an expected outcome is first calculated from the difference between the players’ current ratings, representing the probability that one player will beat the other. The points then exchanged between the players are proportional to the magnitude of the upset, scaled by a factor known as K.
The Elo formulas themselves are compact. Player A’s expected score against player B, i.e. the probability of A winning (with a draw counting as half a win), is:

E_A = 1 / (1 + 10^((R_B − R_A) / 400))

Once the result is known, A’s rating is updated as:

R'_A = R_A + K × (S_A − E_A)

where S_A is 1 for a win, 0.5 for a draw and 0 for a loss, and B’s rating is updated symmetrically.
For example, a chess game between me (Elo rating: 1100) and Magnus Carlsen (Elo rating: 2839), fellow Norwegian and the strongest chess player on the planet, would have an extremely predictable outcome: he would crush me effortlessly. Given the difference in our respective ratings, his Elo score would barely budge when he destroys me, nor would my rating be meaningfully penalised for what is, after all, a complete no-brainer of an outcome. If however, through some magical intervention, I were to beat him, I would gain nearly the full 30 points and he would lose the same (assuming K is set to 30).
If Magnus is instead paired against a nearly equal opponent, such as Hikaru Nakamura (Elo rating: 2813), a win by Magnus would earn him 13.9 points and cost Hikaru the same, while a win by Hikaru (the slight underdog) would earn him 16.1 points, costing Magnus the same.
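These numbers can be checked with a short Python sketch of the standard Elo update (K fixed at 30, as in the examples above):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that the player rated r_a beats the player rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 30):
    """Return both players' updated ratings after a game.

    score_a is 1 for a win by the first player, 0.5 for a draw, 0 for a loss.
    The exchange is zero-sum: whatever the first player gains, the second loses.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# Magnus (2839) beats Hikaru (2813): the slight favourite gains ~13.9 points.
new_magnus, _ = elo_update(2839, 2813, score_a=1)
print(round(new_magnus - 2839, 1))  # → 13.9

# A win by the slight underdog is worth more: Hikaru would gain ~16.1 points.
new_hikaru, _ = elo_update(2813, 2839, score_a=1)
print(round(new_hikaru - 2813, 1))  # → 16.1
```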
It is a relational points system that adjusts quite elegantly over time, based on the outcome of head to head pairings.
Applying Elo Ratings to Wine
So why does all this talk of chess matter in the context of wine? Well, instead of pretending that a wine can be assigned an absolute, context-free value, an Elo-based ranking system applied to wine would recognise that preferences emerge relationally: wine A is preferred to wine B at a specific tasting moment. Applied over time, the pattern of pairwise outcomes would shape a dynamic map of consumers’ palates. It sidesteps the impossible expectation that consumers (or indeed critics) can quantify their experience on an arbitrary scale and instead asks a far simpler, much less cognitively taxing question: “Which of these two wines do you prefer right now?” The mathematics of the Elo formula then transforms this intuitive choice into a continuously updating system reflecting both the strength of past favourites and the impact of new discoveries. Crucially, ranking wines in this manner makes the rating agnostic to style or convention: there is no inherent reward for richness, oak, power or sweetness, nor any penalty for delicacy or restraint.
Just how much simpler a one-to-one comparison is than assigning each wine its own score should be quite evident. Most people can say with some certainty which of two glasses they prefer. And if for some reason such a choice is impossible, the wines can, much like chess players, draw.
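As a rough sketch of how such pairwise wine rating could work in practice, here is a toy ledger in Python; the wine names, the K value of 30 and the 1200 starting rating are illustrative assumptions:

```python
K = 30                  # scaling factor (assumed)
START_RATING = 1200.0   # rating assigned to a wine before its first match (assumed)

def expected(r_a: float, r_b: float) -> float:
    """Elo probability that the wine rated r_a 'beats' the wine rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_match(ratings: dict, wine_a: str, wine_b: str, score_a: float) -> None:
    """Update ratings in place after one head-to-head tasting.

    score_a is 1.0 if wine_a was preferred, 0.0 if wine_b was, 0.5 for a draw.
    """
    r_a = ratings.setdefault(wine_a, START_RATING)
    r_b = ratings.setdefault(wine_b, START_RATING)
    delta = K * (score_a - expected(r_a, r_b))
    ratings[wine_a] = r_a + delta
    ratings[wine_b] = r_b - delta

ledger = {}
record_match(ledger, "Wine A", "Wine B", 1.0)  # preferred A outright
record_match(ledger, "Wine A", "Wine C", 0.5)  # could not decide: a draw
for wine, rating in sorted(ledger.items(), key=lambda kv: -kv[1]):
    print(f"{wine}: {rating:.1f}")
```

Note that a draw against a lower-rated newcomer still costs the current favourite a fraction of a point, exactly as it would in chess.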
The realisation that Elo might be applied to wine is, as with most things in this world, not an original idea (though I have found only one other mention of it, and that some time after having already committed a fair bit of time to this article). In 2021, Roald Schuring posted a well-researched article on Medium in which he reformats the results of more than a million Vivino star ratings by turning them into a series of zero-sum matches. Two wines reviewed on a single day by the same individual are pitted against each other: a 4-star wine beats a 3-star wine, while two 3-star wines draw. I won’t recount the whole article here, but perhaps unsurprisingly it shows a relatively strong link between the star ratings and the resulting Elo scores. Even so, the Elo score is able to control for several variables that obscure the actual qualities underpinning the star ratings: it can, for instance, correct for reviewer strictness and accommodate changing behaviour over time.
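A simplified sketch of that reconstruction might look like the following; this is my own illustration of the idea, not Mr Schuring’s actual code, and the record layout is an assumption:

```python
from itertools import combinations

def stars_to_matches(reviews):
    """Turn star ratings into zero-sum match results.

    reviews: (reviewer, date, wine, stars) tuples -- an assumed schema.
    Wines rated by the same reviewer on the same day are paired off;
    the higher-starred wine wins, and equal stars count as a draw.
    Returns (wine_a, wine_b, score_a) with score_a in {1.0, 0.5, 0.0}.
    """
    sessions = {}
    for reviewer, date, wine, stars in reviews:
        sessions.setdefault((reviewer, date), []).append((wine, stars))

    matches = []
    for tasting in sessions.values():
        for (wine_a, s_a), (wine_b, s_b) in combinations(tasting, 2):
            score = 1.0 if s_a > s_b else 0.0 if s_a < s_b else 0.5
            matches.append((wine_a, wine_b, score))
    return matches

reviews = [
    ("anna", "2021-05-01", "Wine X", 4),
    ("anna", "2021-05-01", "Wine Y", 3),
    ("anna", "2021-05-01", "Wine Z", 3),
]
print(stars_to_matches(reviews))
# Wine X beats both Y and Z; Y and Z draw with each other.
```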
While Mr Schuring’s reformatting manages to leverage some of the benefits of an Elo rating, it nevertheless relies on the underlying qualitative assessments and score selections of Vivino’s users, and thus inherits many of their problems. In particular, though we use them to compare wines, star scores cannot capture relative performance; they rely instead on the subjective sorting of an experience into discrete buckets (stars). This is very woolly territory indeed, as the resulting score is as much a reflection of your interpretation of the scale as of the quality of the wine.
By pitting wines directly against each other and asking “is wine A better than wine B?”, only the immediate result matters, removing the emotional friction that can be triggered by assigning too high or too low a rating. It should also avoid the issue of rating inflation altogether, and it would put a stop to the notion that any wine can be perfect, since the maximum achievable Elo score for any wine is relative to all the other wines it is compared against. The hundreds of wines rated over the years as 100-point, supposedly perfect, wines are a great example of how a bounded numerical scale can reach a level of logical absurdity. If you are in the happy position of tasting several 100-point wines together, will the preferences and differing qualities that inevitably emerge not require those scores to be adjusted accordingly?
The real power of the Elo rating of course lies in the fact that the quality assessment is relative to other wines, rather than being relative to an abstract number, which is much more in line with how we think of, and process the concept of quality as humans.
But This Is All Theoretical, Right?
The reason this article has taken substantially longer than usual, and has been far more challenging to produce, is not because of the complexity of the above topic, nor really the depth of research. Rather, it has been because I wanted to put an Elo based rating system to the test.
This has involved the creation of an app, purpose-built to record and track head-to-head matchups between wines and rank their relative performance based on their resulting Elo scores. Now bear in mind, I am no developer, don’t really know how to code, and have never made an app before, so the learning curve was steep, but thanks to modern AI-assisted coding tools it has been possible to cobble together something I think represents a good start. I’ve called it The Wine Ledger, a name which works, but with hindsight could have done with some workshopping.
It is, if I may say so myself, a nifty little app, not without its foibles, that I hope can aid in tracking a user’s preferences and tasting history while, of course, rating wines in the process. Another feature, which I think is as interesting as the resulting Elo score itself, is the ability to see the match history of any given wine. Rather than simply relying on the numerical score and its position relative to other wines, you can see exactly which head-to-head comparisons produced that wine’s Elo rating. This is a form of insight and transparency completely absent from current wine scores. Each individual assessment is of course highly subjective, but the aggregate is interesting nevertheless, allowing anyone curious enough to attempt, in theory, to replicate the results for any given wine. For a winemaker, I imagine it would also be of interest to see such direct comparisons, made by users actually drinking and enjoying their wines, and it might lead to some unexpected outcomes.



The criticism levelled against Vivino ratings, inasmuch as wines may be rated in suboptimal conditions or be unduly influenced by the setting in which they are drunk, is not something this app can escape either. Nor does it allow you to compare all wines against each other at once, since it pits your current favourite against whatever wine you are drinking next. With enough matches, however, the results ought still to be representative of a wine’s relative quality.
Also, when a new vintage of an existing wine is added, it inherits the Elo score of the most recent vintage in the system, a choice made to reflect the past performance of a winery rather than resetting each vintage to the starting Elo of 1200.
My expectations for the app are quite low, but as the worst-case scenario is that I end up with an app that ranks my own personal preferences, I figured I’d put it out there for you to play with as well. It is meant as a slightly elaborate proof of concept, and represents my thoughts on how wine apps can leverage what I think is a more intuitive rating system to provide a better service. I earn nothing from it, and it’s free to download should you want to. The home page is, according to a highly experienced UX developer I showed the app to over the weekend, very dry and lacking in general appeal, yet it should contain enough info on how to use the app to get you going.
If you do encounter any issues with it, or have questions about how to use the app, please let me know and I’ll do my best to address them. Also, I’m afraid it is only available on the iPhone at the moment, and, unless there is a completely unexpected demand for the app, which I doubt, it will remain so.
I can imagine you will have thoughts on this topic, and I would love to hear them all.



