Rethinking the Heya Power Rankings

It’s been a bit since we published the heya power rankings.

This feature started off several years ago when I – and others! – were interested in the idea of ranking the performance of a stable.

Now, to be clear, this concept is not central to the idea of sumo at all. Heya do not compete with each other. Nor do they probably want to be told they are better or worse than another stable: they all have their own mostly distinct cultures, histories, chanko recipes, traditions, personalities, etc. I also wanted to see the performance of the various ichimon (the groups of stables organised largely for administrative and also historical purposes, with many cross-stable relationships, training partnerships and links among oyakata and elder names). The idea was that perhaps training partnerships could show a correlation of performance over time, and a rise in performance of associated stables under certain leadership.

It’s kind of an interesting concept, but I think the presentation was a bit ham-fisted. While it was a good thought exercise, I’m sure there were plenty of number-crunching people dwelling in the dark recesses of the sweat-stain-encrusted corners of the mawashi that is global sumo internet fandom, slamming their faces into keyboards at the idea of measuring this kind of thing based off sekitori kachi koshi and various prizes.

Admittedly, when you follow the sport more on a personal level and also over a longer period of time you start to understand nuances that appear: Scouting partnerships, relationships, connections to the amateur world, details of the specific oyakata and so on.

For me the most problematic thing, if you look at the old model, is how we would have handled a stable like Michinoku: having a yokozuna and a high ranking maegashira, it would have scored high in 2020. But the reality is that the perma-kyujo yokozuna was transferred there against his interest, and apart from Kiribayama, almost everyone the stable has put into the salaried ranks over 20+ years has been inherited from other stables (and a number of those guys were even bounced in the yaocho scandal, truncating their sumo careers significantly). So our old model would have given very inaccurate portrayals of a stable like that, and its development relative to the rest of the sumo world.

The series also spilled a lot of words on these pages about the supposed “fall” of Isegahama beya, as our model showed its numbers going ever lower, to the depths of some stables like (for example) Isenoumi. Again this is a misrepresentation. While Harumafuji’s retirement, Terunofuji’s injury driven fall out of the top ranks and Aminishiki’s intai certainly impacted the stable’s impression on the sport in the short term, this model did not take into account the incredible stream of talent coming up toward the top two divisions while this was occurring. Nishikifuji and Midorifuji have since impacted the top two divisions while Terutsuyoshi has turned himself into a makuuchi regular and even handed his stablemate Terunofuji an assist in his improbable yusho on his makuuchi comeback. This all while Takarafuji continues to be a solid fixture at the business end of the top division. There’s more that happened there in the fallow period following Harumafuji’s retirement than has happened in the total of Michinoku-beya’s 22 years, but our model won’t have seen it that way.

So the question is: how to measure the success of a heya on an ongoing basis? Is it a body of work that can only be measured when an oyakata retires, like a ramen chef who has spent a lifetime perfecting the craft? Or is it fair that like a baseball farm system, we can identify, analyse and grade new recruits and their potential impact on the top end of the sport? Similarly, while the banzuke shifts largely on numbers alone (apart from some strange whims of a group of old men and the interference of a pandemic), the performance and projection of recruits need context: a 7-0 in Jonokuchi is more impressive from a fresh 17-year-old than it is from a 23-year-old with university sumo pedigree. Numbers might be able to project a sekitori (as some folks on the Sumo Forum exhibited years ago), but the eye test and other factors are probably required to determine the quality that can lead to improved performance across the board.

I think the answer is somewhere in the middle. But unlike those who find it a pointless exercise, I do think there is value in doing the analysis. There just needs to be a better way. I’d love to hear some thoughts from the community. To what extent should data factor into this? Should we be taking more advantage of Andy’s data visualisation tools (trick question, the answer is yes)? Should it contain large amounts of subjectivity from experts with potentially differing opinions, like farm team rankings? Let us know what you think.

20 thoughts on “Rethinking the Heya Power Rankings”

  1. I enjoy the statistical aspect of Tachiai and think you should definitely invest some time into trying to perfect the heya power comparisons in tandem with Andy’s excellent visualizations. The old method was, as you say, probably a little too simple and you may need to be thinking of adjustments for whether rikishi were developed by a different stable and also some allowance for recent success over a fixed period of time as well as the current rikishi ranks. As to this last point, UEFA take account of success over the last 5 years when working out team rankings and seedings for European Soccer.
    They also have a 10 year ranking of clubs, which isn’t used for seedings and adds in more factors such as bonuses (“title points”) for actually winning trophies (which after all is the main objective of the sport, not just winning matches!). This might give you some more ideas –
    Good luck!

    • Cool thoughts. I’m very familiar with the UEFA coefficient, so that’s an excellent comparison. I like the concept of some kind of weighted average that doesn’t simply reward based on sheer volume. In football the challenge is that, for example, Liverpool and Manchester City compete in all of the same competitions and are both subject to a 25 man squad, so (finances aside, though you can make the argument some better supported sumo stables have an advantage) there’s an even playing field on which to form a comparison about the development of their players and their success.

      In sumo, you might have a stable with 12 rikishi that does a better job of development relative to a stable with 30 rikishi. In fact I’ve even heard mention of larger stables as tsukebito farms, and multiple folks have suggested to me certain oyakata are happy for numbers because of the financial upside of taking anyone in, whether or not they’ll amount to much. For me, that’s where it gets murky.

      Let’s say two oyakata start a stable tomorrow. One raises 4 deshi, one of whom reaches Juryo in 2 years, and the other three are all in Sandanme and Makushita. The other raises 15 deshi, three of whom reach Juryo in the same time frame and are performing very well, but the other twelve are no higher than Jonidan. The first oyakata has a 25% sekitori hit rate compared to 20% from the other oyakata and a better overall record of development, but it could be argued the second oyakata has still influenced the sport more. So getting the averages right while also rewarding volume appropriately is going to take some touch.

      • I wonder if the average “speed” of wrestlers’ ascent up the banzuke would be a worthwhile indicator? Several of the Naruto recruits have jumped up the banzuke, even though they don’t have sekitori yet. Maybe also accounting for the age of the stable or tenure of the oyakata? I’d not thought of that before your article but it would be interesting to track Takanohana recruits and see the comparison with former-Chiganoura…

        • The story of the Takas is “you’re sekitori, or you’re involved in scandal, or both”. The only remaining Taka who has no sekitori promotion prospects is Takataisho. Takakento is going to make sekitori in Haru. The others were either victims or perpetrators of abuse, and in one case, both.

  2. I suppose a useful paradigm might come from teacher evaluations here in the States used in many school districts. When they want to try and gauge teacher effectiveness, many times they will look at student growth during the period they were with that teacher. Not just how well their class measured up against another teacher’s class, but how well that teacher was able to move that student forward in their individual progress. I suspect a large indicator of a heya’s health would be some measurement of how each rikishi develops over time. How much, how fast, whether or not it was diminished by injuries, etc., can all factor in.

    • That’s an excellent idea. Kirameki’s progress has slowed while Hokuseiho and others from his class are much closer to reaching sekitori status.

      • There’s the rub, though. Kirameki’s progress hasn’t slowed, his ranking has simply caught up to the talent level he came into professional sumo with. Up to this point in time it’s frankly impossible to say how much progress – in terms of actual improvement of skill – he has made at all. And that same line of thought can and should be applied to any new recruit for their first year or so. If you run an analysis on that, what you’re going to find is recruiting prowess, not teaching aptitude.

        And that’s before we get into the whole question of how much of a rikishi’s improvement over time should actually be credited to his stable and/or his shisho. With the extreme assumption that it’s all down to each rikishi’s individual characteristics (genetics, work ethic, avoidance of injuries, etc.), we would conclude that a shisho only matters for two things: Recruiting the best talents, and creating an environment in which those talents don’t have a reason to mentally check out or even quit.

        Now, I don’t believe that’s the case, but I feel it’s extremely difficult, if not impossible, to determine how far away from that baseline the real effects actually are. The comparison to teaching evaluations doesn’t really work, IMHO – there’s no set curriculum, and the confounding factors (primarily base talent and injuries) are huge.

        Shikihide-oyakata could be the best instructor in sumo, but as long as he primarily recruits deshi who are simply much worse at sumo than those recruited by other stables, it’s not going to matter. A bad student improving his knowledge can be (more or less) objectively measured, a bad sumotori improving his skills will still largely be held back in his performance by those who were better to start with.

      • And to add, by “sumotori improving his skills” I mean improvement relative to his cohort, not simply improvement over time. Almost every rikishi (providing he stays long enough) will be better at age 25 than he was at age 20, when in turn he was better than he had been at age 15. Looking at banzuke ranks and basho results will primarily measure that kind of age/experience-related improvement, which isn’t very meaningful.

        By extension, that means that a lot of what might be seen as a heya’s contribution to rikishi development is simply a matter of how old its current roster skews. And that, in turn, is again primarily just a function of its recruiting activity. (This time in terms of pure numbers, not talent recognition.)

  3. Being honest, I did think that the prose parts of your old power rankings articles were always much more interesting than the somewhat arbitrarily calculated numbers. But on the other hand, without regularly updated numbers there’s arguably no “hook” that requires writing about this topic more often than once a year or so; the fortunes of individual stables just don’t wax and wane all that quickly. So, at the risk of being too blunt: What’s the actual primary goal of this rethinking process, finding a better analytical system or finding a better hook for future articles?

    • Not too blunt at all. A “hook” for future articles would only be interesting if it actually means something and represents something of value. If it’s something worth writing about, to me it’s pretty interesting. In the context of sumo, though, the heya concept itself is something that just begs for some means of objective comparison. But, as Josh alluded to, the complexities of the “oyakata lifecycle” are interesting wrinkles that will make a comprehensive analysis challenging.

    • I appreciate the kind and honest words there regarding the content, all good.

      I can only speak for myself but I haven’t been as much of a regular contributor over the past year or so, so I would say it’s not especially for me about attracting engagement or page views as much as creating an interesting dialogue or series, something that is of personal interest to follow. I haven’t been posting about it, but also I’ve been paying far more attention to the topic than when I was writing about it. So I would say that the primary goal is to actually contribute a meaningful analytical system for grading the progress and development of stables, that’s backed to whatever extent it can be contextually by any intangibles we know as people who follow (from whatever distance) the sumo world.

      In terms of cadence of posts, I had been doing the power rankings after every basho, but as we both have acknowledged, the calculations were a bit elementary and it’s hard to continue the analysis if after 18 months of something you realise actually you don’t have a lot of real meat on the bones. So I’ve been toying with the rethinking process for a while. I have some ideas but more often than not, someone will throw something into the mix I might not have thought about.

      In terms of how often the calculations are run, that’s another answer I don’t totally know. Is it every basho (probably)? Is it annually? Is it two or three times a year? I wouldn’t be opposed to only doing something really detailed and polished once a year, though given how quickly things can change I do wonder if that’s not enough. Maybe a January/July thing is the answer.

      • I’m strictly a “tracking individual rikishi” guy myself, but beyond the pure results tracking of prospects in makushita, which I used to update each basho (and which I really ought to get back to as well), I also used to maintain an informal private listing of more or less promising rikishi who had yet to progress beyond sandanme. I was updating that listing with rikishi additions and removals twice a year, after Natsu and Kyushu. FWIW, that felt like a reasonable amount of time for new information to roll in, so that I wasn’t constantly going over barely changed data.

        So perhaps that’s a cycle that would work for heya reviews, too. Maybe with different timing, though. After Haru and Aki may work best due to the Haru-focused recruiting efforts. Post-Haru would get to include the year’s recruiting efforts, while post-Aki would allow for covering the maximum three tournament appearances of those recruits.

  4. First off, I do like the statistical look and the graphical data presentation. Thanks for the efforts.

    I would propose that the top heyas in the power ranking would be those that recruit AND develop well. The problem is, as you say, teasing the two apart from the existing statistics. Other sports have the same problems, so perhaps there are some examples.

    Baseball SABRmetrics are just very granular stats, so no help there.

    There is a lot of debate about whether Pro Football Focus provides useful information or not. I like the philosophy of grading the plays and not the result though. There are often too many variables in the results – did the defender drop an easy interception? The final stats do not show the bad quarterback decision since there was no negative result. PFF’s method captures this. The penalty for this is that each play has to be graded instead of looking at the final stats sheet.

    Applying a similar method to sumo bouts is intriguing. Grade each rikishi on each bout on a range from -3 to +3. A -3 represents a loss resulting from poor execution, strategy, and/or technique. A +3 score represents a win from a strong performance. This allows for scores like a +2 or +2.5 for a loss in spite of quality sumo. Think about the day 11 Asanoyama/Takanosho bout – quality sumo by both, but only one winner in the final stats. Similarly a -2 can result from a win through a combination of poor sumo and good luck. Pick your favorite Tokushoryu win from Hatsu 2020 for an example. Total the scores of the 15 days of each basho.

    Over time it would be straightforward to tell if a rikishi is making real progress in spite of external factors. One could argue that Hakuho has been the major external factor that warps the statistics at the top of makuuchi. A system like this would help compensate. (For our European friends, how many Tours de France would Raymond Poulidor have won if he hadn’t had the bad luck of competing against Eddy Merckx and Jacques Anquetil?)

    Looking at the average rate of change of all the rikishi in the heya would reveal which heyas were most successful at developing rikishi regardless of size of the heya, how they came to the heya, or what division they are competing in. Looking at the average banzuke positions of the top three rikishi in any heya would reveal which delivered the best results and partially compensate for heya size. Having both pieces of information would reveal the top tier heyas – good at both recruiting and developing, the second tier heyas – good at one or the other, and the third tier or “look at that, kabu!” heyas.

    Alas, I’m not going to score each bout. Even if I were, I doubt my scoring would reflect sufficient depth of knowledge to provide value. I guess for now that means it’s just a thought experiment unless someone has an idea of how to extract different information from the existing data set.

  5. As a sumo outsider and ardent fan for just 2 years, I realize now that I know next to nothing about sumo. However, it does seem to me that promotions/demotions are more subjective than data driven; I always imagined a carefully constructed algorithm that took into account not only net wins/losses but also quality of the opposition, sophistication of the kimarite used, decisiveness of the victory/loss, length of bout, etc etc. But… instead it appears to me now to be more subjective than anything else. For instance, in Sept 2020, of Takayasu’s 10 wins, only 5 were against higher ranked opponents, and only 1 of those in Sanyaku (Mitakeumi, who was a Sekiwake at the time.) Yet, Takayasu was promoted from M6E to KW, a jump of 5-1/2 ranks.

    Yet in November, Hokutofuji scored 11 wins, of which 4 were against higher ranked opponents (one of whom was Takayasu at KW), yet thereafter he was only promoted from M4E to M1E, a jump of just 3 ranks.

    So one guy wins 10 and jumps 5.5 into Sanyaku. The other wins 11 and jumps 3, staying in Maegashira, despite even starting from a higher rank. Was this unfair bias, or are there other hidden factors at work in the promotion algorithm that I’m totally missing? With all this said, it seems to me, that, in the absence of any truly solid algorithm governing the promotion/demotion system, any effort to meaningfully rank the heya can only be even more hopeless at best.

      • In other words, there’s something very much like an algorithm that’s used, but since every rank on the banzuke must be occupied, it can only produce a rank order. If there’s no one who “belongs” at M3, like is the case with the upcoming banzuke, you still gotta put someone up there…

        • The divisions below juryo (or at least below around mid-makushita) are arguably even more algorithmic. They basically seem to have a set amount of ranks that a kachikoshi record should rise from any given position, and then the makekoshi will be slotted in around them. If there’s a lot of space to fill due to many retirements, the MK demotions will get very small; if almost nobody retires, they tend to get large.

          Since the vast majority of rikishi are in those divisions, the Association’s re-ranking decisions shouldn’t be an impediment to a heya analysis.

  6. It would seem that the people who would really want to know about this Heya power ranking are young rikishi who are thinking about joining one.

    • Given the live-in requirement of being a sumotori and the lack of transfers, I would say that picking a competitive heya shouldn’t really be the primary goal of a rikishi. Obviously it’s rarely advisable to join moribund stables such as e.g. Kataonami, but as we’ve seen with the Kotokantetsu mess recently, stables generally acclaimed for their rikishi development work (of which Sadogatake is one) may well have other drawbacks.

    • Interestingly enough that is another topic I shall broach on this site before too long.

      There are a lot of elements that come into play however that serve to bring someone into sumo, and in most cases, I think the options appear to be quite small owing to various heyas’ scouting networks, personal relationships and the obligations that come from those, regional/family bias, etc and so on. A lot of those things are incalculable and are subjective at best, when you talk about the personality of the oyakata and number of other rikishi present in a heya. Maybe the kid was just a fan of the oyakata. So it might be “valuable” to someone who is being talked to by many stables.

      BUT analysing the success of an oyakata relative to their peers seems like an interesting endeavour especially in an era where the Kisenosatos of the world come out and talk about their interests and ambitions, or where Shikihide apparently handles recruitment differently to anyone else, or where a Naruto has opened a new (physical) building with a rush of new seemingly very talented recruits over the past year or so.
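
Two of the ideas floated in the comments above are concrete enough to sketch in code. This is only a rough Python illustration, not a proposed ranking system: the function names, the numeric banzuke position scale (1 = top of the banzuke) and any sample figures are assumptions made up for the example. The first function reproduces the 25% versus 20% sekitori hit rate comparison from the two-oyakata thought experiment. The rest follow the bout-grading suggestion: grades from -3 to +3, totalled per basho, then averaged as a rate of change across the heya, alongside the average position of the top three rikishi.

```python
from statistics import mean


def sekitori_hit_rate(recruits: int, sekitori: int) -> float:
    """Share of a stable's recruits who have reached the salaried ranks."""
    if recruits <= 0:
        raise ValueError("a stable needs at least one recruit")
    return sekitori / recruits


def basho_total(bout_grades: list[float]) -> float:
    """Sum one rikishi's per-bout grades for a single basho.

    Each grade is assumed to sit on the -3..+3 scale from the comment.
    """
    for grade in bout_grades:
        if not -3 <= grade <= 3:
            raise ValueError(f"grade {grade} is outside the -3..+3 scale")
    return sum(bout_grades)


def rate_of_change(basho_totals: list[float]) -> float:
    """Average basho-to-basho change in a rikishi's graded total."""
    if len(basho_totals) < 2:
        return 0.0  # not enough history to measure any change
    return (basho_totals[-1] - basho_totals[0]) / (len(basho_totals) - 1)


def heya_development(totals_by_rikishi: dict[str, list[float]]) -> float:
    """Mean rate of change over every rikishi in the heya."""
    return mean(rate_of_change(t) for t in totals_by_rikishi.values())


def top_three_average(positions: list[int]) -> float:
    """Average banzuke position of the three highest-ranked rikishi.

    Assumes a hypothetical numeric scale where 1 is the top of the banzuke.
    """
    return mean(sorted(positions)[:3])
```

With the two hypothetical stables from the thread, `sekitori_hit_rate(4, 1)` gives 0.25 and `sekitori_hit_rate(15, 3)` gives 0.2. That is exactly the gap between hit rate and volume discussed there, and nothing in the sketch settles how to weight one against the other.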

