A couple of member comments last week compelled me to reinvestigate the value of BPI and KenPom in predicting tourney outcomes. This is an update to a post I submitted about a year ago.
Last year, ESPN devised a new system for rating college basketball teams. They touted it as an improvement over the RPI system the Selection Committee uses to help determine seeding and the KenPom efficiency rankings, which remove game pace from the equation. BPI was devised by Dean Oliver, a hoops numbers legend who, along with Pomeroy, pioneered the new methodology for basketball statistics. Oliver argues that his system is more complete than KenPom because it:
- Factors in diminishing returns for blowouts
- Considers all wins better than losses
- De-weights games with missing key players.
Fair enough—though it’s important to understand that Pomeroy addressed the perceived shortcoming in over-crediting blowouts. I can’t say whether BPI has a better algorithm for rating teams or not. What really interested me about the explanation of BPI is that Oliver claimed it was 74.4% accurate in picking the tourney games from 2007 to 2012. I don’t know why they cut off the analysis at six years. Maybe it has something to do with the fact that 2006 was such a bloodbath. Remember George Mason? Oliver and ESPN also said that they couldn’t compare BPI to KenPom because they didn’t have Ken’s pre-tourney data.
Well, I do. So I decided to do a comparison. I stacked BPI up against KenPom and a baseline strategy of picking the higher seed, with average margin breaking seed ties. This strategy is henceforth called “YourMom” since anyone can pick games by seed. (I didn’t analyze RPI because it’s a notoriously poor predictor of tourney outcomes.)
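For the code-minded, here is a minimal sketch of the YourMom rule as I read it: take the better seed, and fall back on average scoring margin when the seeds match. The `Team` class, its field names and the sample numbers are all made up for illustration; they are not real team data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    seed: int          # 1 is the best seed, 16 the worst
    avg_margin: float  # average scoring margin (hypothetical field)


def your_mom_pick(a: Team, b: Team) -> Team:
    """Pick the better (lower-numbered) seed; break seed ties on average margin."""
    if a.seed != b.seed:
        return a if a.seed < b.seed else b
    # Same seed (e.g. two #1s meeting late): the bigger average margin wins.
    return a if a.avg_margin >= b.avg_margin else b


# Illustrative teams only; the margins are invented.
favorite = Team("Team A", seed=3, avg_margin=9.5)
underdog = Team("Team B", seed=14, avg_margin=11.0)
same_seed = Team("Team C", seed=3, avg_margin=12.0)

print(your_mom_pick(favorite, underdog).name)   # Team A (better seed)
print(your_mom_pick(favorite, same_seed).name)  # Team C (seed tie, bigger margin)
```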
The first thing I realized was that Oliver and company were evaluating tourney outcomes round by round. In other words, that 74.4% does not measure the accuracy of filling out your whole bracket in advance of the dance. It takes the actual outcome of each round, then re-evaluates the surviving teams. That’s a clever way of claiming a higher accuracy rate. Here’s what I mean: last year, had you filled out your bracket completely, you probably would’ve had #3 New Mexico beating #14 Harvard, then possibly downing either #6 Arizona or #11 Belmont in round two. Of course, you would’ve been wrong in both games. But the way ESPN is calculating the accuracy of BPI, they look at the second-round Arizona/Harvard matchup and credit their system for getting that game right. Unfortunately, for the typical tourney pool, where you fill out your bracket before the dance and let it ride, you don’t have this luxury of round-by-round reassessment. That alone inflates the BPI accuracy rate.
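To make the difference concrete, here is a rough sketch that scores the New Mexico/Harvard/Arizona/Belmont pod both ways. It assumes the seed itself is the rating (lower is better), which is a stand-in rather than how BPI or KenPom actually rate teams, and the function names are my own.

```python
# Four-team pod from the example; the seed stands in for a rating (assumption).
seeds = {"New Mexico": 3, "Harvard": 14, "Arizona": 6, "Belmont": 11}

# Actual results as described above: (team, team, winner).
first_round = [
    ("New Mexico", "Harvard", "Harvard"),
    ("Arizona", "Belmont", "Arizona"),
]
second_round_winner = "Arizona"  # Arizona beat Harvard


def pick(a, b):
    """Pick the team with the better (lower) seed."""
    return a if seeds[a] < seeds[b] else b


def round_by_round_correct():
    """Score picks when the field is reset to the actual winners each round."""
    correct = sum(pick(a, b) == winner for a, b, winner in first_round)
    a, b = first_round[0][2], first_round[1][2]  # teams that really advanced
    correct += pick(a, b) == second_round_winner
    return correct


def full_bracket_correct():
    """Score picks when the whole bracket is filled out before tip-off."""
    picks = [pick(a, b) for a, b, _ in first_round]
    correct = sum(p == game[2] for p, game in zip(picks, first_round))
    # The round-two pick is between the teams WE advanced, right or wrong.
    correct += pick(picks[0], picks[1]) == second_round_winner
    return correct


print(round_by_round_correct(), "of 3")  # 2 of 3
print(full_bracket_correct(), "of 3")    # 1 of 3
```

Same picker, same three games, and the round-by-round method gets credit for one more win simply because the New Mexico miss doesn't carry forward.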
All that said, I did look at how BPI compared to KenPom and the higher seed strategy in round-by-round game prediction. The next thing I realized is that ESPN factored in the play-in games of 2007-10 and the First Four games of 2011-12. Otherwise, there was no way that the percentages would’ve rounded to 74.4%. So, they’re evaluating their system against 390 game decisions and saying they got 290 correct. I don’t know how their system sorted out the pre-tourney sham games. So let’s assume they were nine for 12 in those matchups. That means for the 378 real tourney games played between 2007 and 2012, BPI must’ve forecast 281 games correctly. And if you factor in its 2013 performance, where the system went 41 for 63 in picking games, BPI’s overall record is 322-119, for a 73.0% accuracy rate.
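If you want to check my math, the back-of-the-envelope version looks like this. The nine-for-12 record in the play-in games is my assumption, as noted above, and the variable names are just for illustration.

```python
GAMES_PER_TOURNEY = 63  # games in a 64-team bracket

# One play-in game per year in 2007-10, four First Four games in 2011-12.
play_in_games = 4 * 1 + 2 * 4                               # 12
games_2007_2012 = 6 * GAMES_PER_TOURNEY + play_in_games     # 390

claimed_correct = round(0.744 * games_2007_2012)            # 290
assumed_play_in_correct = 9  # ASSUMPTION: BPI went 9 for 12 in those games
real_correct_2007_2012 = claimed_correct - assumed_play_in_correct  # 281

correct_2013 = 41
total_correct = real_correct_2007_2012 + correct_2013       # 322
total_games = 7 * GAMES_PER_TOURNEY                         # 441
total_wrong = total_games - total_correct                   # 119

print(f"2013 alone: {correct_2013 / GAMES_PER_TOURNEY:.1%}")                # 65.1%
print(f"2007-13: {total_correct}-{total_wrong}, {total_correct / total_games:.1%}")  # 322-119, 73.0%
```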
So…the question is: how did KenPom and YourMom perform over the same time period? The answer is: not as well…but not significantly worse. Both KenPom and YourMom got 319 games right. That works out to a 72.3% accuracy rate, a scant 0.7 percentage points below BPI. That’s right: the high-falutin’ BPI system yields prediction results that are just 0.7 points better than a tourney ninny advancing all the higher seeds, and KenPom does no better than the ninny. Ouch.
I’m sure, in some way, this could be spun as a small victory for BPI. But let’s remember one thing. Last year was the first time the system could be used in a true predictive situation; before that, it was applied to tourneys retrospectively. And last year’s 65.1% accuracy rate was well below the 74.4% originally touted. Regardless, a three-game advantage over seven tourneys isn’t anything to write home about—particularly when it’s so close to the strategy that any bracket newbie could adopt.
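For scale, here is a quick sketch of how thin that edge is, using the game counts above. The dictionary and variable names are mine.

```python
TOTAL_GAMES = 441  # 63 games x 7 tourneys, 2007-2013
TOURNEYS = 7

correct = {"BPI": 322, "KenPom": 319, "YourMom": 319}
rates = {system: wins / TOTAL_GAMES for system, wins in correct.items()}

for system, rate in rates.items():
    print(f"{system:8s} {rate:.1%}")

# BPI's edge over simply advancing the higher seeds.
gap_games = correct["BPI"] - correct["YourMom"]            # 3 games
gap_points = (rates["BPI"] - rates["YourMom"]) * 100       # percentage points
games_per_tourney = gap_games / TOURNEYS

print(f"Edge: {gap_games} games, {gap_points:.1f} points, "
      f"{games_per_tourney:.2f} games per tourney")
```

That works out to less than half a game per tournament.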
Okay. So I did one other analysis, and I think it’s more relevant than determining round-by-round prediction accuracy. This analysis compared the accuracy of KenPom to YourMom in filling out your entire bracket and living with the consequences of lost games in previous rounds. I was able to go back ten years for this analysis. What did I find? Using KenPom efficiency data would’ve correctly predicted 414 of the 630 tourney games played between 2004 and 2013. That’s a 65.7% accuracy rate. And how would YourMom have done? Amazingly, one game better, at 415 of 630, for a 65.9% accuracy rate. That’s right. Picking by seeds and margin edges out using KenPom.
I’m not here to question KenPom. I think his data is absolutely invaluable to analyzing the quality of basketball teams. I crutch on it, as do nearly all the analysts these days. But what this shows is that no system can reliably predict basketball outcomes. We all try, of course, and I would argue that all these systems underrate the value of things like coaching, momentum and individual matchups. But at the end of the day, the NCAA tournament is a giant puzzle of probabilities…and the people controlling the outcomes are 20-year-old kids.
So when it comes time to fill out your bracket, cast a cold eye on all these statistical methods. Even mine. Take from them what you believe, use your gut instincts…and create a bracket that’s your own. Either that, or call your Mom.