Almost nothing brings a bigger smile to winter-weary baseball fans than the reporting of pitchers and catchers to camp, followed by position players, then the first crack of the bat in a Spring Training game. We know it’s almost here; actual baseball cannot be far off. And nothing goes better with that first look at the team than making predictions about how they will do. Here at Mets360, we are in the middle of pulling apart player predictions for 2020, which is always a fascinating exercise. Outside our space, one of the main projection systems whose results everyone waits for is PECOTA – the proprietary forecaster from Baseball Prospectus.
This year the PECOTA (Player Empirical Comparison and Optimization Test Algorithm) standings made quite a splash for the National League East, with our beloved Orange and Blue sitting right at the top with a projection of 87.9 wins. Everyone runs to the standings to see who “finishes first” and makes big proclamations from it. No doubt, the projection is a big deal because of the effort that goes into building a very sophisticated algorithm, which lends an air of reality to the outcome. Every fan wants their team to finish atop the standings when they are released. Like most projection systems, though, PECOTA is a mathematical algorithm using fancy number games to run thousands of simulated outcomes, a fact that should remind us that this is not reality. Baseball Prospectus even jokes about the whole thing by stating: “If you rub a crystal ball it will show you the future. If the ball is cracked you might cut yourself. Injury is more likely than insight.”
One thing I liked a lot about the 2020 PECOTA projection is that it also came with plots showing the distribution of all the simulations. You can find these at the link above by clicking the little graph icon at the top of the standings order. A cool thing right off the bat is that the probability distributions for the Mets, Nationals, and Braves are roughly bell-shaped (close to a normal distribution), which makes some statistics worth considering possible. Armed with the plots but not the actual data, I compared them with a normal distribution plot just to confirm the potential viability of what follows.
The projection ended up with 87.9 wins for the Mets, which represents the mean (average) value of the simulations, not an actual number of wins; similarly, every team’s projected win total is the mean of its simulation runs. Mean values of a population can be quite misleading. For example, the mean of the population of 10 and 0 is 5, which is the same as the mean of the population of 4 and 6. The populations are quite different, however, because the spread of the data is very different. That spread can be quantified as the standard deviation. In terms of the projected standings, this translates into some number on either side of the mean that accounts for a certain amount of the variation. For the top three teams in the East, I visually estimated that number as 6 wins for one standard deviation (which covers about 68% of the data in a normal distribution). We should therefore see the Mets’ wins as a range from 82 to 94, not simply the mean. There’s a good spread on the simulation runs. It turns out the Nationals, Braves, and Phillies have similar-ish distributions, so the +/- 6 number pretty much applies to their simulation runs too.
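As a rough illustration of the mean-versus-spread point above, here is a minimal Python sketch. The toy populations and the 87.9-win mean come from the text; the 6-win standard deviation is the eyeballed estimate from the plots, not a value published by Baseball Prospectus, and the function name is made up for this example.

```python
from statistics import mean, pstdev

# Two toy populations from the text: same mean, very different spread.
wide = [10, 0]
narrow = [4, 6]

for name, data in (("wide", wide), ("narrow", narrow)):
    print(f"{name}: mean={mean(data)}, std dev={pstdev(data)}")


def one_sd_range(mean_wins, sd_wins):
    """Return the (low, high) win range one standard deviation around the mean."""
    return (round(mean_wins - sd_wins, 1), round(mean_wins + sd_wins, 1))


# 87.9 is the PECOTA mean for the Mets; 6 is a visual estimate (assumption).
METS_MEAN = 87.9
ESTIMATED_SD = 6

low, high = one_sd_range(METS_MEAN, ESTIMATED_SD)
print(f"Mets one-SD range: {low} to {high} wins")
```

Both toy populations average 5, but their standard deviations are 5 and 1. Rounded to whole wins, the Mets range works out to the 82 to 94 quoted above.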
The +/- 6 value for each team is noted on the modified figure from Baseball Prospectus with a black bar along the bottom of each team’s plot. Shading the area (gray box) where the one standard deviation ranges overlap for the top three teams gives a more realistic perspective on just how close the PECOTA projections are for the East, and how easily the outcome could be quite different. The Mets, Nationals, and Braves all overlap between 82 and 89 wins, indicating that any of the three finishing first is not out of statistical bounds. Those numbers are on the “pessimistic” side of the mean for the Mets and Nationals and mostly (but not entirely) on the “optimistic” side for the Braves. Still, when injuries, surprises, and crazy luck get rolled in, the division could easily end up with any of the three in first.
The NL East looks like a very tight race based on the PECOTA simulations. Every win will matter, especially within the division. Perhaps that’s not a surprise to anyone, but it is worth noting that a 6-8 week soft spot could spell disaster for any of the three top teams. Surprisingly enough, even the Phillies have a remote chance to place pretty high in the standings if everything goes crazy in the division, or if you expand the statistics to two standard deviations (pretty common to do), which would account for about 95% of the spread in the simulations. Whoever wins the East will be vetted, combat-ready, and highly primed for October baseball – here’s to hoping it’s the Mets on top!
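To put numbers on the one- versus two-standard-deviation comparison, here is a small sketch using Python’s standard-library NormalDist. It assumes the simulation results are exactly normal, which the plots only roughly support, and it reuses the eyeballed 6-win standard deviation; share_within is a name invented for this example.

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)


# 87.9 is the PECOTA mean for the Mets; 6 is a visual estimate (assumption).
METS_MEAN = 87.9
ESTIMATED_SD = 6

for k in (1, 2):
    low = METS_MEAN - k * ESTIMATED_SD
    high = METS_MEAN + k * ESTIMATED_SD
    print(f"+/- {k} SD: {low:.1f} to {high:.1f} wins, about {share_within(k):.0%} of simulations")
```

Under those assumptions, the two-standard-deviation band for the Mets stretches from roughly 76 to 100 wins, which shows how wide the net gets once you try to cover about 95% of the simulated seasons.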
My opinion is that before this forecast came out, if you had interviewed 100 Mets fans, all 100 would have expected the NL East to be a dogfight. Furthermore, all 100 would have said that they liked the Mets’ chances if it was a fair fight, meaning all teams had somewhat equal luck and health.
To me, PECOTA is taking that belief and expressing it in number form.
I think one thing worth making clear is that the value they put in the win column for the standings is not itself a prediction of actual wins. It’s great to see their data plots!
The math computations are both interesting and cool, and we know they are limited as well. The June swoon may be the Mets’ biggest risk factor. Hopefully, we don’t see it in June or any other month.
Good point about the need to keep all your bullets, which makes a Matz trade unrealistic or unwise. Which probably means BVW will make it.