Abraham Lincoln once said, “It’s better to remain silent and be thought a fool, than to speak and remove all doubt.”
Apparently nobody ever told that to Mike Puma of the New York Post. Not only has Puma gotten himself in trouble this year for an unprofessional and unfunny joke about Mets starting pitcher Bartolo Colon’s weight, but after the Mets’ 3-1 loss to the Atlanta Braves Wednesday night, he unleashed this golden nugget on Twitter, spurred on by comments made earlier by Sandy Alderson to Jon Heyman:
Mets are three outs away from falling a season’s worst 11 games under .500 … But at least their runs differential doesn’t stink.
— Mike Puma (@NYPost_Mets) July 3, 2014
I am in no way suggesting that Puma was the only Met beat writer to make a joke about that comment, because he wasn’t, but let’s really examine the level of ignorance contained in that tweet.
The Mets’ record now stands at 37-48, but their run differential is not too shabby. They have scored 328 runs, and allowed 334. That works out to a Pythagorean record of 42-43, just one game below the .500 mark, rather than 11.
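For readers who want to check the arithmetic, here is a minimal sketch of the Pythagorean calculation. It assumes the classic exponent of 2 (other exponents are sometimes used), and the function name `pythagorean_record` is purely illustrative. The runs and record are the figures quoted above.

```python
def pythagorean_record(runs_scored, runs_allowed, games, exponent=2.0):
    """Return (expected_wins, expected_losses), rounded to whole games.

    Assumes the classic Pythagorean form RS^x / (RS^x + RA^x) with x = 2
    by default; the exponent is an assumption, not taken from the article.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    win_pct = rs / (rs + ra)
    wins = round(win_pct * games)
    return wins, games - wins


# Mets through 85 games (37-48): 328 runs scored, 334 allowed.
mets = pythagorean_record(328, 334, 37 + 48)
print(mets)  # (42, 43)
```

An expected winning percentage of about .491 over 85 games rounds to 42 wins, which is where the 42-43 figure comes from.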
There is a very strong correlation between run differential and win-loss record (see the chart to the side), but as anyone who has taken an intro-level statistics class could tell you, the data will sometimes contain aberrations – outliers, if you will – that break from the prevailing trend.
A situation like that is the exception, not a reason to discard legitimate statistical findings. One of the best examples of a team breaking the trend was the 2007 Arizona Diamondbacks. That team went 90-72, winning the NL West despite a run differential of -20, and a 79-83 Pythagorean record.
So for Puma, or whoever wants to dangle run differential out there to take cheap shots at sabermetrics, I genuinely feel sad for you.
Somehow, you’ve managed to completely miss the boat on what sabermetrics – and being a fan of baseball – is all about.
“Sabermetrics don’t have anything to do with statistics,” said Bill James in an email interview for a piece from December. “Sabermetrics is the search for objective knowledge about baseball.”
Statistics assist in that search for more objective knowledge about baseball. They help us know the exact run expectancy value of a runner on first and no outs, versus a runner on second and one out, so that we can guide strategy, knowing that the sacrifice bunt – in almost all situations – is a negative play.
We can look at a player like Mike Trout and put a number on his overall dominance of the game – and compare him easily to other players both in the game today and in history.
And yes, we can look at the underlying numbers of a team, and say that they are perhaps getting a bit unlucky, and are due to start playing well as the statistics regress to the mean, and the “luck factor” wears off as the sample gets larger.
Then again, that might not happen, because that’s the game of baseball. That’s part of the beauty of the game. The process can be 100 percent correct, but you can have bad results, and vice versa.
That’s why it’s so important to emphasize getting the process right. In the long run, a team with good process but bad results in the short-term will do better than the team with bad process and good short-term results.
If you want to take shots at something, at least come armed with the proper tools, and not just attitude and clichés. Hashtagging your tweet #WARWhatIsItGoodFor isn’t going to change the fact that WAR is the best way to fully measure the overall impact that a player has on the field devised to this point.
Saying that sabermetrics are made up is about the dumbest argument that a person can make to try to argue against them. Of course they are. All stats are made up. Wins, losses, ERA, saves, batting average, home runs, RBIs, even at bats – all made up. If you reject a statistic because it is made up, perhaps you should go back to the days of Henry Chadwick’s boxscores, which published only runs.
If Puma wants to blatantly and outlandishly brush good research aside so that he can wag his finger and laugh at those dang kids with their spreadsheets, that’s his problem. Bill James is laughing up in Boston as he looks at his three World Series rings. Billy Beane in Oakland has for a third time rebuilt the Athletics into a playoff team on a shoestring budget. Every single team in baseball has an analytics department working with sabermetrics. All 30 of them.
But remember, sabermetrics don’t work, and anyone who thinks they do is dumb, and obviously knows nothing about baseball because they’re too busy making spreadsheets in their mother’s basement and not watching the games. After all, we’ve thought about baseball one way for over 100 years, and so that must be right.
But ask yourself, is there anything that existed over 100 years ago that hasn’t been either improved upon or made obsolete? Would you get in a 100-year-old airplane that hadn’t been brought up to today’s safety standards? It was common advertising less than 100 years ago to say that cigarettes had health benefits, and could help you lose weight.
Things change over time, but for some reason, there are groups that think that baseball – and the way we think about it – hasn’t.
In the end, Puma is the one who doesn’t feel the need to enrich his understanding of the game of baseball by opening his mind to new, and yes, better ways of thinking, which is strange, since his job is to cover baseball. One would think that he would want to get his facts right so he can do his job to the best of his abilities.
I enjoy a good discussion about the game of baseball, and if someone wants to argue against aspects of sabermetrics, legitimate points can be made. But an argument from ignorance is something that grinds my gears.
I don’t know all there is to know about the game, nor do I pretend to. I am always trying to learn more, because I want to understand baseball as well as I can, so I can do my job better, and because I’m a junkie. I enjoy learning something new about baseball, even if it means that I have to change a viewpoint that I previously held.
“If someone studies the results of the research, and then provides criticism of the methodology, assumptions, data and underlying basis of the research, then I can have a conversation with that person,” said Tom Tango, co-author of The Book: Playing the Percentages in Baseball, in an email interview for the aforementioned article in December. “Providing a summary opinion with no evidence is tantamount to bulls**t. It’s the very definition of bulls**t. And I’m not interested in debating bulls**t.”
Joe Vasile is the voice of Fayetteville SwampDogs baseball.
Mostly, yeah.
But “WAR is the best way to fully measure the overall impact that a player has on the field devised to this point”?
By definition, WAR is made up of a component that nobody can agree upon. If you cannot trust the components of a formula, how can you trust the formula?
And indeed, there is no universally accepted number called WAR. If we cannot accept a single version of it, and the components of it are in question, how can this be the “best single” method of evaluating players?
Of course, just trotting out a trend line through the data doesn’t say anything. What is the r^2 or MSWD value on that regression? In fact, just plotting the data is not a statistic. So what is interesting is not that there is a correlation in that data; the question is how that information can be translated into consumable day-to-day baseball operations and decision making. What is also missing in Alderson’s comment is that both sides of his discussion are variable, a convenient fact he neglects. From the baseball we have seen, it is much more likely that the run differential will fade away into the record. Considering the actual record represents reality and the pyth record is a synthetic simulation, I think it’s much more likely that the actual record is who we are.
It is also interesting to break that line down into component parts. A large amount of the data plots in positive run differential space; if you just look at those data, the regression will be very sloppy indeed. In fact, I suspect the r^2 value will fall below having any statistical meaning.
I also come down to the point of “so what”? Why the actual record is somehow less meaningful than a simulation defies any reason or logic. Alderson’s comments show a detachment from reality, as if his world is a game of Strat-O-Matic or a fantasy league. My eyes watch a team that is hopelessly lost in nearly all aspects of the game, and the record of 11 under .500 makes perfect sense.
Funny thing, the quote:
“It’s better to remain silent and be thought a fool, than to speak and remove all doubt.”
equally applies to Alderson and the bilge water he spouts on a regular basis.
The issue with the Mets is fairly basic. They don’t hit in the clutch. The team is bad because of that. The close run differential is a result of good pitching and bad clutch hitting.
Talk about a tempest in a teapot. I have no problem with Puma’s tweet. The Mets are 37-48. That’s their record in the only category that counts in the standings. Wins, losses. When Alderson spins about run differential, and being close, and “kind of liking” the team, he’s simply putting lipstick on a pig. I don’t read anything that’s anti-sabermetrics about his 140-character tweet. The Mets are having another brutally bad year, with Alderson leading the way. Maybe Puma just resents the bullshit? At the beginning of the year, Alderson talked about winning 90 games, not being plus-75 in the run differential column.
Excellent article.
I’m not sure where the numbers on the graph came from, so I measured off the plot as best as I could for the data between 80-100 wins and RD > 1. The r^2 = ~.4 (r = 0.609), putting about 40% of the variance as shared between RD and Wins. In this part of the data set, that highlights the amount of “outlier” data. How does that translate to the Mets situation, with a circum-zero RD? My money says it’s not easy to directly say that the RD and record are so easily reconciled. There is a trend, however, and one that is not surprising…the greater the run differential, the more likely you are to have more wins, but the level of certainty isn’t that great, which is maddening in this part of the data where playoff teams are born…or not.
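As a quick check on the figures in that comment, the share of variance two variables have in common is the square of the correlation coefficient. A minimal sketch, using only the r = 0.609 value reported above (the underlying data points are not available here):

```python
# r is the correlation the commenter measured off the plot.
r = 0.609

# The coefficient of determination is r squared.
r_squared = r ** 2
print(round(r_squared, 3))  # 0.371
```

So the shared variance works out to roughly 37 percent, which the comment rounds up to "about 40%".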
Interestingly, the data are much better correlated in negative RD space, for reasons I’m not so sure about…
“Interestingly, the data are much better correlated in negative RD space, for reasons I’m not so sure about…”
I would guess that there are many ways to win a game but only one way to lose. You’ve let the other guy score more than you did.
To that end, bad teams will consistently give up more runs than they score and, more importantly, will be trounced by good teams. They lack the depth of pitching, particularly in the bullpen, to prevent blowouts. Likewise, they are more likely to put out sub-replacement level talent when the 1st tier guys inevitably get injured because, again, they lack the depth.
But teams with marginal to good talent “find a way to win”. They have a 4th talent in the pen as good as the other team’s 1st, or they find a way to replace the failing 620 SS with a 700 SS.
In rereading your article, Joe, I was really struck by a quote you mentioned from James: “Sabermetrics don’t have anything to do with statistics”. And while that didn’t latch on to me initially, after the vigorous defense you presented, and my cursory look at your graph, I’m quite concerned this plot actually shows what is called a “forced correlation”. Essentially this means that the two plotted variables are required to have a correlation, mostly because the variables are dependent on each other. Ultimately this means the correlation has little meaning. In data analysis, statistics is part of the game in order to identify objective truths. As James is no dummy, I’m taking the quote to be a bit of an off-the-cuff flyer.
Let’s look at the plot, and approach the problem differently. Right now there is a positive correlation between variables. For a correlation not to be the effect of forcing, the data need to be free to occupy any part of the plot and be correlated positively, negatively, or not correlated. Let me ask this question: could we have a case where these two variables can be negatively correlated? Could a team have 100 wins with a -200 run differential? It is not mathematically impossible. For example, a team could lose a bunch of 20-1 games and rack up a huge deficit and only win close games. However, that is not how a team wins 100 games, and a team that good self-limits the number of blowouts that occur. Essentially, data cannot plot on this part of the diagram. The reverse is true as well in order to have a negative correlation. Can a team have a +150 run diff, but win only 60 games? Again, the path to that is racking up quite a few little league scores, but losing everything else. Mathematically possible, but not realistically probable. In the end, the only way these data can exist is with a positive correlation, and that makes this a forced correlation, which essentially removes deep meaning from it.
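One way to see the dependence described above is a toy simulation. This is a sketch under loud assumptions, not a model of real baseball: every game's runs are drawn uniformly from 0 to 9 for both sides, there is no difference in team skill, and tied scores simply count as non-wins. The helper names (`simulate_season`, `pearson_r`) and the seed are hypothetical choices made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_season(rng, games=162, max_runs=9):
    """Simulate one season of pure-chance games; return (wins, run_diff).

    Assumption: both teams' runs are uniform on 0..max_runs, and a tied
    score is treated as a non-win rather than replayed.
    """
    wins = 0
    run_diff = 0
    for _ in range(games):
        scored = rng.randint(0, max_runs)
        allowed = rng.randint(0, max_runs)
        run_diff += scored - allowed
        if scored > allowed:
            wins += 1
    return wins, run_diff


rng = random.Random(2014)  # fixed seed so the run is repeatable
seasons = [simulate_season(rng) for _ in range(500)]
wins = [w for w, _ in seasons]
diffs = [d for _, d in seasons]

r = pearson_r(diffs, wins)
print(round(r, 2))
```

Even with no skill in the model at all, wins and run differential come out strongly positively correlated, because a win is defined as outscoring the opponent. That supports the point that the sign of the correlation is built in. It does not by itself settle whether the size of a particular team's gap between the two is informative.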
Chris, you know more about statistics than I do but I don’t see how you can say “the only way” when we do have cases where teams had winning records with a negative run differential. It’s rare over a 162-game season, but it has happened.
What I am saying is every data point needs to be free to be everywhere in that space. Clearly, they are not. The variables are partly dependent, and the correlation is virtually required to be positive. Can you show me any baseball results data (not a single point) that can be put on this plot showing a negative correlation? If that cannot occur, then the correlation is forced.
If you’re asking me to put 20 different times from one season where a team had a negative run differential and an above .500 record (or vice-versa) — I cannot do that.
But I think that only works in support of the argument that if the Mets are essentially even in their run differential, it doesn’t make sense that they are double digits below .500.
Now, you’ve made the point before that the results could swing either way and that’s certainly true. But by the end of the year it’s extremely unlikely they will have this discrepancy. I will not argue with you if you say that their differential will sink to their record.
What would you call a team with a -20 run differential but 10 games over .500? First place in the NL East. The point is Ws and Ls are what counts in the standings. Not hypothetical analysis.
I was content to leave this alone, until Alderson doubled down yesterday, reiterating the misguided belief that the team is better than its record. Given this, I have been reading up on the Pythagorean record and found a beautiful passage in the Fangraphs library, which is very instructive. Not only did it confirm what can be seen, it shows that Alderson is mistaken in his understanding and flapping his gums errantly and annoyingly. I am convinced the Pythagorean record is a retrospective metric, not a predictive metric. Furthermore, there is a scale issue to its application. It is not functional the way Alderson uses it. In any event, I took some quotes from the Fangraphs site.
“Teams whose real winning percentages exceed their expected winning percentages are often referred to as ‘lucky’, and teams who do the opposite are ‘unlucky’. This is a crutch, and it’s far from statistically rigorous.”
“…it’s clear that ‘luck’ strikes far more deeply than in simple runs scored and runs allowed in a season”
“…there’s no reason to assume that we’re getting the whole truth from runs scored and runs allowed alone. The idea of pythagorean ‘luck’ is a quick rule of thumb and nothing more.”
“Another commonly held belief about pythagorean expectation is that its function is to predict wins and losses given the runs scored/runs allowed data. This is not true: it is merely a statement of a relationship, and it’s very important not to forget that.”
Yes. Pythagorean record is retrospective. It was developed after observing the relationship between runs scored, runs allowed, and wins.
That being said, its ability to “predict” lies in the concept of regression and the law of averages. While it’s possible to flip 5 heads from a fair coin, and then flip 5 more heads after that on the same coin, the law of averages tells us that is unlikely.
This applies to Pythagorean record (and any other relationship metric such as FIP). The Mets so far have played 10 games under .500 while scoring and allowing about the same number of runs, which would be expected to yield a .500 record. While it’s possible to continue doing it for the rest of the year, it’s unlikely.
However, there is a key assumption that Sandy is making, and that is that the team will continue to score and allow about the same number of runs. Unlike a coin, which is going to be a fair coin no matter how many times you flip it, the team may not continue scoring and allowing runs at the same pace it has before.
Thus, when Sandy says that the team is “better” than their record because of their run differential, I take it that he is implying that IF the Mets continue to score and allow runs at the same pace they have, they’ll play around .500 the rest of the way, and instead of the projected 20-under pace they will finish with 10-under.
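To put numbers on the coin example, here is a small sketch. It assumes a fair coin and independent flips; the function name `prob_all_heads` is just illustrative.

```python
from fractions import Fraction


def prob_all_heads(flips, p_heads=Fraction(1, 2)):
    """Probability that every one of `flips` independent tosses is heads."""
    return p_heads ** flips


print(prob_all_heads(5))   # 1/32
print(prob_all_heads(10))  # 1/1024
```

Because the flips are independent, the chance of five more heads is 1 in 32 no matter what the first five flips showed. The earlier streak does not make tails "due"; it is just that another streak of the same length is unlikely in its own right.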
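The two finishes compared in that comment can be worked out directly. This is a rough sketch that assumes the 37-48 record quoted in the article as the starting point and a 162-game season; `projected_finish` is a made-up helper name.

```python
def projected_finish(wins, losses, rest_win_pct, season_games=162):
    """Project a final (wins, losses) given a win rate for remaining games."""
    remaining = season_games - (wins + losses)
    final_wins = wins + rest_win_pct * remaining
    final_losses = season_games - final_wins
    return final_wins, final_losses


w, l = 37, 48  # record cited in the article (assumed starting point)

same_pace = projected_finish(w, l, w / (w + l))  # keep winning at 37/85
at_500 = projected_finish(w, l, 0.5)             # play .500 the rest of the way

print(round(same_pace[1] - same_pace[0]))  # 21 games under .500
print(round(at_500[1] - at_500[0]))        # 11 games under .500
```

That is roughly 21 under at the current pace versus 11 under with .500 ball the rest of the way, in line with the round numbers of 20 and 10 used in the comment.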
Yes, agreed Name.
Ultimately, the Mets will either regress to a run differential that matches their output in wins to this point, or they will improve their won-loss record.
To a 95% degree of certainty this will be the outcome of the rest of the year.
Side note: In games started by Travis d’Arnaud, the Mets pitching staff continues to get pounded, to the tune of 4.5 runs per game since his return from Las Vegas.
This might be seen as an anomaly, but staff ERA is running nearly a full run higher with the new prospect behind the dish than other catchers since he’s arrived in Queens.
Sort of hard to win that way.