A fascinating new article was published today at FanGraphs. Written by NEIFI Analytics, it talks about the nature of statistics in MLB today. What makes it so interesting is that NEIFI is a group that has worked within the sport, producing some of the proprietary systems not readily available to those of us outside the game. Here is but one snippet:
There’s an important lesson hidden here. As teams and the sabermetric public are on the hunt for new insights, there’s a natural assumption to make: that the next answers lie within the information we can’t yet see. If only we knew the spin on the slider, we might understand the strikeouts. There are two issues with this approach. On one hand, the presumption that the new level of detail will contain those exact details which reveal further truth, for example, that significant elements which determine strikeouts are contained within the particular information Trackman is providing, and not within other areas not captured. On the other side of that coin is the simple and fundamental truth that the most valuable insights in sabermetrics have come not from new data sources, but by re-imagining elements of the performance record which already existed in sufficient detail.
Source: FanGraphs
I heartily recommend reading this piece in its entirety.
*****
If you prefer longer articles, that’s our specialty here at Mets360. Just click on “Perspectives” or “Minor Leagues” or “History” on the grey menu bar above this article’s headline and you will be taken to a list of over 2,500 articles written since the beginning of 2010.
If you enjoy the quick hitters, click on “Quotes” in the same menu bar to see our archive.
*****
Interesting. It seems to show how much more there is to know.
“Which is all to say that by making information more granular, it’s entirely possible to gain 10,000% more data and only 1% more evaluative power.”
Considering the manpower needed to grind through 10,000% more data, that seems like a daunting task.
Baseball is both tremendously granular and sequential in the on-field events that drive the outcomes. The metrics we’ve developed have always been used to gain perspective on what happened and who made it happen, and to memorialize the specific contributions of individual players.
The statistical reverse engineering that has employed stat lines as a method of team planning and player evaluation is old. The “new stats” added instant value to this effort.
The game is, however, greatly impacted by non-stats: poor plays on which no error was called, poor baserunning decisions without specific tracking. “The Danny”, my own concept of a “dumb play” stat (no recordable “error”), has not yet taken hold. There are hundreds of executions and tactical decisions. Maybe they’re unrecordable, or maybe we haven’t yet started to record them.
The game is magic because the game is so much about sequence, more so than any other game. We have 8-hit, 9-run games and 11-hit, 1-run games.
I love the Eye Test…it captures all….. and I love the stats…after all, I am a Baseball Fan!!!
I cannot wait for Spring!
This is summed up in Nate Silver’s phrase about “the signal and the noise.”
More data, more noise, than ever before. So many times I read articles that mistake data for the signal.
Just a moment ago, I read an absurd, lengthy piece by an MLB writer on Lagares and spray charts and his exit speed velocity after August 1 — it went on and on and on — and yet never mentioned that the Mets sat him against RHP once Cespedes arrived. In fact, the writer gave no evidence that he was aware of that central fact. Must have been too busy analyzing exit speed data.
The reality is that it takes skill to decipher stats, to find the signal. Yet at the same time, anybody can read a stat and build a narrative. Another guy I read this morning, a local sports doofus, argued that Manning will beat Brady this morning because history is on his side: the home team had won every game since 2007. Of course, he could have latched onto a different stat and made the complete opposite statement.
My strong suspicion is that more data equal more noise, which will result in the blurring of the signal in many, many minds.
Funny you mention that, James. The writer of that article is a full-on Mets beat writer for MLB, Anthony DiComo. The piece on Lagares was crazy, with purely cherry-picked numbers from the post-Cespedes stretch. As you said, that was never mentioned, despite a graph he plotted showing how dramatic the shift was once Lagares started getting far fewer, and more favorable, ABs.
For sure the data overload will result in strings of spurious concoctions of cause and effect, and folks will swarm to the next new thing like it’s the second coming. It will surely be fraught with the biggest issue people seem to forget: baseball is a team sport where individual accomplishment is strongly linked to strings of dependent variables as well as personal achievement.
Correction on that: I found the article through DiComo’s Twitter. It was authored by someone else.
Many “Old Timers” who keep a scorebook add elaborate notations to indicate non-stat events and other “plays” within an at-bat or other “scoreable play”. This is the original format that captures stats, and sophisticated scorers have always developed a shorthand to memorialize the “I saw it” stuff. It adds texture to the raw objective recording.
I believe I’d be echoing Jim’s Thoughts in saying, You Need To Watch The Game.
Wonderful stuff, gentlemen. Stats are cool. The game is great.