Recently, a discussion about pitcher WAR broke out in the comments section of one of our articles. As I’m sure you know, there are two main versions of WAR floating out there in cyberspace. There’s one at Baseball-Reference (bWAR or rWAR) and one at FanGraphs (fWAR). It can be confusing when two reputable sites champion metrics with the same name that at times produce wildly different results.

However, you should view this as a feature and not a bug. The key is to know why they differ and use that knowledge to interpret the results. When it comes to pitching, the key difference between the two sites is what they use as their main input. B-R uses Runs Allowed while FG uses FIP. There are reasons to prefer either method.

At first glance, the B-R method seems superior. In most newer metrics, we are describing things in terms of runs. What better way to describe pitching than using runs, actual runs, that scored in the actual games?

However, not all runs are created equal. Some runs are completely the fault of the pitcher while some are completely the fault of the defense. Most fall somewhere in between those two examples. They fall in between for various reasons, some of which for shorthand we refer to as “luck.” Some people hate that word and that’s okay – it’s overused and certainly there are times when players make their own luck. So, when you see that word, try not to let it get to you too much.

Regardless, when we use FIP, we are using inputs – walks, strikeouts and home runs – that are solely dependent on the pitcher, save the stray inside-the-park HR, which occurs so infrequently as to not have any practical influence over our numbers. We use FIP to strip out the “luck” from a pitcher’s line.
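For readers who want to see the arithmetic, here is a minimal sketch of FIP in its commonly published form. Two things in it go beyond what is described above and should be read as assumptions: the formula also counts hit batters alongside walks, and it adds a league constant (the 3.10 default here is only a typical value; the real constant is recalculated each season). The function name and the sample pitching line are made up for illustration.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    hr, bb, k, hbp: home runs, walks, strikeouts and hit batters allowed.
    ip: innings pitched as a true decimal (200 1/3 innings is 200.333, not 200.1).
    constant: league constant that puts FIP on the ERA scale; 3.10 is an
              assumed typical value, not an official figure for any season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitching line, not a real pitcher's season.
print(round(fip(hr=20, bb=50, k=180, ip=200, hbp=5), 2))
```

Note that hits allowed never appear in the calculation, which is exactly the point of the metric.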

At this point, you may be following but not convinced. How can we use just three inputs to accurately assess a pitcher’s quality, especially by completely ignoring hits? Back in late 2009, Tom Tango used the following illustration:

“The interesting thing with FIP is that the players with the best FIP are also the ERA leaders. For example, here’s the top 10 FIP of pitchers born since 1962, minimum 10,000 batters faced:

FIP
2.99 Martinez Pedro
3.28 Clemens Roger
3.33 Johnson Randy
3.33 Schilling Curt
3.34 Maddux Greg
3.35 Smoltz John
3.48 Brown Kevin
3.55 Saberhagen Bret
3.60 Gooden Dwight
3.68 Mussina Mike

And here are the top 10 in ERA, under the same criteria:

ERA
2.93 Martinez Pedro
3.12 Clemens Roger
3.16 Maddux Greg
3.28 Brown Kevin
3.29 Johnson Randy
3.33 Smoltz John
3.34 Saberhagen Bret
3.46 Schilling Curt
3.46 Cone David
3.51 Gooden Dwight

Look at the names. Pedro is #1 in either case, as is Clemens. Even Maddux doesn’t move much. Schilling and RJ are on both lists, as is Smoltz and Kevin Brown and Bret Saberhagen and Dwight Gooden. In fact, there is only one player different in the top 10: Mussina is #10 in FIP (he was 12th in ERA), and David Cone is #9 in ERA (he was 11th in FIP).”

Tango later added: “Overall, two-thirds of pitchers will have their FIP and ERA be within 0.20 runs of each other, and almost all will be within 0.40 runs of each other.”
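As a quick check, the two lists quoted above can be lined up pitcher by pitcher. The sketch below uses only the numbers in those lists, stored in hundredths of a run to avoid floating-point rounding; the variable names are my own. Keep in mind that nine career leaders are not a representative sample, so this illustrates Tango's point rather than proving it.

```python
# FIP and ERA in hundredths of a run, copied from the two top-10 lists.
fip_100 = {
    "Martinez": 299, "Clemens": 328, "Johnson": 333, "Schilling": 333,
    "Maddux": 334, "Smoltz": 335, "Brown": 348, "Saberhagen": 355,
    "Gooden": 360, "Mussina": 368,
}
era_100 = {
    "Martinez": 293, "Clemens": 312, "Maddux": 316, "Brown": 328,
    "Johnson": 329, "Smoltz": 333, "Saberhagen": 334, "Schilling": 346,
    "Cone": 346, "Gooden": 351,
}

# FIP minus ERA for every pitcher who appears on both lists.
gap_100 = {name: fip_100[name] - era_100[name]
           for name in fip_100 if name in era_100}

# Pitchers whose FIP and ERA are within 0.20 runs of each other.
within_20 = [name for name, gap in gap_100.items() if abs(gap) <= 20]

for name, gap in sorted(gap_100.items(), key=lambda item: abs(item[1])):
    print(f"{name:<11}{gap / 100:+.2f}")
print(f"{len(within_20)} of {len(gap_100)} within 0.20 runs")
```

Eight of the nine pitchers on both lists have a career FIP within 0.20 runs of their ERA, and the largest gap (Saberhagen) is 0.21.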

So, FIP strips out the luck immediately, and over the long haul FIP and ERA match up very well. The old baseball wisdom is that in the long run, your luck evens out. The broken-bat base hits and the line drives hit right at a fielder find their equilibrium. So, given a choice between a number that depends on outside factors and one that uses only events under a pitcher’s control, wouldn’t you prefer to eliminate the outside influences as much as possible when looking to judge only the pitcher?

Last week I pointed out the pitching lines of Shaun Marcum and Zack Wheeler, who had very similar IP totals at that point. Marcum’s ERA was nearly two runs higher than Wheeler’s. But if we looked at their FIP, the gap was much smaller, with Wheeler holding just a 0.46 edge. Marcum was hurt tremendously this year by sequencing, or giving up hits at the wrong time. While Marcum was dreadful in this regard, Wheeler was better than average.

If we look at the two main WAR metrics, we see bWAR has Wheeler at 1.1 while fWAR has him at 0.6 for the season. Prorated over a full season, the two numbers would differ by more than a full unit of WAR, a significant difference. Was/is Wheeler as successful as bWAR would lead you to believe? Or are outside factors like BABIP and LOB% making Wheeler’s numbers look better than he’s actually pitched?
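The prorating claim is easy to sanity-check. The sketch below assumes a 200-inning full season, which is my assumption for illustration, and tries every innings total from 80 through 89, since Wheeler's exact total at that point isn't given here beyond being in the 80s. The function names are hypothetical.

```python
def prorate(war, ip, full_season_ip=200.0):
    """Scale a WAR total earned over `ip` innings to a full-season workload."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return war * full_season_ip / ip


B_WAR = 1.1  # Wheeler's bWAR, from the article
F_WAR = 0.6  # Wheeler's fWAR, from the article


def prorated_gaps(ip_values, full_season_ip=200.0):
    """Prorated bWAR minus prorated fWAR for each candidate innings total."""
    return {ip: prorate(B_WAR, ip, full_season_ip) - prorate(F_WAR, ip, full_season_ip)
            for ip in ip_values}


# 200 IP for a full season is an assumption, not a figure from the article.
gaps = prorated_gaps(range(80, 90))
for ip, gap in gaps.items():
    print(f"{ip} IP -> prorated gap of {gap:.2f} WAR")
```

Under those assumptions the gap runs from 1.25 WAR at 80 IP down to about 1.12 at 89 IP, so it clears a full unit of WAR anywhere in that range.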

Of course, 80-something IP is not our ideal sample size. But the smaller the sample size, the more luck plays into things. We know Juan Lagares had a .438 BABIP over a 105-PA stretch earlier this season. We don’t want to judge Lagares based on a good luck-fueled sample. And we shouldn’t want to judge pitchers based on luck – good or bad – either.

2 comments on “A quick look at two types of pitching WAR”

  • Name

    So after doing a little more research, FIP seems like it is nothing more than just a regression analysis using HR, BB, K’s, and innings as the variables.
    Theoretically, it’s still hard to believe that the total number of hits is irrelevant, but I wonder if anyone has tried to do a regression using more variables: HR’s, 1B, 2B, 3B, BB’s, and K’s, and how that best-fit line would stack up to the one that is currently being used in FIP.

    • Brian Joura

      I would be floored if someone hadn’t tried to incorporate more variables.

      Perhaps it’s the fact that the pitchers who are able to reach and stay in the majors need to have a certain hit prevention floor. So, it’s not that the number of hits is irrelevant but rather that it’s the single-most important thing. So there’s an inherent selection bias when your group is MLB-caliber pitchers.
