Was Utley Really Better Than Rollins and Howard? Debating Advanced Metrics


I want to start by stating something that should be obvious: I look at a baseball player’s WAR just like lots of other people do. It’s useful for comparing players, even if it isn’t a perfect way to do so. I also like a good argument as much as the next guy. So when this morning’s debate about who was the best among Chase Utley, Jimmy Rollins, and Ryan Howard came up, I couldn’t resist.

My immediate, gut reaction is that Jimmy was a better player than Chase overall, and that both were a little bit better than Howard overall. That is where I started this debate from, at least. Apparently, though, advanced metrics do not agree with me. Going by WAR, Chase Utley wins this discussion hands down. Before we go a step further, let’s discuss WAR for a second. What is it? From Baseball-Reference:


Since we added Sean Smith’s (“rallymonkey” to some) Wins Above Replacement measurement in 2010, we’ve seen its use expand into many new areas and its popularity catch on in the media and the general population, but there have also been a lot of questions about how it’s calculated and whether it has validity. In this tutorial, I’m going to run through the calculations in graphic detail and point out areas where our approach differs from some of the other popular WAR or WAR-like approaches.

How to Use WAR

The idea behind the WAR framework is that we want to know how much better a player is than what a team would typically have to replace that player. We start by comparing the player to average in a variety of venues and then compare our theoretical replacement player to the average player and add the two results together.

There is no one way to determine WAR. There are hundreds of steps to make this calculation, and dozens of places where reasonable people can disagree on the best way to implement a particular part of the framework. We have taken the utmost care and study at each step in the process, and believe all of our choices are well reasoned and defensible. But WAR is necessarily an approximation and will never be as precise or accurate as one would like.

We present the WAR values with decimal places because this relates the WAR value back to the runs contributed (as one win is about ten runs), but you should not take any full season difference between two players of less than one to two wins to be definitive (especially when the defensive metrics are included).

WAR Editions/Changes

This page and related pages give the gory details on how we calculate WAR.

Version 2.2, March 2013

Based on discussions with FanGraphs and others, we decided to drop the replacement level to .294 from .320. This means that 2013 MLB has 1000 WAR in the entire major leagues. This was applied retroactively to all previous AL, NL, and NA seasons. Other major leagues (AA, UA, PL, FL) were maintained at the same level relative to the NL.
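As a rough check on the 1000 WAR figure, the arithmetic can be sketched in Python. This assumes a 30-team league, a 162-game schedule, and that every game produces exactly one win; `league_war_pool` is a hypothetical helper, not anything Baseball-Reference publishes.

```python
# Rough check of the league-wide WAR pool: total wins in the league minus the
# wins a league made up entirely of replacement-level teams would collect.
# Assumptions: 30 teams, 162 games each, exactly one win per game.

def league_war_pool(teams: int, games: int, repl_win_pct: float) -> float:
    """Wins above a replacement-level baseline available to the whole league."""
    total_wins = teams * games / 2                 # every game has one winner
    replacement_wins = teams * games * repl_win_pct
    return total_wins - replacement_wins

old_pool = league_war_pool(30, 162, 0.320)
new_pool = league_war_pool(30, 162, 0.294)
print(f".320 replacement level: {old_pool:.0f} WAR")
print(f".294 replacement level: {new_pool:.0f} WAR")
```

Under those assumptions, the .294 level leaves about 1,001 wins above replacement to hand out, while the old .320 level leaves about 875.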

After the positional adjustment was applied, we forced the major league average to be zero across the league.

A small amount of smoothing was done to transition between decade-long league-vs-league replacement levels.

Version 2.1, May 2012

After launching version 2.0 on May 4th, we immediately became aware of an issue that had concerned us, but that we thought we had gotten right. Pitchers were being overvalued due to a runs-to-wins estimate that broke down for extreme performances. A stingy pitcher drives down the runs per win, but not to the degree we were showing.

  • A major change to the runs-to-wins calculation. See our Runs to Wins Page for a full explanation. We now handle runs-to-wins calculations in an exact rather than an estimated way.
  • With the change in the runs-to-wins calculation, we can now display Wins Above Average, a related win-loss %, and a related win-loss % for 162 games played.
  • Leverage Index adjustment is now only applied to relief pitchers.
  • Leverage Index used is now the LI at the time the reliever entered the game rather than the average LI for all of their plate appearances. This is weighted by number of batters faced.
  • Converted Offensive WAR from afWAR back to oWAR. Note that oWAR + dWAR now double counts position, so adding them will not give WAR.
  • The adjustment for the difference in league starter and reliever ERA has been moved to the calculation of league average rather than the league replacement level.
  • BUG: Fixed a park factor bug. Pitcher opponent strength was not neutralized, so a park factor was being applied to a non-park adjusted number. Now the pitcher opponent strength is converted to a neutral environment, averaged for all opponents and then park adjusted based on a custom park factor for each pitcher based on the exact parks they pitched in.
  • Some feedback on version 2.0 that we incorporated: Inside the Book and its readers helped immensely with the runs-to-wins issue.

Version 2.01, May 2012

We made a minor change converting oWAR to afWAR (or average fielding WAR). This has been rolled back and is not in place now.

Version 2.0, May 2012

Prior to the launch of the 2012 WAR numbers, we undertook a top-to-bottom evaluation of our WAR numbers and added a number of improvements.

  • Switch from BaseRuns for batting to an advanced wRAA metric.
  • Folding ROE, infield singles, SO vs. Non-SO into wRAA.
  • Excluding pitchers’ hitting and averaging by league rather than year from the league averages for wOBA and wRAA.
  • Estimation of CS numbers for leagues where they are missing.
  • Use of Baseball Info Solutions Defensive Runs Saved from 2003-present (in our view the most advanced defensive metric).
  • Use of a player-influenced runs to win conversion for both batters and pitchers based on PythagenPat.
  • Use of a player-specific park factor for pitchers weighted by actual appearances in each park.
  • After a preliminary WAR calculation, we fine-tune the replacement level on a playing time basis, so the total WAR in each league is very consistent year-to-year.
  • dWAR now contains the position component as we feel this better captures player defensive value. In our view, even a poor defensive catcher is likely equally valuable to a good defensive first baseman in terms of team defense.
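The playing-time fine-tuning described in the list above isn't spelled out here, so the following is only a hypothetical Python sketch of the general idea: a fixed pool of replacement runs is spread across players in proportion to plate appearances, so that league total WAR lands on a chosen target. The function name, the use of plate appearances as the playing-time measure, and the flat 10 runs per win are all assumptions for illustration.

```python
# Hypothetical sketch: distribute a league-wide pool of replacement runs by
# playing time (plate appearances here) so total WAR hits a fixed target.

RUNS_PER_WIN = 10.0  # rule-of-thumb conversion, not an exact PythagenPat one

def war_by_playing_time(raa: dict[str, float], pa: dict[str, int],
                        target_league_war: float) -> dict[str, float]:
    total_raa = sum(raa.values())  # close to zero in a real league
    total_pa = sum(pa.values())
    # Replacement runs needed so that (total RAA + pool) / 10 equals the target.
    repl_pool = target_league_war * RUNS_PER_WIN - total_raa
    return {
        name: (raa[name] + repl_pool * pa[name] / total_pa) / RUNS_PER_WIN
        for name in raa
    }

players_raa = {"A": 25.0, "B": 0.0, "C": -25.0}  # invented values
players_pa = {"A": 650, "B": 600, "C": 250}      # invented values
war = war_by_playing_time(players_raa, players_pa, target_league_war=6.0)
print({name: round(value, 2) for name, value in war.items()})
print(round(sum(war.values()), 2))  # equals the target by construction
```

Because the replacement pool fills whatever the above-average runs don't, the league total comes out the same every year, which is the kind of year-to-year consistency the bullet describes.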

Version 1.0, pre-May 2012

Sean Smith produced the original framework for the site, and until May 2012 we used his numbers and methodology in all locations. We still use his replacement level and position levels, but we have changed much of the remainder of the system.

The Concept of Replacement Players

Average is a well-defined concept. You sum up all of the observations and then divide by the number of observations. We compute averages every day.

So why don’t we compute Wins Above Average rather than Wins Above Replacement? When computing the value of a major league player, average is a poor baseline for comparison. Average players are relatively rare and can be expensive to acquire. Average players don’t make the league minimum. Plus, not all average performances are equal: a team would pay much more for 200 league-average innings than for 50. When a star player is injured, they are rarely replaced by an average player; usually their replacement is much worse.

That last point is our premise here. Average players are relatively rare and difficult to obtain. Replacement level players, by their very definition, are players easy to obtain when a starter goes down. These are the players who receive non-roster invites at the start of the year or the players who are 6-year minor league free agents. Baseball talent among the population is generally distributed normally, but only the far right end of that curve plays professional baseball.

There is some dispute over where to place the replacement level, but most sabermetricians agree that comparing players to a general replacement level is the best approach to valuing players. We’ll talk more about this later.

Sports Reference previously set replacement level at a .320 winning percentage for recent seasons. This meant that we expected a team of replacement players to have a .320 win-loss percentage, or a 52-110 record. We also set the value differently for the two leagues, since interleague play has shown the AL to be the stronger league. Under that .320 level, our replacement team might win 48 games in the AL and 56 in the NL.

Sports Reference now sets replacement level at .294 (a 48-114 record). This change was made in March 2013 after agreeing with FanGraphs.com to set a single replacement level between our sites. We also smoothed out the changes in replacement level between the two leagues, where before the change from one decade to the next had been stepwise.
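Turning a replacement-level winning percentage into a record is simple arithmetic over a 162-game season. This small Python helper (a hypothetical name, assuming ordinary rounding to whole wins) reproduces both records quoted for the .320 and .294 levels.

```python
# Convert a replacement-level winning percentage into an expected W-L record.

def replacement_record(win_pct: float, games: int = 162) -> tuple[int, int]:
    wins = round(win_pct * games)  # round to a whole number of wins
    return wins, games - wins

for pct in (0.320, 0.294):
    w, l = replacement_record(pct)
    print(f"{pct:.3f} -> {w}-{l}")
```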

WAR: The General Idea

The basic currency of WAR is runs. We start with runs added or lost versus an average player and then compare the average player to a replacement player. I just got done saying we don’t want to use averages, but an equation should explain what we are doing here.

Player’s Runs over Replacement = Player_runs - ReplPlayer_runs = (Player_runs - AvgPlayer_runs) + (AvgPlayer_runs - ReplPlayer_runs)

This gives us two components, player runs above average (RAA) and then the average player’s runs above replacement.
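The equation above can be sketched in a few lines of Python. The run totals in the example are made-up illustration values, and `runs_above_replacement` is a hypothetical helper; the point is only that the two components sum back to player runs minus replacement runs.

```python
# Runs over replacement split into the two components described above:
#   (player vs. average) + (average vs. replacement)

def runs_above_replacement(player_runs: float, avg_runs: float,
                           repl_runs: float) -> tuple[float, float, float]:
    raa = player_runs - avg_runs          # player runs above average (RAA)
    avg_over_repl = avg_runs - repl_runs  # average player's runs above replacement
    rar = raa + avg_over_repl             # same as player_runs - repl_runs
    return raa, avg_over_repl, rar

# Invented run totals for illustration only.
raa, avg_over_repl, rar = runs_above_replacement(95.0, 75.0, 55.0)
print(raa, avg_over_repl, rar)  # 20.0 20.0 40.0
```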

Ultimately, baseball teams are interested in wins and losses, and so is WAR. RAA is converted to wins above average by running the results through a PythagenPat win-loss estimator (see a rundown of PythagenPat). This allows us to more accurately model the interaction between the player and the league and its effect on wins. Generally, ten runs will give you one win, but that does not always hold.
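For readers curious about the PythagenPat step, here is a minimal Python sketch of the textbook team-level formula: the exponent is total runs per game raised to a constant, and winning percentage is RS^x / (RS^x + RA^x). The 0.287 constant is the commonly cited value and is an assumption here, as are the helper names; Baseball-Reference's player-influenced version is more involved than this.

```python
# Textbook PythagenPat: the Pythagorean exponent floats with the run environment.
PYTHAGENPAT_Z = 0.287  # commonly cited constant; an assumption in this sketch

def pythagenpat_win_pct(runs_scored: float, runs_allowed: float,
                        games: int) -> float:
    rpg = (runs_scored + runs_allowed) / games  # total runs per game
    exponent = rpg ** PYTHAGENPAT_Z
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

def wins_from_extra_runs(base_rs: float, base_ra: float, extra_runs: float,
                         games: int = 162) -> float:
    """Extra wins a team gains when its offense adds `extra_runs`."""
    before = pythagenpat_win_pct(base_rs, base_ra, games)
    after = pythagenpat_win_pct(base_rs + extra_runs, base_ra, games)
    return (after - before) * games

# A .500 team scoring and allowing 700 runs: 10 runs is worth about one win.
print(round(wins_from_extra_runs(700, 700, 10), 2))
# In a lower-scoring environment, the same 10 runs are worth a bit more.
print(round(wins_from_extra_runs(550, 550, 10), 2))
```

Around 700 runs scored and allowed, ten extra runs come out to roughly one extra win, while in a lower-scoring environment the same ten runs buy a little more, which is one way the ten-runs-per-win rule can fail to hold.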

Adding up all of the WAR on a team and adding in replacement level (48 wins for a full season) should get you very, very close to the team’s actual win-loss record, and should match up even more closely with its Pythagorean win-loss record.
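That team-level check can be sketched as replacement wins plus summed WAR. The 48-win baseline comes from the text; the roster WAR values below are invented, and `estimated_team_wins` is a hypothetical helper.

```python
# Team win estimate: replacement-level baseline plus the roster's summed WAR.
REPLACEMENT_WINS = 48  # .294 replacement level over a 162-game season

def estimated_team_wins(player_war: list[float],
                        replacement_wins: float = REPLACEMENT_WINS) -> float:
    return replacement_wins + sum(player_war)

# Hypothetical combined batting and pitching WAR for one roster.
roster_war = [6.1, 4.8, 3.5, 3.0, 2.2, 1.9, 1.5, 0.8, 0.4, -0.3]
wins = estimated_team_wins(roster_war)
print(f"Estimated wins: {wins:.1f}")
```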

Unfortunately, the statistics at our disposal to compare Tris Speaker and Ken Griffey Jr. have changed over time. We now have exact data regarding types and location of batted balls, and this has led to improvements in various measurements (defensive measurements most notably). When we compute our metrics for the various components of WAR, we always use as much data as possible. For example, with baserunning this means that we’ll use stolen bases alone when that is all we have, stolen bases and caught stealings when that is all we have, and full play-by-play accounts of steals by base, pickoffs, and advancements on passed balls, wild pitches, sac flies, doubles, singles, etc., when we have that. Here is an up-to-date listing of our Data Coverage.

WAR is calculated separately for pitchers and for position players, so we’ll deal with each of them separately.

OK, that’s a lot, I know, and some of it might not make sense to you. Here’s the CliffsNotes version. If Mike Trout gets hurt, he is replaced by another player. That player isn’t an average outfielder; it’s a replacement outfielder. It’s probably a minor leaguer or some guy off the waiver wire, someone not capable of doing most of what Mike Trout does. WAR tries to calculate how many wins better Trout is than that guy. It does so by calculating how many runs better Trout is, offensively and defensively, than this replacement player, and dividing by about ten. Don’t think of this replacement player as Ben Revere, though; think of him as Jordan Danks. Also, each player is measured against his position, so an offensive-minded first baseman is easier to replace here.

So, back to the debate. In 13 seasons, Chase Utley has produced 60.9 WAR. In 16 seasons, Jimmy Rollins has amassed only 45.6 WAR. Ryan Howard has produced only 17.2 WAR over 12 seasons. As season averages, this works out to about 4.68 WAR per season for Utley, 2.85 for Rollins, and 1.43 for Howard. For some perspective, in 18 seasons, Bobby Abreu posted 59.9 WAR, an average of 3.33 a season. Bobby’s career OPS was .870, while Chase is at .846, Jimmy is at .746, and Howard is at .871.
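The per-season averages are easy to reproduce from the career totals quoted above. This Python sketch divides career WAR by seasons played, the same simple approach used here, which doesn't account for partial or injury-shortened seasons.

```python
# Career WAR and seasons played, as quoted in this post.
careers = {
    "Chase Utley": (60.9, 13),
    "Jimmy Rollins": (45.6, 16),
    "Ryan Howard": (17.2, 12),
    "Bobby Abreu": (59.9, 18),
}

# Simple per-season average: career WAR divided by seasons.
per_season = {name: war / seasons for name, (war, seasons) in careers.items()}
for name, avg in per_season.items():
    print(f"{name}: {avg:.2f} WAR per season")

# Ratio of Utley's per-season WAR to Howard's.
ratio = per_season["Chase Utley"] / per_season["Ryan Howard"]
print(f"Utley vs. Howard per season: {ratio:.2f}x")
```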

Wait, what? Howard has a higher career OPS than any of them, but has the lowest WAR? Certainly defense is a big part of this, and Ryan’s -1.28 season-average dWAR is a big piece of the difference. Of course, dWAR, like most defensive metrics, has its odd quirks. For instance, 2013 Jimmy Rollins got to more balls than 2014 Jimmy Rollins did, but had four more errors that season. His dWAR was -1.0 in 2013 and 1.0 in 2014, for that minor difference. Yes, dWAR does favor Chase, for the record, but I find defensive metrics in general to be unreliable. After looking at their defensive statistics more, I was probably unfair to Chase in thinking he was much worse than Jimmy defensively, but I still tend to slightly favor Jimmy.

So the advanced metrics favor Chase Utley. Based on WAR, he’s over three times as valuable per season as Howard. Let’s not forget, though, that the replacement second baseman is a much worse offensive player than the replacement first baseman, or even the replacement shortstop these days. Part of Chase’s enhanced value is that he plays a position where offense is at a premium. So while WAR is useful for calculating player value, I wouldn’t lean on it too heavily in this case, where the players played different positions and the per-season differential was often only a game or two. To keep this in perspective, Chase had a higher WAR than Jimmy in 2007 and than Howard in 2006, their MVP seasons, when even the most pro-Chase fan wouldn’t argue he was better. It’s probably fairer to say that Chase had a higher value, relative to his position, than Rollins or Howard did in their primes. That’s a commentary on both the players and the positions they played. This also seems to bear out when looking at All-Star Game selections: Chase has been in six Midsummer Classics, while Jimmy and Howard have each been in three.

We could look at this from some other perspectives too, particularly the “traditional” stats. Chase has a career .282/.366/.480/.846 slash line, with 1,615 hits, 232 homers, 914 RBIs, 625 extra-base hits, 624 walks, and 142 stolen bases over 13 seasons (169 hits, 24 homers, 96 RBIs, 65 XBHs, 65 BBs, and 15 SBs per 162 games), while playing in 1,548 games (119 games per season). Jimmy Rollins has a career .265/.325/.421/.746 slash line, with 2,395 hits, 227 homers, 922 RBIs, 838 XBHs, 784 BBs, and 462 SBs (176 hits, 17 homers, 68 RBIs, 62 XBHs, 58 BBs, and 34 SBs per 162 games), while playing in 2,199 games over 16 seasons (137 games per season). Ryan Howard has a .263/.350/.521/.871 slash line, with 1,393 hits, 353 homers, 1,122 RBIs, 636 XBHs, 677 BBs, and 12 SBs over 12 seasons (157 hits, 40 homers, 127 RBIs, 72 XBHs, 76 BBs, and 1 SB per 162 games), while playing in 1,435 games (120 games per season). Honestly, I can’t separate them all that much from that. Jimmy’s clearly the most durable, with a lot of steals and hits; Chase has a very nice, complete stat line across the board; and wow to Howard’s power numbers.
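The per-162-game figures in parentheses come from scaling each career total by 162 divided by games played. This Python sketch, using only the totals quoted in this paragraph, reproduces them after rounding to whole numbers; `per_162` is just an illustrative helper name.

```python
# Scale career counting stats to a 162-game pace, using the totals quoted above.
STAT_NAMES = ("H", "HR", "RBI", "XBH", "BB", "SB")

career_totals = {
    # name: (games, hits, homers, RBIs, extra-base hits, walks, steals)
    "Utley": (1548, 1615, 232, 914, 625, 624, 142),
    "Rollins": (2199, 2395, 227, 922, 838, 784, 462),
    "Howard": (1435, 1393, 353, 1122, 636, 677, 12),
}

def per_162(games: int, *totals: int) -> dict[str, int]:
    """Round each counting stat to its 162-game pace."""
    return {stat: round(total * 162 / games)
            for stat, total in zip(STAT_NAMES, totals)}

for name, (games, *totals) in career_totals.items():
    print(name, per_162(games, *totals))
```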

Jimmy Rollins is going to end up with the most “milestone” stats, Chase the best advanced-metric numbers, and Howard the eye-popping seasons and power numbers. Was one really better or worse as a ballplayer? I guess that depends on how you want to look at it. All three will probably end up being the franchise’s best at their positions when they’re done. After some careful consideration, though, arguing that one of them was that much better than the others, as some would, seems wrong to me.

