Monday, February 22, 2010

The J.D. Express

Strange as it may seem, J.D. Drew has been getting some good press recently.  I would like to jump on the bandwagon.

Last fall Red Sox GM Theo Epstein praised Drew as one of the top outfielders in the American League. Then not too long ago Amalie Benjamin of the Boston Globe wrote a very flattering article about Drew. The basic thrust of the piece was: Drew is not the kind of guy who inspires fan clubs with his personality, but, despite some of his "popular" statistical benchmarks (RBI, fr'instance) appearing sub-par, Drew does the things that truly win ball games: get on base, run the bases well, excel on defense.

The Globe article cited FanGraphs' WAR statistic in evaluating Drew's contributions to the Red Sox. I actually prefer Sean Smith's similar statistic (also called WAR), because it includes more aspects of the game (like baserunning). To its credit, FanGraphs' WAR uses the superior UZR to measure defensive value. Probably the best thing would be to take UZR from FanGraphs and the other stuff from Baseballprojection.com, but I'm too lazy to do the legwork.

Anyway, here's the J.D. Drew page from Baseballprojection.com:



As you can see, Drew has been above average in every performance category except outfield arm. That includes batting (Bat Runs), base running (BsR), avoiding the double play (GIDP), reaching base on error (ROE) and defensive range (TZ, Total Zone). Despite missing a significant number of games over the course of his career, he has been an above-average player (more than 2 WAR) every season of his career. Using the going rate of $4.5 million per free agent WAR, Drew has so far fully justified his lofty salary since signing with Boston three years ago.

Note that Drew for his career is averaging around 5.5 WAR per 650 plate appearances. That's just about the same as Manny Ramirez over the same time frame. And Vlad Guerrero and Carlos Beltran, too, for that matter. Of course, Drew has not been in the lineup as much as those guys; he's been more fragile. But even on a yearly basis, Drew has produced about 4 WAR per season over his career, still a very solid number. That production is worth about 18 million dollars a year on today's (or maybe yesterday's?) free agent market.
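
The salary arithmetic here is easy to check. What follows is only a back-of-the-envelope sketch: the $4.5 million per WAR rate and the 4 WAR per season figure come from the text, while the function name `market_value` is made up for illustration.

```python
# Back-of-the-envelope free agent value, using the figures quoted in the post.
DOLLARS_PER_WAR = 4.5e6  # going rate per free agent WAR (from the text)


def market_value(war, dollars_per_war=DOLLARS_PER_WAR):
    """Return the free agent market value, in dollars, of a given WAR total."""
    return war * dollars_per_war


# Drew's rough per-season average of 4 WAR
print(f"${market_value(4.0) / 1e6:.1f} million per season")
```

Swapping in a different dollars-per-WAR rate (the "yesterday's market" caveat) just scales the result linearly.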

Most Grand Slams Ever

I was flipping through my copy of The Bill James Historical Baseball Abstract recently, and I came across the player comment on Lou Gehrig. I was expecting a long, laudatory comment, but James simply wrote a few dozen words on Gehrig's home runs with men on base. You probably know that Gehrig is the career leader in grand slams with 23. But apparently, he hit quite a few 3-run homers and more than his share of 2-run blasts, to boot. Add 'em all up and Gehrig averaged "1.77 RBI per home run, the highest [value] among players with 300 or more home runs."

I was sort of hoping for a more "meaty" comment by James, but actually this is a pretty interesting tidbit. I've recently been fooling around with some home run data that is available over at bb-ref.com and I thought I'd poke around this business of home runs with men on base.

First of all, here are the career grand slam leaders:
+-----------------+------------+-----+
| Name            | GrandSlams | HR  |
+-----------------+------------+-----+
| Gehrig, Lou     |         23 | 493 | 
| Ramirez, Manny  |         21 | 546 | 
| Murray, Eddie   |         19 | 504 | 
| Rodriguez, Alex |         18 | 583 | 
| Ventura, Robin  |         18 | 294 | 
| McCovey, Willie |         18 | 521 | 
| Foxx, Jimmie    |         17 | 534 | 
| Williams, Ted   |         17 | 521 | 
| Ruth, Babe      |         16 | 714 | 
| Kingman, Dave   |         16 | 442 | 
| Aaron, Hank     |         16 | 755 | 
| Griffey, Ken    |         15 | 630 | 
| Sexson, Richie  |         15 | 306 | 
| Giambi, Jason   |         14 | 409 | 
| Hodges, Gil     |         14 | 370 | 
| Piazza, Mike    |         14 | 427 | 
| McGwire, Mark   |         14 | 583 | 
| Lee, Carlos     |         14 | 307 | 
| Belle, Albert   |         13 | 381 | 
| Kiner, Ralph    |         13 | 369 | 
+-----------------+------------+-----+
There's Lou in the top spot, with Manny Ramirez poised to take the lead in the next couple of years. Looks like Alex Rodriguez will also reach 23 at some point. Interesting that Robin Ventura managed 18 grand slams while hitting "only" 294 home runs. That's the highest ratio of anybody with at least 200 homers:
+------------------+------------+-----+--------------+
| Name             | GrandSlams | HR  | Slams_per_HR |
+------------------+------------+-----+--------------+
| Ventura, Robin   |         18 | 294 |       0.0612 | 
| White, Devon     |         11 | 208 |       0.0529 | 
| Sexson, Richie   |         15 | 306 |       0.0490 | 
| Gehrig, Lou      |         23 | 493 |       0.0467 | 
| Stairs, Matt     |         12 | 259 |       0.0463 | 
| Lee, Carlos      |         14 | 307 |       0.0456 | 
| York, Rudy       |         12 | 277 |       0.0433 | 
| Petrocelli, Rico |          9 | 210 |       0.0429 | 
| Hunter, Torii    |         10 | 235 |       0.0426 | 
| Tartabull, Danny |         11 | 262 |       0.0420 | 
+------------------+------------+-----+--------------+
Ventura leads the pack by a wide margin, but there are some more interesting names coming out here: Devon White, who was mostly a leadoff hitter (!) and Matt Stairs, the Wonder Hamster. Gehrig, of course, is near the top.
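
For anyone who wants to reproduce the Slams_per_HR column, here is a minimal sketch. The numbers are copied from the first few rows of the table above; the variable and function names are just illustrative, and this is not necessarily how the original table was generated.

```python
# (name, grand slams, career HR) copied from the table above
players = [
    ("Ventura, Robin", 18, 294),
    ("White, Devon", 11, 208),
    ("Sexson, Richie", 15, 306),
    ("Gehrig, Lou", 23, 493),
    ("Stairs, Matt", 12, 259),
]


def slams_per_hr(slams, hr):
    """Fraction of a player's home runs that were grand slams."""
    return slams / hr


# Sort from highest ratio to lowest, as in the table
ranked = sorted(players, key=lambda p: slams_per_hr(p[1], p[2]), reverse=True)

for name, slams, hr in ranked:
    print(f"{name:<16} {slams:>3} {hr:>4} {slams_per_hr(slams, hr):.4f}")
```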

Of course, to hit a grand slam you have to bat with the bases loaded, and not all batters get the same opportunities to do so. We can look at those opportunities going back about 55 years, the so-called retrosheet era. (Unfortunately, that means we don't know how often Gehrig and Ruth batted with the bases loaded.) We expect, of course, the number of bases-loaded opps to depend heavily on batting order position. I'm guessing the #4 batters will have an advantage over #3 batters and perhaps all other lineup slots. Well, thanks to retrosheet, we can figure that out (data from 1954-2009):
+------------+-------------------+
| lineup_pos | Opps_bases_loaded |
+------------+-------------------+
|          6 |             27991 | 
|          5 |             26127 | 
|          7 |             24616 | 
|          9 |             23978 | 
|          8 |             23119 | 
|          4 |             21881 | 
|          1 |             18504 | 
|          2 |             18423 | 
|          3 |             17565 | 
+------------+-------------------+
Whoa! That's surprising. The #3 slot, where you find many of the best home run hitters, is the absolute worst position for coming to the plate with the bases loaded. Does that surprise anybody else? I mean, leadoff hitters hit more often with the bases loaded than #3 batters. Now, that's partly because leadoff hitters get more plate appearances overall than #3 hitters, but still.

Gehrig batted cleanup for the Murderers' Row Yankees, of course, behind Ruth, who batted third. Most of the guys on the career list above batted 3rd or 4th, simply because that's where most home run hitters bat. A notable exception is Ventura, who batted 5th mostly and hence likely saw more bases-loaded situations than some of the others. Actually, we can check that, too, with retrosheet. Here are the top 20 Slammers along with the number of PA's they had with the bases full:
+-----------------+-----+-------+------+
| Name            | HR  | Slams | Opps |
+-----------------+-----+-------+------+
| Gehrig, Lou     | 493 |    23 | NULL | 
| Ramirez, Manny  | 546 |    21 |  278 | 
| Murray, Eddie   | 504 |    19 |  302 | 
| McCovey, Willie | 521 |    18 |  196 | 
| Rodriguez, Alex | 583 |    18 |  232 | 
| Ventura, Robin  | 294 |    18 |  238 | 
| Foxx, Jimmie    | 534 |    17 | NULL | 
| Williams, Ted   | 521 |    17 |   66 |* 
| Kingman, Dave   | 442 |    16 |  167 | 
| Aaron, Hank     | 755 |    16 |  256 | 
| Ruth, Babe      | 714 |    16 | NULL | 
| Sexson, Richie  | 306 |    15 |  162 | 
| Griffey, Ken    | 630 |    15 |  199 | 
| Hodges, Gil     | 370 |    14 |  169 |* 
| McGwire, Mark   | 583 |    14 |  164 | 
| Piazza, Mike    | 427 |    14 |  177 | 
| Giambi, Jason   | 409 |    14 |  211 | 
| Lee, Carlos     | 307 |    14 |  169 | 
| Kent, Jeff      | 377 |    13 |  289 | 
| Kiner, Ralph    | 369 |    13 |   33 |* 
+-----------------+-----+-------+------+
The column labeled "Opps" shows the bases-loaded PAs. "NULL" means that player played before the retrosheet era and I've put a "*" next to guys whose careers are only partially covered by the retrosheet data. Here we see that Ventura actually did have more bases-loaded plate appearances than most of the others in the table, although he doesn't come close to Eddie Murray or Jeff Kent. Still, Ventura certainly had a better HR percentage when batting with the bases loaded than he did otherwise.
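
One natural follow-up is grand slams per bases-loaded opportunity. The sketch below uses a handful of rows from the table above; `None` stands in for NULL, and the boolean flag (an assumption of this sketch, mirroring the "*" marks) excludes players whose careers are only partly covered. The names are illustrative only.

```python
# (name, grand slams, bases-loaded PAs, career fully covered by the data?)
rows = [
    ("Gehrig, Lou", 23, None, False),
    ("Ramirez, Manny", 21, 278, True),
    ("Murray, Eddie", 19, 302, True),
    ("McCovey, Willie", 18, 196, True),
    ("Rodriguez, Alex", 18, 232, True),
    ("Ventura, Robin", 18, 238, True),
    ("Williams, Ted", 17, 66, False),
    ("Kingman, Dave", 16, 167, True),
    ("Kent, Jeff", 13, 289, True),
]


def slam_rate(slams, opps):
    """Grand slams per bases-loaded plate appearance."""
    return slams / opps


# Skip players with missing or partial opportunity counts, since their
# slam totals cover more seasons than their opportunity totals.
rates = {
    name: slam_rate(slams, opps)
    for name, slams, opps, full in rows
    if full and opps is not None
}

for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {rate:.3f}")
```

Comparing a rate like this with a player's overall HR rate would need his career plate appearances, which are not in the table.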

By the way, the all-time (meaning since 1952) leader in bases-loaded PAs is Brooks Robinson, who had 333 chances to hit a grand slam. He hit five.

Tuesday, February 2, 2010

Where should Robinson Cano bat in the order?

Let me say that Steven Goldman, over at the Pinstriped Bible, is one of my favorite sportswriters. Actually, he was numero uno, until he was dethroned by The Poz, but there is no shame in that, not at all.

What Goldman has in spades is writing chops — the guy knows how to put words together to make you understand the game. That is why I follow him religiously, even though his main subject matter is the hated (by me, of course) Yankees.

Goldman, who also moonlights over at Baseball Prospectus (or maybe that's his day job, I dunno), is pretty sabermetrically savvy, but I'm not totally on board with his recent blog post on Robinson Cano. In that post, Goldman essentially claims that:

  1. Robbie Cano hits much better with the bases empty than he does with men on.
  2. Given (1), batting Cano 2nd in the lineup would be smarter than batting him in his usual position lower in the order, the idea being that in the two-hole, Cano would bat more often with the bases empty.
Let's take these one at a time. When I see anybody claim that any hitter is better in a certain situation, e.g. with runners in scoring position, I immediately get skeptical. My, admittedly nerdy, working assumption is that batters hit the same in all situations. Now, before you get all huffy, let me make a disclaimer:
  1. Obviously, one split that really is real is the platoon split. Nearly all batters hit better against opposite-hand pitching.
  2. I'm not saying that there isn't any player in the world who truly hits better with the bases empty (or in night games, or in Cincinnati, or on Tuesday). I'm just saying that I believe that the vast majority of players do not hit better in these odd situations. Most everybody hits about the same in most situations. That's my working assumption. I'll be wrong sometimes, but (I believe) I'll be right most of the time.
So, here are Cano's splits with nobody on and with men on base:
               PA   AVG   OBP   SLG
Bases empty  1580  .331  .363  .528
Runners on   1456  .280  .312  .425
Wow, Cano really does hit a lot better with the bases empty. I checked to see if these numbers might just be a statistical aberration and it seems pretty unlikely. For example, the probability that the 51-point difference in OBP is just a statistical fluctuation (i.e. that Cano's true OBP is the same in the two situations) is about 0.5%. Cano's propensity for hitting with the bases empty is especially curious, since on average batters fare better with runners on. (That's because the defense has to worry about the baserunners in addition to the batter.)
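
The post doesn't say which test was used for that check. One common choice is a two-proportion z-test, sketched below under the assumption that every plate appearance is an independent trial and using the rounded OBP figures from the splits. Because of the rounding and the unknown original method, the result need not match the 0.5% quoted above exactly, though it lands in the same small-fraction-of-a-percent neighborhood.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value) under the null hypothesis that both samples
    share a single true proportion.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Cano's OBP splits from the post: bases empty vs. runners on
z, p = two_proportion_z_test(0.363, 1580, 0.312, 1456)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```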

Ok, so maybe Robbie Cano really does hit worse with runners on. That's unfortunate for him, but so be it. But will batting Cano 2nd instead of 7th (his usual spot in 2009) make a difference? Goldman's take:
To get the most out of Cano, a manager might keep him out of RBI spots. Now, when you have one of the best offenses in baseball, your whole batting order is an RBI spot. That’s why the second spot in the order is a place he might prosper. Even if the Yankees get another .400 OBP from their leadoff man, Cano would be batting with the bases empty 60 percent of the time, do his best hitting, and be on base for Mark Teixeira, A-Rod, et al. The downside is that you might get a few extra Cano double-play specials when the leadoff man does reach base.
The following table shows how often Yankee batters came to bat with the bases empty for the different lineup positions in 2009:
+------------+-----+--------+------------+
| lineup_pos | PA  | noneOn | noneOnFrac |
+------------+-----+--------+------------+
|          1 | 785 |    496 |     0.6318 | 
|          2 | 772 |    392 |     0.5078 | 
|          3 | 753 |    358 |     0.4754 | 
|          4 | 732 |    341 |     0.4658 | 
|          5 | 715 |    356 |     0.4979 | 
|          6 | 699 |    402 |     0.5751 | 
|          7 | 682 |    371 |     0.5440 | 
|          8 | 668 |    361 |     0.5404 | 
|          9 | 643 |    336 |     0.5226 | 
+------------+-----+--------+------------+

The last column of this table shows the fraction of plate appearances that occur with nobody on base. Now, focus on the #2 and #7 positions: the #2 slot actually batted with nobody on less often than the #7 slot (51% compared to 54%). Putting Cano in the #2 slot would therefore worsen his overall production, not improve it (taking at face value the splits mentioned previously).
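
To put a rough number on that, one can weight Cano's two OBP splits by how often each lineup slot bats with the bases empty. This is only a sketch: it takes the splits at face value, uses the noneOnFrac values from the table above, and ignores everything else that changes with lineup position. The function name is made up for illustration.

```python
# Cano's OBP splits, from the post
OBP_EMPTY = 0.363
OBP_RUNNERS_ON = 0.312

# Fraction of PAs with nobody on, 2009 Yankees, from the table above
NONE_ON_FRAC = {2: 0.5078, 7: 0.5440}


def expected_obp(none_on_frac, obp_empty=OBP_EMPTY, obp_on=OBP_RUNNERS_ON):
    """Blend the two splits by the share of PAs with the bases empty."""
    return none_on_frac * obp_empty + (1 - none_on_frac) * obp_on


for pos, frac in NONE_ON_FRAC.items():
    print(f"#{pos} slot: expected OBP {expected_obp(frac):.4f}")
```

Under these assumptions the #7 slot comes out roughly two points of OBP ahead of the #2 slot, which is tiny but points in the opposite direction from Goldman's suggestion.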

Furthermore, there is an additional cost of batting Cano 2nd: it likely moves Nick Johnson, you know, the .400 OBP guy, down in the order. (Goldman talks about Johnson possibly leading off and Cano batting second, but the Yanks have a guy who has always batted #1 or #2, fella name of Jeter.) Obviously, batting higher in the order gets you more PA's. A good rule of thumb is that each spot in the order gets 18 more PA's over the season than the next one. Anyway, what difference might we expect in Yankee offensive production if Cano bats second and Johnson seventh, compared with the reverse? Well, it turns out the difference is quite small (I'll spare you the details): batting Cano 2nd and Johnson 7th would cost the Yankees about a run over the course of a season.
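
The 18-PA rule of thumb can be sanity-checked against the 2009 Yankee PA totals from the lineup table earlier in this post. A minimal sketch, with illustrative variable names:

```python
# 2009 Yankees plate appearances by lineup position, from the table above
pa_by_slot = {1: 785, 2: 772, 3: 753, 4: 732, 5: 715,
              6: 699, 7: 682, 8: 668, 9: 643}

# Drop in PA from each slot to the next one down
gaps = [pa_by_slot[i] - pa_by_slot[i + 1] for i in range(1, 9)]
avg_gap = sum(gaps) / len(gaps)

print("gaps between adjacent slots:", gaps)
print(f"average gap: {avg_gap:.2f} PA")
print("PA difference, #2 vs #7:", pa_by_slot[2] - pa_by_slot[7])
```

The average gap works out to a shade under 18, and the #2 slot got 90 more PA than the #7 slot, exactly five slots times 18.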

So, moving Cano up in the order will not really accomplish anything for the Yankee offense. In the end, careful analysis of batting order scenarios invariably leads to the conclusion we have here — it just doesn't make much difference. Goldman himself said it nicely:

First, many studies suggest that the difference between the optimal batting order and the least-optimal batting order is quite small. That said, there is a difference, and even if it’s as little as one win a season, you never know when you might need that one win.

It's just that the "win" he mentions is more often a small fraction of a win, but it's still true that you never know when you might need that one small fraction of a win.

Monday, February 1, 2010

What'd I do to deserve this?

Everybody googles themselves once in a while, right? I do it, too, but I use Google Alerts to help me out. Basically, Google Alerts sends me an email whenever it sees somebody refer to me out there on the Web. So, the other day, I get something from Google Alerts — Dave Cameron over at FanGraphs started a thread called "The Sabermetric Library" in which he asked readers to list in the comments section their favorite "influential sabermetric articles".

There have been 47 responses so far, one of which mentioned me by name -- hence the Google Alert. The kind soul who referenced my work, who uses the moniker "vivaelpujols", wrote:

John Walsh’s building blocks of Pitch f/x work:

http://www.hardballtimes.com/main/article/fastball-slider-changeup-curveball-an-analysis/

http://www.hardballtimes.com/main/article/pitch-identification-tutorial/

http://www.hardballtimes.com/main/article/the-eye-of-the-umpire/

http://www.hardballtimes.com/main/article/how-fast-should-a-fastball-be/

http://www.hardballtimes.com/main/article/searching-for-the-games-best-pitch/


Bless your soul, vivaelpujols! I think the above pieces are some of my best, so it's good to see the shout-out. Then I saw something that tempered my satisfaction a bit. At FanGraphs, readers can rate individual comments, giving them either a "thumbs up" or a "thumbs down". And I saw that vivaelpujols' excellent suggestion had received 0 upward-pointing thumbs and 3 downward-pointing ones. WTF? What'd I do to deserve that?

I learned later that vivaelpujols is actually a guy named Nick Steiner, who is a regular writer at the Hardball Times. I don't contribute to THT much anymore, just don't have the time, but I do follow the site fairly closely and Steiner is one of the better analysis-oriented writers over there. And he has great taste in others' work.

UPDATE: Somebody gave the first "thumbs up" to the comment in question. Thanks, Dad.