Thursday, March 4, 2010

Luke Appling, leadoff hitter (NOT)

A new article of mine appeared recently at the Hardball Times:

The Philosophy of Batting Leadoff

I got the idea for the piece, like so many in the past, from Bill James, this time from his Historical Baseball Abstract, the old one that came out in the mid-80s. That book contains an essay on two types of leadoff hitters, typified by Luke Appling and Luis Aparicio. James wonders if a new type of leadoff hitter might emerge going forward, one that is a sort of hybrid of the fast, no-walk Aparicio type and the patient, good on-base guy like Appling. James wrote that 25 years ago, and I decided to check, using Retrosheet data, whether this new hybrid leadoff man was taking hold in baseball.

So, after making the lists of leadoff batters with their stats and a nice (if I may be immodest) graphic showing the historical trends of leadoff OBP, I was pretty happy with the finished piece.   It was also gratifying that the article generated quite a few comments (around 15, which is a lot for a THT article).  Until a reader named "stevebogus" pointed out in the comments that Luke Appling was not actually a leadoff hitter.  WTF?  Bill James wrote a 1000-word essay about Luke Appling the leadoff hitter, of course he batted leadoff.  But, no, I checked, too, at Retrosheet, and Luke Appling really only batted leadoff a handful of games.  He actually batted up and down the order, but most often in the 5th spot.

Let me say this again: WTF?  How could James make that mistake?  Of course, Retrosheet did not exist in 1985, so it wasn't so easy to get the info, but presumably he had some source that told him that Appling batted leadoff.  Otherwise, he wouldn't have written a whole freakin' essay about Appling leading off.  I'm guessing he confused Appling with somebody else.  Maybe Aparicio.

Sheesh.*

*I'm just kidding — Bill James, for me, is the greatest thing since sliced bread. 

Monday, February 22, 2010

The J.D. Express

Strange as it may seem, J.D. Drew has been getting some good press recently.  I would like to jump on the bandwagon.

Last fall Red Sox GM Theo Epstein praised Drew as one of the top outfielders in the American League. Then not too long ago Amalie Benjamin of the Boston Globe wrote a very flattering article about Drew. The basic thrust of the piece was: Drew is not the kind of guy who inspires fan clubs with his personality, but, despite some of his "popular" statistical benchmarks (RBI, fr'instance) appearing sub-par, Drew does the things that truly win ball games: get on base, run the bases well, excel on defense.

The Globe article cited FanGraphs' WAR statistic in evaluating Drew's contributions to the Red Sox. I actually prefer Sean Smith's similar statistic (also called WAR), because it includes more aspects of the game (like baserunning). To its credit, FanGraphs' WAR uses the superior UZR to measure defensive value. Probably the best thing would be to take UZR from FanGraphs and the other stuff from Baseballprojection.com, but I'm too lazy to do the legwork.

Anyway, here's the J.D. Drew page from Baseballprojection.com:



As you can see, Drew has been above average in every performance category except outfield arm. That includes batting (Bat Runs), base running (BsR), avoiding the double play (GIDP), reaching base on error (ROE) and defensive range (TZ, Total Zone). Despite missing a significant number of games over the course of his career, he has been an above-average player (more than 2 WAR) every season of his career. Using the going rate of $4.5 million per free-agent WAR, Drew has fully justified his lofty salary since signing with Boston three years ago.

Note that Drew for his career is averaging around 5.5 WAR per 650 plate appearances. That's just about the same as Manny Ramirez over the same time frame, and Vlad Guerrero and Carlos Beltran, too, for that matter. Of course, Drew has not been in the lineup as much as those guys; he's been more fragile. But even on a yearly basis, Drew has produced about 4 WAR per season over his career, still a very solid number. That production is worth about $18 million a year on today's (or maybe yesterday's?) free-agent market.
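The dollar figure is just WAR times the market rate. Here's a minimal Python sketch of that arithmetic, assuming value scales linearly with WAR at the $4.5 million rate quoted above (a simplification; real contracts don't work that cleanly). The function names are my own, and the 22 WAR / 2,600 PA career in the example is hypothetical, chosen only to show the per-650-PA scaling.

```python
DOLLARS_PER_WAR = 4.5e6  # going free-agent rate quoted above


def market_value(war: float, dollars_per_war: float = DOLLARS_PER_WAR) -> float:
    """Estimated free-agent value in dollars, assuming value is linear in WAR."""
    return war * dollars_per_war


def war_per_650_pa(war: float, pa: int) -> float:
    """Scale a WAR total to a 650-PA full-season rate."""
    return war * 650 / pa


# About 4 WAR per season, as in the text
print(f"${market_value(4.0) / 1e6:.1f} million per season")

# Hypothetical career: 22 WAR in 2,600 PA
print(f"{war_per_650_pa(22, 2600):.1f} WAR per 650 PA")
```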

Most Grand Slams Ever

I was flipping through my copy of The Bill James Historical Baseball Abstract recently, and I came across the player comment on Lou Gehrig. I was expecting a long, laudatory comment, but James simply wrote a few dozen words on Gehrig's home runs with men on base. You probably know that Gehrig is the career leader in grand slams with 23. But apparently he hit quite a few 3-run homers and more than his share of 2-run blasts, to boot. Add 'em all up and Gehrig averaged "1.77 RBI per home run, the highest [value] among players with 300 or more home runs."

I was sort of hoping for a more "meaty" comment by James, but actually this is a pretty interesting tidbit. I've recently been fooling around with some home run data that is available over at bb-ref.com, and I thought I'd poke around in this business of home runs with men on base.
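For the record, RBI per home run is just a weighted average: a solo shot drives in one run, a two-run homer two, and so on. A quick Python sketch of that calculation follows. The split in the example (100 homers: 50 solo, 30 two-run, 15 three-run, 5 slams) is hypothetical, not Gehrig's actual breakdown, which James doesn't give. The last lines simply back out the total RBI implied by James's 1.77 figure over Gehrig's 493 homers.

```python
def rbi_per_hr(solo: int, two_run: int, three_run: int, grand_slams: int) -> float:
    """Average RBI per home run: the batter plus however many runners were on."""
    homers = solo + two_run + three_run + grand_slams
    rbi = 1 * solo + 2 * two_run + 3 * three_run + 4 * grand_slams
    return rbi / homers


# Hypothetical split, for illustration only (not Gehrig's real numbers): 175 RBI / 100 HR
example = rbi_per_hr(solo=50, two_run=30, three_run=15, grand_slams=5)
print(f"{example:.2f} RBI per HR")

# Total home-run RBI implied by James's figure: 1.77 RBI/HR times 493 HR
implied_gehrig_hr_rbi = round(1.77 * 493)
print(implied_gehrig_hr_rbi)
```

The floor is 1.0 (all solo shots) and the ceiling is 4.0 (all grand slams).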

First of all, here are the career grand slam leaders:
+-----------------+------------+-----+
| Name            | GrandSlams | HR  |
+-----------------+------------+-----+
| Gehrig, Lou     |         23 | 493 | 
| Ramirez, Manny  |         21 | 546 | 
| Murray, Eddie   |         19 | 504 | 
| Rodriguez, Alex |         18 | 583 | 
| Ventura, Robin  |         18 | 294 | 
| McCovey, Willie |         18 | 521 | 
| Foxx, Jimmie    |         17 | 534 | 
| Williams, Ted   |         17 | 521 | 
| Ruth, Babe      |         16 | 714 | 
| Kingman, Dave   |         16 | 442 | 
| Aaron, Hank     |         16 | 755 | 
| Griffey, Ken    |         15 | 630 | 
| Sexson, Richie  |         15 | 306 | 
| Giambi, Jason   |         14 | 409 | 
| Hodges, Gil     |         14 | 370 | 
| Piazza, Mike    |         14 | 427 | 
| McGwire, Mark   |         14 | 583 | 
| Lee, Carlos     |         14 | 307 | 
| Belle, Albert   |         13 | 381 | 
| Kiner, Ralph    |         13 | 369 | 
+-----------------+------------+-----+
There's Lou in the top spot, with Manny Ramirez poised to take the lead in the next couple of years. Looks like Alex Rodriguez will also reach 23 at some point. Interesting that Robin Ventura managed 18 grand slams while hitting "only" 294 home runs. That's the highest ratio of anybody with at least 200 homers:
+------------------+------------+-----+--------------+
| Name             | GrandSlams | HR  | Slams_per_HR |
+------------------+------------+-----+--------------+
| Ventura, Robin   |         18 | 294 |       0.0612 | 
| White, Devon     |         11 | 208 |       0.0529 | 
| Sexson, Richie   |         15 | 306 |       0.0490 | 
| Gehrig, Lou      |         23 | 493 |       0.0467 | 
| Stairs, Matt     |         12 | 259 |       0.0463 | 
| Lee, Carlos      |         14 | 307 |       0.0456 | 
| York, Rudy       |         12 | 277 |       0.0433 | 
| Petrocelli, Rico |          9 | 210 |       0.0429 | 
| Hunter, Torii    |         10 | 235 |       0.0426 | 
| Tartabull, Danny |         11 | 262 |       0.0420 | 
+------------------+------------+-----+--------------+
Ventura leads the pack by a wide margin, but there are some more interesting names coming out here: Devon White, who was mostly a leadoff hitter (!) and Matt Stairs, the Wonder Hamster. Gehrig, of course, is near the top.
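The ratio list is a simple filter-and-sort. Here's a minimal Python sketch, assuming a list of (name, grand slams, career HR) tuples. I've included only a handful of rows from the tables above to keep it short, so the output is a sample, not the full leaderboard.

```python
# (name, grand slams, career HR): a few rows copied from the tables above
players = [
    ("Gehrig, Lou", 23, 493),
    ("Ruth, Babe", 16, 714),
    ("Aaron, Hank", 16, 755),
    ("Ventura, Robin", 18, 294),
    ("White, Devon", 11, 208),
    ("Stairs, Matt", 12, 259),
]


def slam_rate_leaders(rows, min_hr=200):
    """Grand slams per home run, for players with at least min_hr homers, best first."""
    qualified = [(name, gs, hr, gs / hr) for name, gs, hr in rows if hr >= min_hr]
    return sorted(qualified, key=lambda row: row[3], reverse=True)


for name, gs, hr, rate in slam_rate_leaders(players):
    print(f"{name:<16} {gs:>3} {hr:>4} {rate:.4f}")
```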

Of course, to hit a grand slam you have to bat with the bases loaded, and not all batters get the same opportunities to do so. We can look at those opportunities going back about 55 years (the so-called Retrosheet era), which unfortunately means we don't know how often Gehrig and Ruth batted with the bases loaded. We expect, of course, the number of bases-loaded opps to depend heavily on batting order position. I'm guessing the #4 batters will have an advantage over #3 batters and perhaps over all other lineup slots. Well, thanks to Retrosheet, we can figure that out (data from 1954-2009):
+------------+-------------------+
| lineup_pos | Opps_bases_loaded |
+------------+-------------------+
|          6 |             27991 | 
|          5 |             26127 | 
|          7 |             24616 | 
|          9 |             23978 | 
|          8 |             23119 | 
|          4 |             21881 | 
|          1 |             18504 | 
|          2 |             18423 | 
|          3 |             17565 | 
+------------+-------------------+
Whoa! That's surprising. The #3 slot, where you find many of the best home run hitters, is the absolute worst position for hitting with the bases loaded. Does that surprise anybody else? I mean, leadoff hitters hit more often with the bases loaded than #3 batters. Now, that's partly because leadoff hitters get more plate appearances overall than #3 hitters, but still.
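To put numbers on the #1 vs #3 comparison, here's a small Python sketch that ranks the slots and expresses each one's bases-loaded PAs relative to the #3 slot. Keep in mind these are raw counts from the table above; I don't have PA totals by slot for 1954-2009 here, so this doesn't adjust for the extra trips to the plate that top-of-the-order hitters get.

```python
# Bases-loaded plate appearances by lineup slot, 1954-2009 (table above)
bases_loaded_pa = {
    1: 18504, 2: 18423, 3: 17565, 4: 21881, 5: 26127,
    6: 27991, 7: 24616, 8: 23119, 9: 23978,
}

baseline = bases_loaded_pa[3]
# Slots ordered from most to fewest bases-loaded PAs
ranked = sorted(bases_loaded_pa, key=bases_loaded_pa.get, reverse=True)

for slot in ranked:
    ratio = bases_loaded_pa[slot] / baseline
    print(f"#{slot}: {bases_loaded_pa[slot]:>6}  ({ratio:.3f}x the #3 slot)")
```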

Gehrig batted cleanup for the Murderers' Row Yankees, of course, behind Ruth, who batted third. Most of the guys on the career list above batted 3rd or 4th, simply because that's where most home run hitters bat. A notable exception is Ventura, who mostly batted 5th and hence likely saw more bases-loaded situations than some of the others. Actually, we can check that, too, with Retrosheet. Here are the top 20 slammers along with the number of PAs they had with the bases full:
+-----------------+-----+-------+------+
| Name            | HR  | Slams | Opps |
+-----------------+-----+-------+------+
| Gehrig, Lou     | 493 |    23 | NULL | 
| Ramirez, Manny  | 546 |    21 |  278 | 
| Murray, Eddie   | 504 |    19 |  302 | 
| McCovey, Willie | 521 |    18 |  196 | 
| Rodriguez, Alex | 583 |    18 |  232 | 
| Ventura, Robin  | 294 |    18 |  238 | 
| Foxx, Jimmie    | 534 |    17 | NULL | 
| Williams, Ted   | 521 |    17 |   66 |* 
| Kingman, Dave   | 442 |    16 |  167 | 
| Aaron, Hank     | 755 |    16 |  256 | 
| Ruth, Babe      | 714 |    16 | NULL | 
| Sexson, Richie  | 306 |    15 |  162 | 
| Griffey, Ken    | 630 |    15 |  199 | 
| Hodges, Gil     | 370 |    14 |  169 |* 
| McGwire, Mark   | 583 |    14 |  164 | 
| Piazza, Mike    | 427 |    14 |  177 | 
| Giambi, Jason   | 409 |    14 |  211 | 
| Lee, Carlos     | 307 |    14 |  169 | 
| Kent, Jeff      | 377 |    13 |  289 | 
| Kiner, Ralph    | 369 |    13 |   33 |* 
+-----------------+-----+-------+------+
The column labeled "Opps" shows the bases-loaded PAs. "NULL" means the player's career came before the Retrosheet era, and I've put a "*" next to guys whose careers are only partially covered by the Retrosheet data. Here we see that Ventura actually did have more bases-loaded plate appearances than most of the others in the table, although he doesn't come close to Eddie Murray or Jeff Kent. Still, Ventura certainly had a better HR percentage when batting with the bases loaded than he did otherwise.

By the way, the all-time (meaning since 1952) leader in bases-loaded PAs is Brooks Robinson, who had 333 chances to hit a grand slam. He hit five.
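Since every grand slam comes with the bases loaded, the natural rate here is slams per bases-loaded PA. Here's a minimal Python sketch using only the players from the table above whose careers fall entirely inside the Retrosheet window (no NULL, no asterisk), plus Brooks Robinson.

```python
# (name, grand slams, bases-loaded PA) for fully covered careers, from the table above
slam_data = [
    ("Ramirez, Manny", 21, 278), ("Murray, Eddie", 19, 302),
    ("McCovey, Willie", 18, 196), ("Rodriguez, Alex", 18, 232),
    ("Ventura, Robin", 18, 238), ("Kingman, Dave", 16, 167),
    ("Aaron, Hank", 16, 256), ("Sexson, Richie", 15, 162),
    ("Griffey, Ken", 15, 199), ("McGwire, Mark", 14, 164),
    ("Piazza, Mike", 14, 177), ("Giambi, Jason", 14, 211),
    ("Lee, Carlos", 14, 169), ("Kent, Jeff", 13, 289),
    ("Robinson, Brooks", 5, 333),
]

# Slams per bases-loaded PA, highest first
rates = sorted(
    ((name, slams / opps) for name, slams, opps in slam_data),
    key=lambda row: row[1],
    reverse=True,
)

for name, rate in rates:
    print(f"{name:<18} {rate:.3f}")
```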

Tuesday, February 2, 2010

Where should Robinson Cano bat in the order?

Let me say that Steven Goldman, over at the Pinstriped Bible, is one of my favorite sportswriters. Actually, he was numero uno, until he was dethroned by The Poz, but there is no shame in that, not at all.

What Goldman has in spades is writing chops — the guy knows how to put words together to make you understand the game. That is why I follow him religiously, even though his main subject matter is the hated (by me, of course) Yankees.

Goldman, who also moonlights over at Baseball Prospectus (or maybe that's his day job, I dunno), is pretty sabermetrically savvy, but I'm not totally on board with his recent blog post on Robinson Cano. In that post, Goldman essentially claims that:

  1. Robbie Cano hits much better with the bases empty than he does with men on.
  2. Given (1), batting Cano 2nd in the lineup would be smarter than batting him in his usual position lower in the order, the idea being that in the two-hole, Cano would bat more often with the bases empty.
Let's take these one at a time. When I see anybody claim that any hitter is better in a certain situation, e.g. with runners in scoring position, I immediately get skeptical. My, admittedly nerdy, working assumption is that batters hit the same in all situations. Now, before you get all huffy, let me make a disclaimer:
  1. Obviously, one split that really is real is the platoon split. Nearly all batters hit better against opposite-hand pitching.
  2. I'm not saying that there isn't any player in the world who truly hits better with the bases empty (or in night games, or in Cincinnati, or on Tuesday). I'm just saying that I believe the vast majority of players do not hit better in these odd situations. Most everybody hits about the same in most situations. That's my working assumption. I'll be wrong sometimes, but (I believe) I'll be right most of the time.
So, here are Cano's splits with nobody on and with men on base:
               PA   AVG   OBP   SLG
Bases empty  1580  .331  .363  .528
Runners on   1456  .280  .312  .425
Wow, Cano really does hit a lot better with the bases empty. I checked to see if these numbers might just be a statistical aberration, and it seems pretty unlikely. For example, the probability that the 51-point difference in OBP is just a statistical fluctuation (i.e. that Cano's true OBPs in the two situations are the same) is about 0.5%. Cano's propensity for hitting better with the bases empty is especially curious, since on average batters fare better with runners on. (That's because the defense has to worry about the baserunners in addition to the batter.)
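For the curious, one standard way to get a number like that is a two-proportion z-test. I don't know exactly what calculation was behind the 0.5% figure, so treat this Python sketch as one reasonable version, assuming PA as the denominator for both OBPs (the real OBP denominator leaves out sacrifice bunts, among other things) and a pooled standard error.

```python
import math


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two rates, pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Cano: .363 OBP in 1580 PA with bases empty, .312 in 1456 PA with runners on
z, p = two_proportion_z_test(0.363, 1580, 0.312, 1456)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these inputs z comes out near 3 and the two-sided p-value lands at a few tenths of a percent, the same ballpark as the figure above.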

Ok, so maybe Robbie Cano really does hit worse with runners on. That's unfortunate for him, but so be it. But will batting Cano 2nd instead of 7th (his usual spot in 2009) make a difference? Goldman's take:
To get the most out of Cano, a manager might keep him out of RBI spots. Now, when you have one of the best offenses in baseball, your whole batting order is an RBI spot. That’s why the second spot in the order is a place he might prosper. Even if the Yankees get another .400 OBP from their leadoff man, Cano would be batting with the bases empty 60 percent of the time, do his best hitting, and be on base for Mark Teixeira, A-Rod, et al. The downside is that you might get a few extra Cano double-play specials when the leadoff man does reach base.
The following table shows how often Yankee batters came to bat with the bases empty for the different lineup positions in 2009:
+------------+-----+--------+------------+
| lineup_pos | PA  | noneOn | noneOnFrac |
+------------+-----+--------+------------+
|          1 | 785 |    496 |     0.6318 | 
|          2 | 772 |    392 |     0.5078 | 
|          3 | 753 |    358 |     0.4754 | 
|          4 | 732 |    341 |     0.4658 | 
|          5 | 715 |    356 |     0.4979 | 
|          6 | 699 |    402 |     0.5751 | 
|          7 | 682 |    371 |     0.5440 | 
|          8 | 668 |    361 |     0.5404 | 
|          9 | 643 |    336 |     0.5226 | 
+------------+-----+--------+------------+

The last column of this table shows the fraction of plate appearances that occur with nobody on base. Now, focus on the #2 and #7 positions: the #2 slot batted with nobody on less often than the #7 slot (51% compared to 54%). Putting Cano in the #2 slot would actually worsen his overall production (taking the splits mentioned previously at face value).
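Taking Cano's splits at face value, you can turn the table into an expected OBP for each slot with a simple weighted average. This Python sketch assumes the 2009 bases-empty fractions would hold for Cano in whichever slot he batted, which is a simplification.

```python
# PA and bases-empty PA by lineup slot, 2009 Yankees (table above)
pa = {1: 785, 2: 772, 3: 753, 4: 732, 5: 715, 6: 699, 7: 682, 8: 668, 9: 643}
none_on = {1: 496, 2: 392, 3: 358, 4: 341, 5: 356, 6: 402, 7: 371, 8: 361, 9: 336}

OBP_EMPTY, OBP_ON = 0.363, 0.312  # Cano's splits, taken at face value


def blended_obp(slot: int) -> float:
    """Cano's expected OBP in a slot, weighting his splits by that slot's bases-empty share."""
    frac_empty = none_on[slot] / pa[slot]
    return frac_empty * OBP_EMPTY + (1 - frac_empty) * OBP_ON


for slot in (2, 7):
    print(f"#{slot}: bases empty {none_on[slot] / pa[slot]:.4f}, OBP {blended_obp(slot):.4f}")
```

With these numbers the #7 slot comes out slightly ahead of #2, by roughly two points of OBP.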

Furthermore, there is an additional cost of batting Cano 2nd: it likely moves Nick Johnson, you know, the .400 OBP guy, down in the order. (Goldman talks about Johnson possibly leading off and Cano batting second, but the Yanks have a guy who has always batted #1 or #2, fella name of Jeter.) Obviously, batting higher in the order gets you more PAs. A good rule of thumb is that each spot in the order gets 18 more PAs over the season than the next one. Anyway, what difference might we expect in Yankee offensive production if Cano bats second and Johnson bats seventh, versus the other way around? Well, it turns out the difference is quite small (I'll spare you the details): batting Cano 2nd and Johnson 7th would cost the Yankees about a run over the course of a season.
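As a sanity check on the 18-PA rule of thumb, here's a quick Python sketch using the 2009 Yankee PA-by-slot numbers from the table above. It only checks the PA arithmetic; it says nothing about runs.

```python
# 2009 Yankee PA by lineup slot, #1 through #9 (table above)
pa = [785, 772, 753, 732, 715, 699, 682, 668, 643]

# PA difference between each slot and the one behind it
gaps = [ahead - behind for ahead, behind in zip(pa, pa[1:])]
avg_gap = sum(gaps) / len(gaps)

# Extra PAs for the #2 slot over the #7 slot
extra_pa_2_vs_7 = pa[1] - pa[6]

print(f"average gap: {avg_gap:.2f} PA per slot")
print(f"#2 vs #7: {extra_pa_2_vs_7} PA")
```

The average step works out to 17.75 PAs, and the #2 slot got 90 more PAs than #7, which matches five spots times 18.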

So, moving Cano up in the order will not really accomplish anything for the Yankee offense. In the end, careful analysis of batting order scenarios invariably leads to the conclusion we have here — it just doesn't make much difference. Goldman himself said it nicely:

First, many studies suggest that the difference between the optimal batting order and the least-optimal batting order is quite small. That said, there is a difference, and even if it’s as little as one win a season, you never know when you might need that one win.

It's just that the "win" he mentions is more often a small fraction of a win, but it's still true that you never know when you might need that one small fraction of a win.

Monday, February 1, 2010

What'd I do to deserve this?

Everybody googles themselves once in a while, right? I do it, too, but I use Google Alerts to help me out. Basically, Google Alerts sends me an email whenever it sees somebody refer to me out there on the Web. So, the other day, I get something from Google Alerts — Dave Cameron over at FanGraphs started a thread called "The Sabermetric Library" in which he asked readers to list in the comments section their favorite "influential sabermetric articles".

There have been 47 responses so far, one of which mentioned me by name, hence the Google Alert. The kind soul who referenced my work, who uses the moniker "vivaelpujols", wrote:

John Walsh’s building blocks of Pitch f/x work:

http://www.hardballtimes.com/main/article/fastball-slider-changeup-curveball-an-analysis/

http://www.hardballtimes.com/main/article/pitch-identification-tutorial/

http://www.hardballtimes.com/main/article/the-eye-of-the-umpire/

http://www.hardballtimes.com/main/article/how-fast-should-a-fastball-be/

http://www.hardballtimes.com/main/article/searching-for-the-games-best-pitch/


Bless your soul, vivaelpujols! I think the above pieces are some of my best, so it's good to see the shout-out. Then I saw something that tempered my satisfaction a bit. At FanGraphs, readers can rate individual comments, giving them either a "thumbs up" or a "thumbs down". And I saw that vivaelpujols' excellent suggestion had received 0 upward-pointing thumbs and 3 downward-pointing ones. WTF? What'd I do to deserve that?

I learned later that vivaelpujols is actually a guy named Nick Steiner, who is a regular writer at the Hardball Times. I don't contribute to THT much anymore, just don't have the time, but I do follow the site fairly closely and Steiner is one of the better analysis-oriented writers over there. And he has great taste in others' work.

UPDATE: Somebody gave the first "thumbs up" to the comment in question. Thanks, Dad.

Friday, June 12, 2009

Hit-f/x puzzle

The following plot was made using the hit-f/x data recently made available by Sportvision. Plotted along the horizontal axis is speed of the ball off the bat (in mph). The vertical axis shows the launch angle (or vertical angle): i.e. the angle of the batted ball relative to the ground.



I must confess, I don't understand the features of this graphic. Specifically, I don't understand why there are plenty of slow ground balls (low speed, launch angle below 0 degrees) but so few slow balls hit in the air (launch angle above 0 degrees). Maybe because of the geometry of the ball-bat collision and the non-horizontal bat angle, balls hit in the air at low speeds go foul? (Foul balls, unless they are caught for an out, are not included in the hit-f/x batted ball data.)
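For anyone wanting to reproduce a plot like this, speed and launch angle follow directly from the batted ball's velocity components. Here's a minimal Python sketch, assuming hypothetical components vx, vy (horizontal) and vz (vertical, positive upward) in ft/s; these are not necessarily the field names or conventions in the actual hit-f/x files.

```python
import math

FPS_TO_MPH = 3600 / 5280  # 1 ft/s is about 0.68 mph


def speed_and_launch_angle(vx: float, vy: float, vz: float) -> tuple[float, float]:
    """Return (speed off the bat in mph, launch angle in degrees above horizontal).

    vx, vy are the (hypothetical) horizontal velocity components and vz the vertical
    one, upward positive, all in ft/s. A negative angle means the ball starts downward.
    """
    horizontal = math.hypot(vx, vy)
    speed_fps = math.sqrt(vx**2 + vy**2 + vz**2)
    angle_deg = math.degrees(math.atan2(vz, horizontal))
    return speed_fps * FPS_TO_MPH, angle_deg


# Example: a ball leaving the bat at 100 ft/s, 10 degrees downward
vz = -100 * math.sin(math.radians(10))
vx = 100 * math.cos(math.radians(10))
speed_mph, angle = speed_and_launch_angle(vx, 0.0, vz)
print(f"{speed_mph:.1f} mph at {angle:.1f} degrees")
```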