Wednesday, March 13, 2013

Rabbit Maranville, Mr. RBI

Why him? The following Bill James formula predicts his RBIs more accurately than it does for any other player:

RBI = (TB/4) + HRs

It predicts he would have had 883.75 RBIs, while he actually had 884. Per 700 PAs, or about a full season, that is off by only +.016. That is the most accurate prediction among all players with 5,000+ PAs from 1876-2012 (I used Baseball Reference, and RBIs might not be available for all pre-1900 years). Click here to see the rankings. The rankings are arranged by how far above or below his prediction each player finished.
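As a rough sketch, the formula and the per-700-PA comparison can be written out as below. The total bases, HR and PA inputs are illustrative assumptions picked so that the estimate lands on the 883.75 quoted above; they are not taken from the post.

```python
def predicted_rbi(total_bases, home_runs):
    """Bill James estimate quoted above: RBI = TB/4 + HR."""
    return total_bases / 4 + home_runs


def diff_per_700_pa(actual_rbi, predicted, plate_appearances):
    """Actual minus predicted RBIs, scaled to 700 PAs (about one full season)."""
    return (actual_rbi - predicted) / plate_appearances * 700


# Illustrative inputs only (assumptions, not from the post).
tb, hr, pa = 3423, 28, 11000
estimate = predicted_rbi(tb, hr)
print(estimate)                                      # 883.75
print(round(diff_per_700_pa(884, estimate, pa), 3))  # 0.016
```

A positive difference means the player drove in more runs than the formula expected; a negative one means fewer.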

Cap Anson was predicted to have 77.24 RBIs per 700 PAs while he actually had about 130. So he gets +52.76. Of course, this does not necessarily mean he was a great clutch hitter (although he could have been; he did lead the league in RBIs 8 times according to Baseball Reference and, if you notice, he is 7 RBIs ahead of the next best guy, so he looks like a bit of an outlier). But his team led the league in OBP several times back then and in other years was often near the top.

So what might be going on with Anson? For one, he did not hit many HRs (just 97). But no one did back then so you had low HR guys batting in the middle of the order, where you would get more than the average number of RBI opportunities. Second, he might have played in some years when the league OBP was high. Third, more players reached on errors back then, creating even more opportunities.

Over the last 10 years, the formula has predicted about 20 more RBIs per team each year in the AL than they actually got. In the NL, it is about 25 more. So the prediction is coming in around 3% too high. Again, we are in a low error period, so not as many runners are reaching on errors as in other periods in baseball history.

In fact, there is a high correlation between how often runners reach (by whatever means) and the size of the prediction error for a whole league in any given year. I added the OBP each year to the error rate (ERATE) each year (ERATE is 1 - fielding percentage). That sum was then correlated with how big the prediction error was per team (the more teams you have, the bigger the error might be). For all of NL history, that correlation is .87 and for the AL it is .85. So in years when an entire league had more RBIs than predicted, it most likely had a lot more baserunners than normal, by hits, walks, HBP and errors.
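Here is a minimal sketch of that calculation, assuming one row per league-season. Every number in `seasons` is made up for illustration, so the printed correlation says nothing about the real .87 and .85 figures; only the steps (OBP + ERATE, error per team, then a Pearson correlation) follow the description above.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical league-seasons (all values invented):
# (league OBP, league fielding pct, actual RBIs, predicted RBIs, number of teams)
seasons = [
    (0.310, 0.985, 5200, 5500, 15),
    (0.325, 0.980, 5600, 5650, 16),
    (0.340, 0.972, 5000, 4800, 8),
    (0.355, 0.965, 5400, 5000, 8),
]

# How often runners reach: OBP plus ERATE, where ERATE = 1 - fielding pct.
reach_rate = [obp + (1 - fpct) for obp, fpct, _, _, _ in seasons]
# Prediction error per team: actual minus predicted RBIs, divided by teams.
error_per_team = [(actual - pred) / teams for _, _, actual, pred, teams in seasons]

r = pearson(reach_rate, error_per_team)
print(round(r, 2))
```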

Now getting back to Maranville, he tended to bat leadoff, 2nd or 7th. Hardly great RBI slots. So you might expect him to get fewer RBIs than predicted. But he did play mainly in the 1920s and 30s, when OBPs were high and the ERATE was higher. He also hit well with runners on base. Retrosheet only has about 1300 of his 8800 career ABs broken down for this. But with none on he batted .277. With runners on, .317 and with runners in scoring position, .324. Click here to see his splits.

If you look at the rankings from the first link, you can see that many of the batters who had the biggest negative differentials (meaning they got fewer RBIs than expected) were leadoff men.

This formula may apply best to power hitters who bat in the middle of the order. So I also looked at how well the formula predicted for all players with 300+ career HRs. Click here to see that link. The guy that jumps out there is Al Simmons. He got about 25 more RBIs than expected per season and the next highest is Greenberg at about 18.

Now Simmons batted 4th most of his career (especially with the A's) and had Max Bishop leading off a lot of that time. Bishop had a career OBP of .423. The 2-3 hitters probably averaged around .365. So he had a lot of opportunities. But Retrosheet has an even smaller number of on-base splits for Simmons, so it is hard to tell if he was a clutch hitter.

I was surprised to see Willie Mays so far down. He had 15 fewer RBIs than predicted per season, yet he hit well with runners on. Click here to see his splits. Maybe he got intentionally walked a lot with runners in scoring position. Mantle and Barry Bonds are also near the bottom and the same thing could have happened to them. Alfonso Soriano has batted leadoff in nearly half of his PAs, so that may be why he is last.

Bill James discusses this formula in his book Solid Fool's Gold: Detours on the Way to Conventional Wisdom

I have published two articles about RBI prediction:

RBIs, Opportunities and Power Hitting

Do Hitter’s Get Their Expected RBIs?

Monday, March 11, 2013

Rich Gossage vs. Mariano Rivera

Below is something I posted in June 2011. Tango posted on this issue yesterday because Gossage is talking again about his work load being tougher. See Mo v Goose. Now my post from 2011.

...“I wasn’t a closer, I was a relief pitcher,” Gossage said. He made a great point that he was not just the closer, but the seventh and eighth inning man. He pointed out that he came on with inherited runners in the seventh or eighth inning many times. Some of those situations required that he keep the ball out of play.

Gossage went on to say that “Mariano doesn’t come in with inherited runners. He gets to start out the ninth with nobody on… Easy? It is a piece of cake compared to what we use to do.”
From Baseball Think Factory, quoting an article by Mike Silva.

Yes, relievers were used differently in Gossage's time. From 1977-1985, one of the time periods I will look at for Gossage, most of the top 50 seasons in both saves and games finished were by pitchers who pitched over 100 innings (with only a couple of cases of even 1 game started). From 1997-2005, the period I will look at for Rivera, there were no 100+ IP seasons and even 90+ IP was rare (fewer than 5 for both stats).

So I want to compare both Gossage and Rivera to the average relievers of their times. I picked Gossage's 1977-1985 years since those seem to be his prime years and he was very good throughout the period. It does leave out his great 1975 season as a reliever (he was a starter in 1976). So for Rivera, I look at his first 9 years as a closer, 1997-2005 (which leaves out a very good 1996 season). The fact that Rivera has continued to pitch great since then is a plus in his favor. Gossage supporters might say that Rivera's relatively low IP totals have helped his longevity. Gossage was just average after 1985.

The average relief pitcher from 1977-1985 had an ERA of 3.68 while Gossage had 2.10. If we turn that into a winning pct. using the Pythagorean formula, which Bill James created to estimate team winning pct. from runs scored and runs allowed, we get .754. From 1997-2005, Rivera's years, he had an ERA of 2.04 while the league average was 4.31. That gets us a pct of .817. So Rivera edges Gossage .817-.754. (I checked park factors for each pitcher, and the simple average of their teams' pitching park factors was the same, 97.56, meaning that they each got a little help from their parks, which were about 2.5% lower than average in scoring.) All the data I use here is from Baseball Reference or The Lee Sinins Complete Baseball Encyclopedia.
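A small sketch of that conversion, assuming the classic exponent of 2 and treating the average reliever ERA as "runs scored" and the pitcher's ERA as "runs allowed". Those choices are my assumptions, but they do reproduce the two percentages above.

```python
def pythagorean_pct(league_era, pitcher_era, exponent=2):
    """Pythagorean winning pct. with the league ERA standing in for runs
    scored and the pitcher's ERA for runs allowed (exponent 2 is assumed)."""
    scored = league_era ** exponent
    allowed = pitcher_era ** exponent
    return scored / (scored + allowed)


print(round(pythagorean_pct(3.68, 2.10), 3))  # 0.754 (Gossage, 1977-1985)
print(round(pythagorean_pct(4.31, 2.04), 3))  # 0.817 (Rivera, 1997-2005)
```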

I also found the top 10 pitchers in saves in each era and then calculated the combined ERA of the other 9 (taking out Gossage and Rivera). The best 9 in Gossage's years had 2.87. That gets a .651 pct. The best 9 in Rivera's years had 3.07, getting us a pct of .694. Again, edge to Rivera.

So far, when being compared to contemporaries with a similar role, Rivera is ahead. But ERA can be misleading, since the fielders play a role here (and ERA may not be the best way to judge relievers who are supposed to come in and put out fires).

To avoid this problem, I am going to look at how each guy compared to his peers in the fielding independent stats (HRs, BBs, SOs). Then I will convert that into a run value using the values below:

HR: 1.40
BB: .33
SO: -.22

Those are the values used in what are called "Fielding Independent ERA" formulas. The table below shows how each guy compared to the average reliever of his time in these stats per 9 IP. For example, Gossage allowed .508 HRs per 9 IP while the average reliever allowed .724. So he was .216 better. Multiplying that by 1.4 we get .3024 (it is negative in the table, meaning how much below average Gossage was). Then this is done for the other stats and for Rivera. The last line shows the combined run value each guy was below average using all three stats.
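The per-stat arithmetic can be sketched as follows. Only the HR rates come from the text; the function and dictionary names are my own, and the BB and SO rates would have to be filled in from the table.

```python
# Run values per event, as listed above.
RUN_VALUES = {"HR": 1.40, "BB": 0.33, "SO": -0.22}


def run_value_vs_average(stat, pitcher_per9, league_per9):
    """Runs per 9 IP relative to the average reliever for one stat.
    Negative means the pitcher allowed fewer runs than average."""
    return (pitcher_per9 - league_per9) * RUN_VALUES[stat]


def total_vs_average(pitcher_rates, league_rates):
    """Sum the three components; both arguments map 'HR', 'BB', 'SO' to per-9 rates."""
    return sum(
        run_value_vs_average(stat, pitcher_rates[stat], league_rates[stat])
        for stat in RUN_VALUES
    )


# Gossage's HR component, using the rates quoted in the text.
print(round(run_value_vs_average("HR", 0.508, 0.724), 4))  # -0.3024
```

Because strikeouts carry a negative run value, striking out more batters than average also pushes the total below zero, which is the "good" direction here.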



So Rivera is farther below average than Gossage. If I use the average reliever ERAs from each period, then Gossage gets 2.49 (3.68 - 1.19). Rivera gets 2.76 (4.31 - 1.55). The Pythagorean winning pct for Gossage is then .686 and for Rivera it is .709. The next table does the same thing but only for the other 9 pitchers in the top 10 in saves in each period.



Going right to the bottom line, we can see that they are almost even. Gossage would get a Pythagorean pct of .620 and Rivera would get .611. Very close. Now Gossage may have been better than Rivera, but I think the evidence shows that he should not belittle Rivera's greatness. Rivera seems to be at least close to Gossage as measured by how good they were relative to their peers.

One weakness of looking at the other 9 in the top 10 is that park effects and fielders might play a big role since they don't represent the entire league. It is possible that the other 9 guys Rivera gets compared to pitched in great hitters' parks so they look weak in comparison to him. Or maybe Rivera had much better fielders behind him. I have not checked that. And when I did the top 10, it included both leagues whereas when I used the league average, it was just the league they pitched in (for Gossage it was the NL in 1977 and 1984-85 and the AL from 1978-83).

Sunday, February 17, 2013

Rick Reuschel Had Bad Defense Behind Him While Pitching In A HR Friendly Park

Adam Darowski recently had a good article on Reuschel and his Hall of Fame worthiness. See Beyond ERA+: Why Rick Reuschel Had Hall of Fame Value. Adam uses some of the more advanced fielding stats.

Here I will just look at DER (defensive efficiency rating) and fielding pct. But first, I came up with an approximate relative ERA for Reuschel and then converted that into Wins Above Replacement (WAR).

I found all the pitchers from 1920-2011 with 2000+ IPs from the Lee Sinins Complete Baseball Encyclopedia. I also found their ERA, walks, strikeouts and HRs all relative to the league average. I ran a regression with relative ERA as the dependent variable and the others as independent variables.

Here is the equation:

ERA = 34.42 + .264*SO + .24*HR + .179*BB

Here were Reuschel's numbers

ERA 108
SO 95
HR 127
BB 138

The 108 means his ERA was 8% better (lower) than the league average (this is without park adjustments). The 95 means he struck out 5% fewer batters than average. His predicted relative ERA was 114, good for 64th out of the 293 pitchers. Not a super high rank, but good. He is ahead of Hall of Famers Juan Marichal and Jim Bunning, who each had around the same number of IP as Reuschel.
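Plugging Reuschel's relative stats into the regression equation looks like this. With the rounded coefficients as printed, the result comes out a bit above the 114 quoted, presumably because the published coefficients are rounded; the function name is mine.

```python
def predicted_relative_era(so, hr, bb):
    """Regression equation from the post, using the rounded coefficients shown.
    All inputs are relative to the league average (100 = average)."""
    return 34.42 + 0.264 * so + 0.24 * hr + 0.179 * bb


print(round(predicted_relative_era(so=95, hr=127, bb=138), 1))  # 114.7
```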

If I use the predicted relative ERA for each pitcher and base it on a league average of 4.00 and a replacement level of .400, Reuschel would finish 38th or in the top 13%. He is ahead of Marichal, Jim Palmer, Whitey Ford and Ted Lyons, just to name a few Hall of Famers.

Remember that I am not making any park adjustments. From 1972-1984, the years Reuschel was on the Cubs, he was 23rd among all major league pitchers with 1000+ IP in HRs allowed relative to the league average. He allowed 25% fewer HRs than the average pitcher would have, while pitching in Wrigley Field! Wrigley was a great HR park during this period, compared to other NL parks, allowing 42% more HRs than average.

Now this is all based on defense independent stats. Maybe one reason we don't see how great Reuschel was is that the defense behind him was not very good. The table below shows how the Cubs ranked in DER (defensive efficiency rating) and fielding pct for the years when Reuschel pitched a lot for them. DER is just the % of balls in play that are turned into outs.


They were last in 5 of these 9 years. (Reuschel pitched only 129 innings in 1972 and well over 200 in each of the other years.) They did not do too well in fielding pct, either. This next table shows the simple average for all the NL teams in these years. The Cubs were by far the worst in DER and were still below average in fielding pct.

Wednesday, January 23, 2013

Musial and great power/contact hitters

Dave Cameron addresses this over at Fangraphs. See Translating Stan Musial’s Numbers into 2012 Norms. He compares Musial's isolated power to his strikeout frequency and then converts them into numbers for the 2012 season. He looked specifically at Musial's 1943 season and also his entire career.

A couple of things to remember. That article talks about Musial in 1943. Recall that it was a war year and a lot of good pitchers were gone, so that would help him strike out less and hit more HRs, everything else being equal. And Sportsman's Park was a good hitter's park. Musial had a home ISO of .246 and a road ISO of .211 for his career. DiMaggio had .231 home and .277 away. DiMaggio had an overall career ISO of .254 while Musial had .228. And notice how close their SO rates were in the table below.

I did a study on this before and I divided a guy's relative HR rate by his relative K rate. Now that is not the same as ISO, but my guess is that ISO and HR rate are highly correlated. The guy that really looks great in this analysis is DiMaggio.

Musial was 70% better than the league average in ISO and DiMaggio 95% better. So, for Musial we have 1.7/.55 = 3.09. For DiMaggio we have 1.95/.59 = 3.30. DiMaggio had a better relative ISO rate divided by SO rate and he played in a bad park for righties while Musial played in a great park, especially for lefties.

Here is the link

Which Players Had The Best HR-To-Strikeout Ratios?

Musial does well but not as well as some other big names. Now here is all of that post

I looked at every player with 5000+ PAs since 1920. I found their relative HRs and their relative strikeouts. Then I found the ratio of the two. Ken Williams, for example, hit 3.70 times as many HRs as the average player of his time and league while striking out only 75% as often as the average player. Since his ratio of ratios (3.7/.75 = 4.93) is the highest of anyone in the study, he is ranked first. The data comes from the Lee Sinins Complete Baseball Encyclopedia. The table below shows the top 25:
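The ratio of ratios and the ranking step can be sketched as below. Only the Ken Williams figures come from the text; "Player A" and "Player B" are hypothetical placeholders added so the sort has something to order.

```python
def ratio_of_ratios(relative_hr, relative_so):
    """Relative HR rate divided by relative strikeout rate (1.0 = league average)."""
    return relative_hr / relative_so


# (name, relative HR rate, relative SO rate)
players = [
    ("Player A", 1.50, 1.20),      # hypothetical placeholder
    ("Ken Williams", 3.70, 0.75),  # figures quoted in the text
    ("Player B", 2.00, 0.80),      # hypothetical placeholder
]

# Rank from the highest ratio of ratios to the lowest.
ranked = sorted(players, key=lambda p: ratio_of_ratios(p[1], p[2]), reverse=True)
for name, rel_hr, rel_so in ranked:
    print(name, round(ratio_of_ratios(rel_hr, rel_so), 2))
```

The same function works for the ISO version of the comparison, since it only divides one relative rate by another.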



DiMaggio hit only 41% of his HRs at home in his career while Ken Williams hit 72%. So it is likely the case that DiMaggio would rank first, and probably by a wide margin, if HRs were park adjusted. Ted Williams hit less than 50% of his HRs at home.

The next table shows which players had the lowest relative strikeout rates among guys who hit 40+ HRs. Again, no pikers here. In 2004, Bonds had only 41 strikeouts while the average player would have had 100. I am so proud to see the demonstration of Polish power with 3 for Ted Kluszewski and 1 for Carl Yastrzemski (whose 1970 season ranks 27th). Don't forget Stan Musial is 13th on the above list.



Update: I found all the guys from 1920-2011 with 5000+ PAs. I also found their ISO relative to the league average and their strikeouts relative to the league average. Then I took the ratios. Here are the top 12

Tris Speaker 4.105263158
Tommy Holmes 4.035714286
Joe Sewell 3.772727273
Frank McCormick 3.102564103
Joe DiMaggio 3.0625
Yogi Berra 3.02
Nellie Fox 2.894736842
Vic Power 2.875
Albert Pujols 2.857142857
Tony Gwynn 2.806451613
Don Mattingly 2.785714286
Stan Musial 2.741935484

The league average in the above case was based on outs, or how many outs each guy made. The next one is based on PAs (these are the choices you have using the Lee Sinins database). So the guy that really stands out is Speaker, and DiMaggio still beats Musial.

Tris Speaker 4.727272727
Tommy Holmes 4.185185185
Joe Sewell 3.952380952
Joe DiMaggio 3.322033898
Albert Pujols 3.214285714
Frank McCormick 3.102564103
Stan Musial 3.090909091
Yogi Berra 3.081632653
Tony Gwynn 3
Don Mattingly 2.925
Nellie Fox 2.894736842
Ted Williams 2.848101266

Friday, January 11, 2013

Bagwell Is A Clear Hall Of Famer Unless He Used PEDs

He is 36th in career WAR among position players. He twice led the league and 2 other times was 5th. By my count there are 147 position players in the Hall who played in the majors (I am not counting Negro league players only because we don't have a WAR ranking for them).

This puts Bagwell in the top 25% in career value among those 147. Maybe he will get in eventually, but his gain this year was slight.

Bagwell had 387 career Win Shares. Through 2001, that would have been tied for 56th among all players, including pitchers. Not sure where he would rank now, definitely still in the top 100.

I count 63 pitchers, so there are 210 players + pitchers. Bagwell clearly is above the bar by two different measures, WAR and Win Shares. I doubt both of these measures are so seriously flawed that he should not be in.

Is there some reason to keep waiting to see if proof of using PEDs is found? I don't think so.

Thursday, January 10, 2013

Was Biggio Treated Differently Than Other Members Of The 3000 Hit Club?

It looks like 12 of the last 13 guys with 3000+ hits got elected in their first year of eligibility. The one who did not is Palmeiro. The streak might even be longer since I am starting with Brock in '85. Two of these guys had less career WAR than Biggio (62.1): Winfield (59.4) and Brock (42.8). Murray (63.4) and Gwynn (65.3) were just slightly higher.

Wednesday, January 9, 2013

How Does Piazza's Vote Percentage Compare To Other Great Catchers?

The table below shows the top 10 in career WAR for guys who played 50%+ games at catcher

Carter actually fell to about 33% in his 2nd year and slowly rose to make it. So Piazza did better than Carter but not as well as Berra. Carter jumped to about 49% in his 3rd year and to 64% in his 4th year. He had 72.7% in year 5. We could look at Piazza's vote as pretty normal for a catcher. We don't have to say that he was necessarily affected by the steroids era.

Ted Simmons caught more games than Johnny Bench, Ray Schalk, Bill Dickey and Yogi Berra. More innings than Bench.

The votes for Dickey, Hartnett and Cochrane are a little strange as the rules were in flux. Their vote histories from Baseball Reference are below. What I use for 1st year above is 6 years after their last year, which would be in line with today's rules, although Cochrane got no votes in that year (1943). As you can see below, he did get votes in 1942, so I used that above.

Hartnett jumped about 20 points in 1954, from about 39% to about 60%, and got in the next year.

Hall of Fame-Dickey
1945 BBWAA ( 6.9%)
1946 Final Ballot (12.2%)
1946 Nominating Vote (19.8%)
1948 BBWAA (32.2%)
1949 BBWAA (42.5%)
1949 Run Off (20.9%)
1950 BBWAA (46.4%)
1951 BBWAA (52.2%)
1952 BBWAA (59.4%)
1953 BBWAA (67.8%)
1954 BBWAA (80.2%)
Selected to HOF in 1954 by BBWAA

Hall of Fame-Hartnett
1936 BBWAA ()
1945 BBWAA ( 0.8%)
1946 Nominating Vote ( 1.0%)
1947 BBWAA ( 1.2%)
1948 BBWAA (27.3%)
1949 BBWAA (22.9%)
1949 Run Off ( 3.7%)
1950 BBWAA (32.1%)
1951 BBWAA (25.2%)
1952 BBWAA (32.9%)
1953 BBWAA (39.4%)
1954 BBWAA (59.9%)
1955 BBWAA (77.7%)
Selected to HOF in 1955 by BBWAA

Hall of Fame-Cochrane
1936 BBWAA (35.4%)
1939 BBWAA (10.2%)
1942 BBWAA (37.8%)
1945 BBWAA (50.6%)
1946 Final Ballot (24.7%)
1946 Nominating Vote (39.6%)
1947 BBWAA (79.5%)
Selected to HOF in 1947 by BBWAA