yoco :: College Basketball
(a sports weblog) news and commentary on men's college basketball and the ncaa tournament


Wednesday, December 08, 2004

stats continued

Didn't want to post this last night above the great game thread. I guess Steven Graham has joined Dickie's "special player" pantheon, huh? Anyway, to get us started, here is what I see as the major areas of difficulty with basketball stats, with some examples. (Some of this you've probably thought of before, so if you don't feel like reading it all, you could skip to Creating a Summary Stat.)

Misguided Use of Existing Stats:
As an example, a pet peeve of mine -- I never want to see field goal percentage ever again. Adjust it for the fact that some shots count for two points and others count for three. It's been that way for a few decades now, and combining it all into FG% is an anachronism that must die. Don't make me do the math in my head. Use Points per Field Goal Attempt, i.e. points scored on field goals divided by field goal attempts (see a great study on NBA shooting using this stat here), or adjusted FG% (just PPFGA divided by 2). A little under half of Terry Dehere's career shots at Seton Hall were 3-pointers. Adjust his stats accordingly, and his career 43.8% FG% (and 38.8% 3FG%) becomes 1.05 PPFGA or a 52.3% adjusted FG%. Marcus Camby hit one 3-pointer in his three years at UMass, so his career 50.1% FG% and 1.02 PPFGA stay almost right where they are, and suddenly you can compare the two players in a meaningful way.
In baseball, there are old-school announcers and writers who still talk about batting average as if it were enough to identify a great hitter, even though on-base percentage and slugging percentage are much more significant. But in recent years I've started to see OBA and SLG show up on ballpark scoreboards and TV stat-lines, and I'd love to see PPFGA or AFG% start to replace or supplement FG% as mainstream measures of efficient shooting.
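
For anyone who would rather not do the math in their head, here is a minimal Python sketch of the two adjusted shooting stats described above. The function names and the two shooting lines are made up for illustration; they are not anyone's real career totals.

```python
def points_per_fga(fgm, fg3m, fga):
    """Points scored on field goals per field goal attempt (PPFGA).

    fgm  -- total field goals made (twos and threes combined)
    fg3m -- three-pointers made
    fga  -- total field goal attempts
    """
    if fga <= 0:
        raise ValueError("need at least one field goal attempt")
    # Every make is worth 2 points; each three-pointer adds 1 more.
    return (2 * fgm + fg3m) / fga


def adjusted_fg_pct(fgm, fg3m, fga):
    """Adjusted FG%, which is just PPFGA divided by 2."""
    return points_per_fga(fgm, fg3m, fga) / 2


# Hypothetical shooting lines (assumed numbers, not real players).
shooter = {"fgm": 440, "fg3m": 175, "fga": 1000}  # 44.0% FG, lots of threes
big_man = {"fgm": 500, "fg3m": 1, "fga": 1000}    # 50.0% FG, one three

for name, line in (("shooter", shooter), ("big man", big_man)):
    print(name, points_per_fga(**line), adjusted_fg_pct(**line))
```

The point of the comparison: by raw FG% the big man looks six points better, but once the threes are counted the perimeter shooter comes out ahead in points per attempt.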

Strategy, Position, and Bias:
In baseball, every team employs a shortstop and a first baseman. On every team in the league, the shortstop will have more assists, and the first baseman will have more putouts. In college basketball, many teams play the 2-3 zone. The man in the middle of the zone will inevitably have the most opportunities for defensive rebounds, while the players up top will have the most chances for steals. But in baseball, most players stay at the same position for an entire season, and every team plays the same positions, so we can do separate evaluations for shortstops and first basemen. In basketball, every team plays different variations of the available strategies and likely employs multiple strategies within the same game; meanwhile, the success of each player is much more closely tied to the actions of his teammates. So it's far easier to isolate the defensive stats of Derek Jeter and John Olerud than it is to isolate the stats of Craig Forth and Gerry McNamara as compared to their counterparts.
I've talked about defense here, but the offensive comparison is even more stark -- strategy defines basketball offense, determining who gets to take a shot and who spends his time setting screens, while in baseball (putting the occasional ill-advised sacrifice bunt aside) every player gets an equal chance to demonstrate his value.
Meanwhile, as John Hollinger has pointed out, even offensive rate stats like assist/turnover ratio are biased toward certain types of players and strategies. Howard Eisley consistently has a high A/TO, but that's because he never penetrates and thus rarely loses the ball or creates a wide-open shot for a teammate, instead picking up cheap assists because as a "point guard" he gets a lot of touches. Furthermore, the assist is roughly as biased as the RBI -- both correlate with doing something good, but are largely dependent on teammates' actions. Just as not all RBI are created equal (a ground-out with a man on third vs. a solo home run), not all assists are created equal (a chest pass to an open man who drains a 3 vs. Larry Bird threading the needle for an open layup).

Process vs. Outcome:
On my last post, Bret had a great comment regarding the possibility of video study to identify strategic successes and errors by individual players (i.e., good plays like a successful rotation or a well-set screen that don't show up in the stat-line or on the scoreboard). Coaches at both the pro and college level do quite a bit of this, of course -- Yoni and I got to visit the Celtics' training facility a few years ago and met Jim O'Brien's video man, who had gotten his start as a student manager for Pitino at Kentucky and broke down video as a full-time job. But as observers of the game, we can't really hope to duplicate this sort of video study, as the specific knowledge required and the judgment calls to be made would be overwhelming. In baseball there's been a push lately to record the process of a play, not the outcome: for instance, suppose Barry Bonds smokes a line drive on a trajectory to clear the center-field wall by a few feet, but Torii Hunter makes an absurd leaping catch. In the box score it's an out for Bonds, but process-based stats (not too hard to record in this case) would reward him just as much as if the ball had cleared the fence over the head of a weaker defender. I know Pitino/O'Brien teams have tried recording more process-based stats like deflections and touches instead of steals and assists, but I don't think this sort of analysis has a great future for stats in the public domain.

Creating a Summary Stat:
As far as I can tell, the composite stats that are out there, like PER in the pros and Tendex in college (thanks, Dave), all do fairly similar things. They take the usual box-score stats (points scored, FG, FT, FGA, FTA, rebounds, assists, etc.) and weight them, based on each stat's estimated contribution to a team's points in historical data, to generate a single estimate of a player's value. This is certainly useful, generates some interesting numbers, and is actually pretty similar, from my understanding, to how baseball's Win Shares/Runs Created are calculated.
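
To make the general shape of these composite stats concrete, here is a rough Python sketch of a weighted box-score rating. The weights, stat names, and sample box score below are placeholders I made up; they are not the actual PER or Tendex coefficients, which would come from fitting historical data.

```python
# Hypothetical weights for illustration only, NOT the PER or Tendex values.
WEIGHTS = {
    "pts": 1.0,
    "reb": 1.0,
    "ast": 1.0,
    "stl": 1.0,
    "blk": 1.0,
    "missed_fg": -1.0,
    "missed_ft": -0.5,
    "tov": -1.0,
}


def composite_rating(box, minutes, weights=WEIGHTS):
    """Weighted sum of box-score stats, expressed per minute played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Stats missing from the box score count as zero.
    total = sum(weight * box.get(stat, 0) for stat, weight in weights.items())
    return total / minutes


# Made-up single-game box score.
box = {"pts": 18, "reb": 7, "ast": 3, "stl": 2, "blk": 1,
       "missed_fg": 8, "missed_ft": 2, "tov": 3}
print(composite_rating(box, minutes=32))
```

Whatever weights you plug in, the output can only be as good as the inputs, since each stat enters the rating as a simple weighted term.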

I see two problems with this approach: first, any derived stat like this can only have as much explanatory power as its constituent parts -- if measures of assists or rebounds or steals are biased toward types of players or strategies, then the composite stat that uses them as inputs will be biased too. Second, the choice of points as the keystone stat (though some versions of Tendex estimate points per possession) is not nearly as obvious as the choice of runs in baseball. As some commenters noted last time, the pace of the game, and thus the number of possessions on which points can be scored, is largely determined by strategy. In baseball, there are always nine innings, and almost never a disincentive to try to score as much and as soon as possible (ignoring impending rainouts); in basketball, there is often a reason to slow down and reduce the number of possessions, whether to hold a lead or to counteract the other team's strategy.

Points per possession (offensive and defensive) seems to me the best solution as a keystone stat. Note that I'm defining possession as changing on a made field goal (or terminal free throw), a defensive rebound, or a steal; an offensive rebound does not result in an additional possession. There are some very appealing features to this stat: first, it is very feasible to derive it from play-by-play data (man, I wish there were a historical Retrosheet for basketball). Second, it ought to capture all the things we've talked about above as hidden parts of the game -- only the actions that increase your likelihood of scoring (or of preventing the other team from scoring) on a particular possession really matter. Third, it's a stat we can compare fairly between teams that play very different styles, or at very different paces. Adjusting for strength of competition is another matter, but there are already some algorithms that do this pretty well. Now here's the key difference: instead of trying to predict PPP from box-score stats, we can just measure it directly, and statistically identify players' contributions.
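
Here is a minimal sketch of how PPP might be pulled out of a play-by-play log using the possession definition above. The event names, the tuple layout, and the sample log are all assumptions of mine, not a real play-by-play format. The sketch also handles only the possession changes listed in the definition, so non-steal turnovers and end-of-period possessions are ignored.

```python
from collections import defaultdict

# Events that end a possession under the definition in this post.
ENDS_OWN_POSSESSION = {"made_fg", "made_final_ft"}  # acting team had the ball
ENDS_OPP_POSSESSION = {"def_rebound", "steal"}      # acting team takes the ball


def points_per_possession(events, teams):
    """Offensive points per possession for each of two teams.

    events -- (team, event, points) tuples in game order (hypothetical format)
    teams  -- the two team names
    """
    a, b = teams
    other = {a: b, b: a}
    points = defaultdict(int)
    possessions = defaultdict(int)
    for team, event, pts in events:
        points[team] += pts
        if event in ENDS_OWN_POSSESSION:
            possessions[team] += 1
        elif event in ENDS_OPP_POSSESSION:
            possessions[other[team]] += 1
        # Anything else (missed shot, offensive rebound, non-final free
        # throw) keeps the current possession alive.
    return {t: points[t] / possessions[t] for t in teams if possessions[t]}


# Made-up log: three possessions for each team.
events = [
    ("A", "missed_fg", 0),
    ("A", "off_rebound", 0),    # same possession continues
    ("A", "made_fg", 2),
    ("B", "missed_fg", 0),
    ("A", "def_rebound", 0),    # ends B's possession
    ("A", "made_fg", 3),
    ("B", "made_ft", 1),
    ("B", "made_final_ft", 1),
    ("B", "steal", 0),          # ends A's possession
    ("B", "made_fg", 2),
]
print(points_per_possession(events, ("A", "B")))
```

A team's defensive PPP is just its opponent's offensive PPP, so one pass over the log gives both sides of the stat.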

Next time I'll talk some more about why I like PPP, statistical methods for determining individual player contribution to PPP, and some other cool adjustments we can make.

If you've read this far, congrats and thanks. I'd appreciate any input you have.