Tuesday, March 17, 2009

The Fallacy of Scouting Reports

It's obvious I'm a stat guy. I won't go so far as to say that you can know everything about a player just by looking at his stats, but numbers can tell you a lot -- and I mean a lot -- about a player. Even basic stats like contact rate, strikeout rate, walk rate, and groundball rate can give insight into a player's mechanics, speed, mentality, maturity, hand-eye coordination, and even the late movement on a pitcher's fastball. After looking at thousands of stat lines, these types of attributes pop out at me more and more, and prove false less and less often.
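The basic rates mentioned above all come straight out of a raw stat line. Here's a minimal sketch of how they're usually derived -- the numbers are invented for illustration, and the contact rate here is an at-bat-level approximation (true contact rate is measured per swing, which box scores don't give you):

```python
# Hypothetical season line -- illustrative numbers only, not a real player.
line = {"PA": 600, "AB": 540, "SO": 90, "BB": 50,
        "GB": 180, "FB": 140, "LD": 70}

def strikeout_rate(l):
    """Strikeouts per plate appearance (K%)."""
    return l["SO"] / l["PA"]

def walk_rate(l):
    """Walks per plate appearance (BB%)."""
    return l["BB"] / l["PA"]

def contact_rate(l):
    """Share of at-bats that don't end in a strikeout -- a rough
    box-score stand-in for true swing-level contact rate."""
    return 1 - l["SO"] / l["AB"]

def groundball_rate(l):
    """Groundballs as a share of balls in play (GB%)."""
    balls_in_play = l["GB"] + l["FB"] + l["LD"]
    return l["GB"] / balls_in_play

print(f"K%: {strikeout_rate(line):.1%}")    # 15.0%
print(f"BB%: {walk_rate(line):.1%}")        # 8.3%
print(f"Contact%: {contact_rate(line):.1%}")
print(f"GB%: {groundball_rate(line):.1%}")
```

Nothing fancy -- the point is that each rate is one division away from the raw counts, which is exactly why they're so hard to fake over a full season.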

As much as I like stats, it's nice to occasionally take a moment to read a scouting report or two about a player. Sometimes I read scouting reports looking for certain information, or for affirmation of my stat-based analysis of a player; other times I read them hoping they'll explain statistical anomalies that I can't attribute to any particular element of a player's game.

The problem with scouting reports is that unlike stats, they vary so widely and carry so much observer bias that it's very difficult to determine which reports are accurate and which are worth ignoring.

Just as an example of why I shy away from scouting reports, here are two scouting reports of Greg Maddux when he was in high school. The first report was written by a Cubs scout named Jorgensen on April 20, 1984.

Not really a bad report, but if I were to read that report about a high school pitcher being considered in the 2009 draft, I'd probably not think much of him, and I'd move on to find someone with more promise. Nothing in the report would make me think that this could be a special pitcher, especially when you see that the scout has rated all of his pitches and skills as a '5' or a '6' with a future projection of '6' and one '7'.

But check out this scouting report written just one month later by a scout named Mapson.
All of a sudden, in the space of one month, Maddux goes from having a bunch of '5' pitches with '6' potential to grading anywhere from a '4' to a '6' with the potential for an '8' fastball and a '7' curve, with a few more categories receiving a future '7'. Instead of being a "hardthrower" who lacks "overall control", he's now good enough to be the #1 pick in the nation, with a big league curveball.

So, what happened? Did Maddux improve so much in one month that he went from being an above average high school pitcher to a possible #1 pick? Did his curveball suddenly start breaking an extra foot, and his fastball start moving at the last second? My guess is no. Maddux was the same pitcher in May 1984 that he was in April 1984.

In retrospect, it looks like Jorgensen wasn't such a keen observer, and Mapson was some kind of baseball prophet. But in reality, the difference came down to a variety of things: the individual writing the scouting report, the conditions and competition, and the length of observation, among others. That's why the numbers don't match up, and that's why one report describes a good pitcher while the other describes an elite pitcher.

For all I know, Jorgensen may have gone on to great things as a scout, and Mapson might be flipping burgers in Vegas somewhere. I don't know the whole story, so I can't really say. But what I do know is that people have opinions, and not everyone sees things the same. I might see a kid with an attitude, you might see a kid with spunk. I might see a kid with an average breaking ball that falls behind hitters, and you might see a kid with good arm action and a great mental approach. Over the course of a small sample size (like 5 innings of a high school or college game), who knows what a scout might see, and beyond that, who knows what a scout might think they see.

I'm not picking on scouts, but rather the process of scouting. I think that in most cases for most teams, it is severely flawed, and allows for heavy personal and group biases. For instance, take the following comments from scouting reports that several big league organizations used prior to the 2006 MLB draft. I'll first give you the reports, then the name of the player under examination.

"It looks like his head is going to snap off and his arm is going to fly off."

"He [is] short, not a real physical kid, and mechanically he [is] going to break down..."

The player in question? Tim Lincecum.

In 2006, 9 teams passed up the chance to draft Tiny Tim, instead taking Luke Hochevar with the #1 pick, followed by Greg Reynolds, Evan Longoria, Brad Lincoln, Brandon Morrow, Andrew Miller, Clayton Kershaw, Drew Stubbs, and Bill Rowell. Of the 9 players chosen ahead of Lincecum, only Longoria and Kershaw appear to have comparable talent, and none of them have had such an amazing impact on the game so early.

Obviously, in hindsight it's easy to see that Lincecum should have been at least a top-3 pick in the 2006 draft. But at the time, teams felt like they had to play it safe, ignored the stats, and passed up the opportunity to draft a proven top-tier talent based on nothing more than his size and some poorly worded scouting reports. Perhaps those very same teams had never heard of Pedro Martinez, Tim Hudson, or Roy Oswalt, all of whom top out at 5'11" or 6'.

If Lincecum's draft position were the only time teams made a mistake because of scouting reports, it wouldn't be such a big deal. But according to an article written by Lincoln Hamilton at Project Prospect, of the college hitters drafted from 2001 to 2005 (within the top 50 picks), the factor that correlated least with success at the major league level was draft position. In fact, the later a hitter was drafted in that span, the more likely he was to become a productive major league ballplayer.

What does that have to do with scouting reports? Almost everything. Because of the wide variation in competition levels and the sporadic stat keeping at the college level, the de facto method of evaluating college players is the scouting report. And according to Hamilton, scouts were so bad at evaluating talent from 2001 to 2005 that teams effectively drafted the worst players first.

It would be one thing if only a few factors better predicted the major league success of college hitters, and draft position were somewhere in the middle of the pack. That would mean that, in general, scouts were doing an average job of evaluating talent. But of all the factors out there, draft position was dead last (and in Hamilton's study it wasn't even close), which means that scouts were so bad at their jobs during that time period that a team would have been better off drafting the exact players its scouts told it to avoid, and passing on the players the scouts liked most.
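The claim above is really a claim about rank correlation: if you line hitters up by draft position and again by career production, do the orderings agree? Here's a minimal sketch of a Spearman rank correlation on made-up numbers (these are NOT Hamilton's data; the draft positions, WAR figures, and sample size are invented purely to show the mechanics):

```python
# Made-up data: draft position (1 = first pick) and a career value metric
# for ten hypothetical college hitters. Invented for illustration only.
draft_pos = [1, 4, 7, 12, 18, 22, 29, 35, 41, 48]
career_war = [2.0, 1.5, 4.0, 3.5, 6.0, 5.0, 8.5, 7.0, 9.0, 10.5]

def ranks(xs):
    """Rank each value from 1 (smallest) to n (largest), no ties assumed."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rank correlation via the classic 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman(draft_pos, career_war)
print(f"rho = {rho:+.2f}")  # rho = +0.95
```

In this toy data a strongly positive rho means larger draft numbers (later picks) go with more career value -- the "drafted the worst players first" pattern the post describes. With real data you'd also want a significance test before drawing conclusions from ten players.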

As one final example of why I find scouting reports so difficult to accept, I'd like to point out a basic review of scouting reports performed by Jeff Sackman and Kent Bonham at The Hardball Times. By comparing the statements made in various scouting reports of several college players with actual statistical data, Sackman and Bonham were able to gauge the accuracy of each scout's assessment. Though it was a small sample (only 4 examples were analyzed in this particular publication), the results were quite surprising. Sackman and Bonham found that 75% of the scouting reports weren't just inaccurate, they were completely backwards. Players that scouts felt were spray hitters were actually pull hitters. Pitchers that were reported to be groundball pitchers were actually fly ball pitchers.
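The Sackman/Bonham check boils down to something you can do yourself whenever batted-ball data exists: turn the numbers into a label and see if it matches the scout's label. Here's a sketch of that idea -- the player names, counts, and cutoffs (45% pulled for a "pull" hitter, 50% grounders for a "groundball" pitcher) are all my own invented illustrations, not their methodology:

```python
# Hypothetical batted-ball counts for four players, alongside the label a
# scouting report gave each one. Every number and name here is invented.
players = {
    "Hitter A":  {"pull": 120, "center": 60, "oppo": 40, "scout": "spray"},
    "Hitter B":  {"pull": 70, "center": 80, "oppo": 75, "scout": "spray"},
    "Pitcher C": {"gb": 90, "fb": 160, "scout": "groundball"},
    "Pitcher D": {"gb": 150, "fb": 80, "scout": "groundball"},
}

def stat_label(p):
    """Label a player from the numbers alone, using simple share cutoffs."""
    if "pull" in p:  # hitter: classify by share of balls pulled
        total = p["pull"] + p["center"] + p["oppo"]
        return "pull" if p["pull"] / total > 0.45 else "spray"
    total = p["gb"] + p["fb"]  # pitcher: classify by groundball share
    return "groundball" if p["gb"] / total > 0.5 else "flyball"

for name, p in players.items():
    verdict = stat_label(p)
    match = "agrees" if verdict == p["scout"] else "DISAGREES"
    print(f"{name}: scout says {p['scout']}, stats say {verdict} ({match})")
```

In this invented sample, two of the four scouting labels contradict the batted-ball data outright -- the same flavor of mismatch Sackman and Bonham reported, though of course their real findings came from actual reports and actual stats.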

What does it all add up to? I'm not exactly sure. But one thing I do know is that the more scouting reports I read, and the better I get at analyzing stats, the more I've begun to listen to the numbers and not the scouts. I don't personally have anything against scouts and the work they do. I just don't believe everything (or most things) they say.
