See you in October

That title could probably be a rallying cry for just about any Major League Baseball organization, but in this case, it means this blog is going on hiatus. Given my random publishing schedule, most of you probably won't even notice that I'm gone.

As a part of my employment as a pro scout, I have been asked to cease blogging activity from now until October 1, 2011. The blog will not be taken down.

The PITCHf/x database will continue to run as it has for the past couple of seasons. Of course, thanks to this off-season's upgrades, it's now much prettier.

I'll be doing the normal scouting stuff, with half the number of teams of most scouts (to help me balance school and other work responsibilities), and I'll also be doing some video analysis. There's no telling what kind of impact I'll have, but I certainly hope it's big enough that my baseball career isn't one-and-done. Either way, it's going to be a fun season.

As you read this, I'm headed to (or already in) Phoenix, Arizona for Spring Training. My pro scouting adventure has begun.

Measuring pitch variability through PITCHf/x

For a while, I've been wondering what can be measured and analyzed using PITCHf/x data that hasn't already been measured and analyzed. A few things crossed my mind, but the most interesting thought was about the degree of variability of a pitcher's pitches.

It would be relatively easy to measure how much variability a pitcher has in velocity and movement if all things were equal. Of course, they aren't.

The two biggest problems for analyzing this type of variability are, as I see them, pitch type identification and park-to-park measurement error. Variability would mean little if half of a pitcher's "two-seam fastballs" are actually change-ups. Variability also runs into problems when parks like Kansas City -- whose radar gun readings are notoriously high -- are included in a data set with other ballparks.

Fortunately, if we only look at a single ballpark -- usually the pitcher's home ballpark because it has the greatest sample size -- park-to-park measurement error should be less of a factor. Without some form of park-to-park normalization, though, interpark comparisons shouldn't necessarily be taken at face value.

Additionally, 2010 saw a huge improvement in pitch type identification. While it still isn't 100% accurate, it is close enough on many pitchers to give me confidence while playing around with my ideas.

I haven't really dug into the numbers yet, but I will be looking to see if variability within a pitch type helps or hurts a pitcher. My gut feeling is that the number itself won't have much meaning.

To calculate the variability, I plan to capture the roughly 95% window that spans two standard deviations on either side of the mean. This leaves out the most extreme pitches, but it will take some study to determine if that's really the measurement to use.
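
A minimal sketch of that window calculation might look like the following. The function name and the sample velocities are made up for illustration, and the "95%" figure only holds if the values are roughly normally distributed.

```python
import statistics

def two_sigma_window(values):
    """Return (low, high, width) for mean +/- 2 sample standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    low = mean - 2 * sd
    high = mean + 2 * sd
    return low, high, high - low

# Hypothetical four-seam fastball velocities (mph) from one ballpark.
velocities = [92.1, 93.4, 92.8, 91.9, 93.0, 92.5, 94.1, 92.2]
low, high, width = two_sigma_window(velocities)
print(f"{low:.2f} to {high:.2f} mph (width {width:.2f} mph)")
```

The width of the window is simply four standard deviations, so ranking pitchers by window width is the same as ranking them by standard deviation.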

It may be beneficial to use a Pythagorean measure to find the variability for pitch movement; however, this would not appropriately model pitches that have greater variability vertically than horizontally (and vice versa, of course).
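
To illustrate the trade-off, here is a small sketch that computes the horizontal and vertical spreads separately and then combines them Pythagorean-style. The names and movement values are hypothetical; the two made-up pitches are chosen so that their combined numbers match even though their shapes differ.

```python
import math
import statistics

def movement_spread(horizontal, vertical):
    """Return (sd_x, sd_z, combined) for a set of movement readings in inches."""
    sd_x = statistics.stdev(horizontal)
    sd_z = statistics.stdev(vertical)
    combined = math.hypot(sd_x, sd_z)  # sqrt(sd_x**2 + sd_z**2)
    return sd_x, sd_z, combined

# Hypothetical pitch A varies more vertically, pitch B more horizontally.
pitch_a = movement_spread([0.0, 3.0, 6.0], [0.0, 4.0, 8.0])
pitch_b = movement_spread([0.0, 4.0, 8.0], [0.0, 3.0, 6.0])
print(pitch_a)
print(pitch_b)
```

Both pitches end up with the same combined value, which is exactly the information loss described above. Keeping the per-axis numbers alongside the combined one avoids it.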

Look for a follow-up after I play around with this idea.

Thinking about run values

For some time, I've been looking for a way to appropriately integrate run values into the PITCHf/x database. I have read articles at Beyond the Boxscore, Inside The Book, and Cubs f/x, but I am no closer to getting what I want. Unfortunately, I lack the resources and time to find the answers myself.

Many tables have been published with run expectancies for the 12 ball/strike count states for various time periods. Tables have also been published for the 24 base/out states. Because the two tables contain different representations of the same data, there's no way to combine them. What I would like to see -- and I'm sure this makes me a sadist -- is a run expectancy table for the 288 ball/strike/base/out states.

Yes, that's one hell of a matrix to process, but there are two thoughts that seem to be the beginning of arguments against the two relatively simple approaches:

  • The thought against only using the 12 ball/strike count states table: a first-pitch strike in a bases loaded, no out situation has to affect the run expectancy more than a first-pitch strike in a bases empty, two out situation, right?
  • The thought against only using the 24 base/out states table: an 0-2 single with a runner on first base has to affect the run expectancy more than a 3-0 single with a runner on first base, right?

Admittedly, I don't have the knowledge or skills necessary to issue either of those thoughts as facts, so I have posed them as questions. It seems logical, though, doesn't it?
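
For what it's worth, the bookkeeping for a 288-state table is not the hard part. Here is a rough sketch, assuming pitch-by-pitch records that already carry the runs scored from that pitch through the end of the inning. That input format, the bitmask encoding for the bases, and the sample rows are all assumptions for illustration, not real data.

```python
from collections import defaultdict

# 4 ball counts x 3 strike counts x 8 base states x 3 out counts = 288 states.
# Base state is a bitmask: 1 = runner on first, 2 = second, 4 = third.
ALL_STATES = [
    (balls, strikes, bases, outs)
    for balls in range(4)
    for strikes in range(3)
    for bases in range(8)
    for outs in range(3)
]

def run_expectancy(pitches):
    """Average runs scored through the end of the inning for each observed state.

    pitches: iterable of (balls, strikes, bases, outs, runs_rest_of_inning).
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [run sum, pitch count]
    for balls, strikes, bases, outs, runs in pitches:
        entry = totals[(balls, strikes, bases, outs)]
        entry[0] += runs
        entry[1] += 1
    return {state: run_sum / count for state, (run_sum, count) in totals.items()}

# Made-up sample rows, only to show the shape of the input.
sample = [
    (0, 0, 0, 0, 0),
    (0, 0, 0, 0, 1),
    (0, 1, 7, 0, 3),
    (0, 1, 7, 0, 1),
]
table = run_expectancy(sample)
print(len(ALL_STATES), table[(0, 0, 0, 0)], table[(0, 1, 7, 0)])
```

The real difficulty is the data: the rarer states (say, 3-0 with the bases loaded and two out) will have few pitches behind them, so their averages will be noisy.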

I think an appropriate time period for the analysis to cover is 1998-present -- since the last expansion.

Does anyone know if anyone has tackled this subject, successfully or otherwise? Is this covered in a book that I have not yet read -- possibly even one that I have read?

Consider this an open call for help in this matter.

A new PITCHf/x chart

For a long time, I've been frustrated by spin movement (Magnus effect) charts because they don't genuinely show how much a pitch actually moves. These charts perfectly demonstrate how the spin of the ball changes its path, but they don't show how velocity adds a vertical element to the pitch's movement.

Take this chart for example. These are the pitches thrown by Texas Rangers LHP Derek Holland during September and October of last season.

Texas Rangers LHP Derek Holland's pitches.

Even though they are much slower pitches, Holland's change ups are located in the exact same place on the graph as his fastballs. If his fastball and change up start with the same trajectory, the change up will always cross the plate lower than the fastball. I wanted to capture this on a chart, so I put gravity back into the equation.

Using Gameday's physics data (initial position, initial velocity, acceleration), I calculated how long each pitch was in the air. Keep in mind, though, that PITCHf/x starts at 50 feet from the plate and ends just in front. The mapped data covers only about 48 1/2 feet.

With the flight time for each pitch, I calculated the drop caused by [sea-level] gravity. After converting this number from feet to inches, I added the vertical spin movement. Here's how it turned out:
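
The steps above can be sketched roughly as follows. This is not the database's actual code: the function names, the 32.174 ft/s^2 gravity constant, the default end point of 1.417 feet, and the two sample pitches are assumptions for illustration. It treats y as distance from the plate in feet, with a negative initial y-velocity (the ball is moving toward the plate) and constant acceleration.

```python
import math

GRAVITY_FT_S2 = 32.174  # standard sea-level gravity, assumed here

def flight_time(y0, vy0, ay, y_end=1.417):
    """Seconds for the ball to travel from y0 to y_end under constant acceleration.

    Solves y0 + vy0*t + 0.5*ay*t**2 = y_end for the earliest positive t.
    """
    if ay == 0:
        return (y_end - y0) / vy0
    disc = vy0 ** 2 - 2 * ay * (y0 - y_end)
    return (-vy0 - math.sqrt(disc)) / ay

def vertical_with_gravity(y0, vy0, ay, spin_movement_in, y_end=1.417):
    """Vertical spin movement (inches) with the gravity drop added back in."""
    t = flight_time(y0, vy0, ay, y_end)
    drop_ft = 0.5 * GRAVITY_FT_S2 * t ** 2
    return spin_movement_in - drop_ft * 12  # convert the drop from feet to inches

# Hypothetical fastball and change-up with identical vertical spin movement.
fastball = vertical_with_gravity(50.0, -132.0, 28.0, 9.0)
changeup = vertical_with_gravity(50.0, -118.0, 22.0, 9.0)
print(f"fastball: {fastball:.1f} in, change-up: {changeup:.1f} in")
```

Because the slower pitch spends more time in the air, gravity pulls it down farther, so it lands lower on the chart even though its spin movement is the same.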

Texas Rangers LHP Derek Holland's pitches on the gravity chart.

Success. The change ups now appear below his fastballs. The chart reflects not only gravity's effect on a pitch, but it also helps separate pitches by velocity, making identification a little bit easier.

This chart does not replace virtualizations by any stretch of the imagination, but I think it does show how different two pitches can be from each other even when spin movement alone can't show it. Taking this a step further could lead to a "hitter's decision" chart that would represent how different the pitches look at a certain time or distance from the plate.

The gravity charts are now available for all pitchers in's PITCHf/x Database.

[[Update: On April 24, 2010, the Spin Movement w/Gravity charts were updated to reflect gravity's effect from y = 40 to y = 1.417. This change was made based on the information that can be found at Alan Nathan's PITCHf/x site: MLB Extended Gameday Pitch Logs: A Tutorial]]

Some news and updates for Fall '09

It's been quite a while since my last post, but new stuff is on the horizon. The transition back to college life has been interesting, and I'm finally settling into a schedule that will allow me to update with better regularity.

Part of what has kept me from updating is my work on my PITCHf/x tool. It's still under construction, so you'll see holes and bugs in a couple of places. New stuff will be added to that whenever I can find time to work on it. I've got a lot left in the tank for this.

I have also been working with the UT Dallas baseball team as an assistant pitching coach. Fall workouts are now over, freeing up about 20 hours a week for me to write.

In addition to my work with the baseball team, I've started serious strength training for the first time in my life. That's not to say that I've never been on a strength program before, but those previous plans lacked proper programming and weren't designed with any expertise.

To take nothing for granted, I started at the bottom. Kyle Boddy, of, plugged Mark Rippetoe and Lon Kilgore's Starting Strength, and I dove right in. I made a few small alterations to the basic workout plan, and along with a few small dietary changes - added lots of milk and an extra meal consisting of 2 peanut butter sandwiches (jelly optional) - I've been pretty impressed with my results to this point.

This winter, I will also be looking into NSCA's CSCS (Certified Strength and Conditioning Specialist) certification. Hopefully, my brain can keep pace with my ambition.