Tuesday, October 14, 2014

MLB Productivity Analysis: 1901-2013

In the spirit of the MLB Postseason being underway, I thought it would be interesting to do a viz looking at MLB player productivity from 1901 to 2013. This viz looks at productivity in terms of RBIs (runs batted in). The more runs a batter brings in, the better chance their team has of winning a game. I looked at player productivity in a couple of different ways: how hits correlate with RBIs and how home runs correlate with RBIs. It was interesting to find out that neither getting the most hits nor getting the most home runs means that you are more productive than other players. For example, in 2004 Ichiro Suzuki set the single-season record for hits with 262, yet he only had 60 RBIs. This is most likely because he was the lead-off batter for the Mariners that year. Also, in 2001, Barry Bonds set the single-season record for home runs with 73 and had 137 RBIs that same season. That means most of his home runs were likely solo shots. I found that the best measure of player productivity is batting average. The players with the higher batting averages tend to generate more RBIs than the power hitters.
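If you want to check a relationship like "hits vs. RBIs" outside of Tableau, here is a minimal sketch of a Pearson correlation in plain Python. The function name `pearson` and the sample numbers are my own placeholders for illustration, not values from the actual data set behind the viz.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up season totals for four hypothetical players (NOT real data).
hits = [150, 160, 170, 180]
rbis = [60, 70, 80, 90]
r = pearson(hits, rbis)
```

A value near 1 means the two stats rise together, a value near 0 means there is little linear relationship, and a value near -1 means one falls as the other rises.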

I also looked at how the American League compares to the National League in terms of total RBIs as well as batting average. Historically, American League players tend to generate more RBIs than National League players. If you look at the average player batting average for each season, both leagues tend to be pretty close. The most interesting thing that I found was how batting averages compare with strikeout percentages. It seems that pitchers are getting better and better as time goes on. Since 2008, on average, players are striking out more than they are hitting the ball! This could really hurt batters in terms of productivity.
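For anyone who wants to reproduce the strikeouts-versus-hits comparison, here is a rough sketch. It assumes both rates are measured per at-bat, which is my assumption here; other definitions of strikeout percentage (for example, per plate appearance) would give different numbers. The function names are placeholders.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats


def strikeout_rate(strikeouts, at_bats):
    """Strikeouts divided by at-bats (an assumed definition)."""
    return strikeouts / at_bats


def strikes_out_more_than_hits(hits, strikeouts, at_bats):
    """True when the strikeout rate exceeds the batting average."""
    return strikeout_rate(strikeouts, at_bats) > batting_average(hits, at_bats)
```

With a shared denominator, the comparison boils down to whether strikeouts outnumber hits.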


Take a look for yourself... I found this project to be pretty interesting...

Tuesday, September 23, 2014

College Sports...Big Business


First of all, I want to give a HUGE shout-out to my colleague Matt Chambers (aka Sir Viz-a-Lot) for winning the Viz of the Day on Tableau Public! He and I both had the opportunity to attend the 2014 Tableau Customer Conference in Seattle. I have to say, I was completely blown away and would recommend that anyone who loves data try to attend the 2015 Tableau Customer Conference in Las Vegas. You will not find a better place to meet and network with other data enthusiasts.

Not to sound like a broken record but...

The best part about the conference for me was coming back home inspired. At the conference, we decided to attend a breakout session put on by Andy Kriebel (a fellow Clemson grad, which is AWESOME!), Jewel Loree (she loves data so much she wears it!), and Peter Gilks (a true Tableau Zen Master!). The session was all about data blogging. The presenters are, in my opinion, among the best data bloggers in the world! It made me realize that there is so much data out there that is free, transparent and just waiting for someone to find insight in it.

Although I am not a newbie to Tableau Desktop and Tableau Server, I am new to Tableau Public. I had no idea that it was such a great resource for dashboard ideas and inspiration. There are SO many creative visualizations out there.

Since I have had some free time lately, I decided to work on my first Tableau Public viz. This viz focuses on how, for many schools, college sports are a HUGE money-maker. I decided to look at the data in three different ways: revenue per player, expenses per player and profit per player. These attributes are visualized by conference, on a standard score distribution by college, and as a heatmap by college. The standard score distribution (z-score) lets you see how each school compares, in terms of revenue, profit or expenses per player, to the other schools in its conference. Any school more than one standard deviation from its conference average is considered to be a unique case; it is either spending, bringing in, or pocketing noticeably more or less than the average school in its conference. There are parameter controls built in to allow the user to filter by each attribute as well as by a specific season (2000-2012) or by sport. It is interesting to see how different sports are valued among the different conferences. Check it out below!
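For the curious, the standard score idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the calculation used in the viz: the function names, the school and conference labels, and the dollar figures are all made up, and I am assuming a population standard deviation (the viz may use the sample version instead).

```python
from collections import defaultdict
from statistics import mean, pstdev


def conference_z_scores(rows):
    """Return {school: z-score} where each z-score is relative to the
    school's own conference. `rows` holds (school, conference, value) tuples."""
    by_conf = defaultdict(list)
    for _school, conf, value in rows:
        by_conf[conf].append(value)
    # Mean and population standard deviation for each conference.
    stats = {conf: (mean(vals), pstdev(vals)) for conf, vals in by_conf.items()}
    scores = {}
    for school, conf, value in rows:
        mu, sigma = stats[conf]
        # If every school in a conference has the same value, call it 0.
        scores[school] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores


def unique_cases(scores):
    """Schools more than one standard deviation from their conference mean."""
    return sorted(s for s, z in scores.items() if abs(z) > 1)


# Hypothetical profit-per-player figures (NOT from the real data set).
rows = [
    ("School A", "Conf X", 10_000.0),
    ("School B", "Conf X", 20_000.0),
    ("School C", "Conf X", 30_000.0),
]
scores = conference_z_scores(rows)
outliers = unique_cases(scores)
```

The same function works for revenue per player or expenses per player; just pass a different value in the third slot of each tuple.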

Enjoy...

Data source: https://github.com/AragornTech/ncaa_data/tree/master/doe_2000_2013