"He uses statistics as a drunken man uses lamp-posts... for support rather than illumination." -Andrew Lang
I've finally begun digging into the massive college football data dump from last summer, both to try to discover new things and to judge whether the methods I currently tend to use are indeed the right ones. I'll give you an example.
I tend to use averages a lot, as does nearly everyone who works with sports statistics. Averages work best when the thing you're averaging has a roughly normal distribution. If the distribution is skewed to any great degree, then an average isn't as helpful as you might think.
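To put a number on that, here's a minimal sketch (using synthetic data I made up for illustration, not the actual scoring data) of how a long right tail drags the mean away from the typical value while the median stays closer to it:

```python
# Illustrative sketch with made-up numbers, not the real scoring data:
# a right-skewed sample has its mean pulled well above its median,
# while a roughly bell-shaped sample keeps the two close together.
import numpy as np

rng = np.random.default_rng(0)
symmetric = rng.normal(loc=28, scale=10, size=10_000)   # bell-shaped stand-in
skewed = rng.gamma(shape=2.0, scale=14.0, size=10_000)  # same mean, long right tail

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(f"{name:>9}: mean = {data.mean():5.1f}, median = {np.median(data):5.1f}")
```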
So for starters, I plotted the frequency of points scored, grouped into ranges, in I-A vs. I-A games from 1998-2010. This is what you get:
It kind of looks like a bell curve, but it's awfully jagged and kind of leans to one side. Those variations from what you see in a classic normal distribution could have important implications for the study of football.
They also could be the side effect of a poorly constructed graph. If you look at the X-axis, the point ranges are in groups of five. That might be fine for, say, basketball, but it's not good for football. Points in football most often come in chunks of three or seven, and two field goals together make six, which is nearly seven anyway. Here is a better graph with point frequencies grouped by seven instead of five:
That's much better. The scale of the graph better matches the underlying activity that produced the numbers, and we see a distribution that looks a heck of a lot more like a normal distribution. Talking about average points scored or allowed per game does indeed make sense, given that the distribution of points scored is roughly bell-shaped.
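If you want to reproduce that re-binning step yourself, a rough sketch is below. The `scores` array is placeholder data standing in for the per-team game scores from the data dump; the point is the bin width, not the numbers:

```python
# Sketch of binning single-team point totals in widths of 5 vs. 7.
# `scores` here is placeholder data; swap in real per-team game scores.
import numpy as np
import matplotlib.pyplot as plt

scores = np.random.default_rng(1).gamma(shape=3.5, scale=7.5, size=5_000)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, width in zip(axes, (5, 7)):
    bins = np.arange(0, scores.max() + width, width)  # bin edges at 0, width, 2*width, ...
    ax.hist(scores, bins=bins)
    ax.set_title(f"Bin width = {width}")
    ax.set_xlabel("Points scored")
axes[0].set_ylabel("Number of team-games")
fig.tight_layout()
plt.show()
```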
That conclusion about the shape of the distribution is not the only important one here, though. You can see that the left side of the mean has a much steeper slope than the right, and also that the boundary at zero is quite important. I think it's fair to say, looking at this distribution, that if it were possible to finish a game with negative points, a noticeable number of games would end with at least one team below zero on the scoreboard.
That point is important because we often try to compare the best offenses to the best defenses. There is no upper boundary on the number of points an offense can score; in theory, two teams can go on scoring forever as long as they match each other point for point in overtime. On the flip side, a defense cannot, by rule, allow fewer than zero points no matter how well it plays. The zero bound is a very real factor there.
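To get a rough feel for how much that zero floor matters, here's a back-of-the-envelope sketch. The mean and standard deviation are round numbers I've assumed purely for illustration, not values fitted to the actual data:

```python
# Back-of-the-envelope illustration with assumed (not fitted) parameters:
# if team scoring followed an unbounded normal curve, a small but real
# share of team-games would land below zero, which the rules forbid.
from scipy.stats import norm

mu, sigma = 27.0, 13.0                      # hypothetical round numbers
p_below = norm.cdf(0, loc=mu, scale=sigma)  # share of the curve below zero
p_either = 1 - (1 - p_below) ** 2           # at least one of two teams, if independent
print(f"Single team below zero: {p_below:.1%}")
print(f"At least one team in a game: {p_either:.1%}")
```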
I am not a good enough mathematician to come up with a nice, elegant solution for comparing offenses and defenses on equal footing (though some people have made good attempts at it). However, it's an important distinction to bear in mind when going through this season's most contentious debate: the relative merits of Alabama and its fierce defense versus Oklahoma State and its explosive offense.
For that matter, points scored and points allowed measure two completely different things. There are obviously some mitigating and confounding factors, but generally speaking, points scored measures how often and to what degree an offense succeeds whereas points allowed measures how often a defense fails. Success rates and failure rates are two different things.
In the coming days, the Bama vs. Oklahoma State debate is going to be waged all over the world of sports media. It will only get more intense if Alabama defeats LSU in the BCS National Championship Game.
If you plan to take and argue a side, I plead with you to use statistics responsibly.