I pulled the data from the 2008 results for a little amateur analysis. Primarily I am using this to establish my goals based on others' performances in the race. I know that I won't finish first, but I don't want to finish last either. This can help me produce a realistic measure of how well I can expect to perform.
First, a look at the curve of average times:
As you can see, it is a fairly normal distribution, slightly skewed to the right. My goal is to finish the race in the top 50%, or in about 6 hours. If I take the median for each leg of the race, it works out to a 44:08 swim, a 2:56 bike (19 MPH average), and a 1:59:44 run (9:09 per mile). I think that I can do that. Perhaps a little faster on the run, but it's hard for me to say.
Now, this brings up the question of which training is most important. What I'd like to do is look at the correlation between each contestant's final position and their performance in each individual leg. It stands to reason that I should focus more of my time on the leg that correlates most strongly with overall position in the race.
Scatter plot time:



Quick conclusion here. (I can't get any specific correlation scores because I am only working from Excel ... t-score, maybe? High school was so long ago ...) There is a fairly low correlation between the participants' place in the swim and their final position in the competition, except at the extremes (i.e., the first and last places).
The correlation becomes more pronounced in the cycling and running portions. It looks stronger in the cycling than in the running, but that is entirely observational, not scientific. This would make sense considering cycling is the longest section, both by distance and by time.
It then stands to the voice of reason (or at least to the little voice in my head, which is often entirely unreasonable) that the most training time should go to cycling, then to running, and then to swimming.
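For anyone who would rather have a number than eyeball the scatter plots, here is a minimal sketch of one way to score this, assuming the results can be exported as lists of places. Because the inputs are already rankings, an ordinary Pearson correlation computed on them is the same thing as a Spearman rank correlation (as long as there are no ties). The function name `rank_correlation` and the six-person sample are made up for illustration; they are not the 2008 results, so the ordering they produce proves nothing about the real race.

```python
from math import sqrt


def rank_correlation(xs, ys):
    """Pearson correlation of two equal-length lists.

    When both lists hold places (ranks) with no ties, this equals
    Spearman's rank correlation coefficient.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: each index is one athlete (NOT real race data).
overall_place = [1, 2, 3, 4, 5, 6]
leg_places = {
    "swim": [3, 1, 6, 2, 5, 4],
    "bike": [1, 2, 3, 5, 4, 6],
    "run": [2, 1, 4, 3, 6, 5],
}

for leg, places in leg_places.items():
    score = rank_correlation(overall_place, places)
    print(f"{leg}: {score:.3f}")
```

A score near 1 means that leg's placing tracks the final placing closely, and a score near 0 means it barely tracks it at all. Run against the full results, the three scores would show whether cycling really does come out ahead of running.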
What else is interesting about this data is that the overall winner did not win any one of the categories, although I am going to bet that this is due to faulty data. For example, the data set says that the fastest runner was a 46-year-old Utah woman who ran 4:50 miles, although she finished in 684th place and only biked at 13.4 MPH. (This would mean that until the running portion, she was in 943rd place - 5 spots ahead of last place - and then burst ahead at sub-five-minute miles, passing a total of 259 people before crossing the finish line. If that were the case, she would have shattered the women's record for a half marathon by more than 3 minutes. Call me a cynic, but I think something's going on there.)
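To put a figure on how unlikely that run is, here is a small sketch that turns a per-mile pace into an implied finish time. It assumes the run leg is a standard 13.1-mile half marathon, and the helper names `pace_to_seconds` and `format_hms` are my own, not anything from the results file.

```python
RUN_MILES = 13.1  # assumption: standard half-marathon distance


def pace_to_seconds(pace):
    """Convert a 'M:SS' per-mile pace string to seconds per mile."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)


def format_hms(total_seconds):
    """Format a duration in seconds as H:MM:SS, rounded to the nearest second."""
    total = round(total_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# The pace reported for the suspicious result in the data set.
implied_run_time = pace_to_seconds("4:50") * RUN_MILES
print(format_hms(implied_run_time))  # prints 1:03:19
```

A 1:03:19 half marathon at the end of a triathlon, from someone who averaged 13.4 MPH on the bike, is a good sign that the row is a data-entry error and should be left out of any correlation.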


