Just lately I have been doing a lot of work with speed ratings, testing different theories, and one of the struggles I have had is formulating a query for MySQL that generates the correct median value. You would think that this function would be built-in to MySQL, but unfortunately it’s not, so I had to dig deep to find something that actually works!
The countless number of examples I tested failed in one way or another. The problem is that when the table column contains an odd number of entries, most of the examples I tried worked. BUT, what happens when you have an even number? And in my case the race I was trying to find the median speed value of had 8 runners, so it was imperative I came up with a solution that worked properly on both odd and even number of runners.
I am almost certain this is a problem many people have encountered… and thankfully there is a solution which I will provide later in the article.
But before I do, here’s something you should always remember: FAST HORSES WIN MOST RACES!
I have lost track of the number of times I have discarded the above statement in the past and bet a horse because it had strong trainer stats, or the jockey was riding at the top of his game, or the horse was being backed like winning was not an option. All those factors are well and good, BUT, if the horse can’t run fast enough to actually be a contender then they are all factors that don’t even come into consideration for me anymore. The first thing I check is HOW FAST CAN THE HORSE RUN.
Again, it’s pointless if the horse has a feather-weight handicap mark, if the horse can’t run to save itself, then it may require a walking race entry for it to win. A lot of horses just cannot run fast – and if you can identify those horses early I have found that you can eliminate a good number of the runners almost immediately and then focus on the true contenders. Makes finding the winner a damn sight easier!
So… the strategy I have been using lately is to sort the runners in terms of speed, fastest to slowest, based on a very good speed rating service I have found (more about this in another article). I then calculate the median value – and for those who don’t know what the median is: essentially it is the middle number in the range. For odd numbers this is easy, the median value in a 5 runner race would be 3, with 2 either side, the top 2 and the bottom 2.
For an even number of runners, say a 8 horse race, you have had to average the 3rd and 4th values to find the median.
Essentially, I eliminate any horse below median – 1 from the equation now immediately! Sometimes it only knocks out a couple of contenders, other times it knocks out more, but the object of the exercise is to find a starting point for selection, and this does the trick well. It leaves me to focus on those horses, that based on speed, have a decent chance of winning.
If you use any decent spreadsheet package you will find the median function is there. For quick calculations I use Libre Calc, part of the Libre Office suite which is open source and free to use. The same functionality is built into Microsoft Excel.
However, if like me you import a range of data into MySQL tables, then the following SQL will prove invaluable and save you a lot of time searching for a solution to find the median value:
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.horse_speed AS 'middle_values' FROM
SELECT @row:=@row+1 as `row`, x.horse_speed
FROM speed AS x, (SELECT @row:=0) AS r
ORDER BY x.horse_speed
) AS t1,
SELECT COUNT(*) as 'count'
FROM speed x
) AS t2
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
Just change the horse_speed reference to your own column name and you are done and this works.
Using statistics is a powerful way to analyse horse races and sports events, and over the course of the year we will be learning how to apply statistical analysis to our selection process.
Hope this proves helpful.