Calculating A Bucketed Median With SQL

Calculating the median has never been straightforward in SQL. Some SQL engines don’t even have a direct median function. Typically, you run the function on raw data, so you at least know what went into the aggregate.
However, let’s throw in a curveball. What if all the data were already aggregated and you couldn’t access the raw data? Instead, you only have a high-level summary table.
For example, let’s say the table holds one row per age, along with a count of the people at that age.
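To make that shape concrete, here is a minimal sketch that builds such a table in an in-memory SQLite database from Python. The table name `age_counts`, the column names `age` and `people`, and the sample values are all made up for illustration; they are not the original data.

```python
import sqlite3

# Hypothetical sample data: one row per age with a pre-computed head count.
SAMPLE_ROWS = [(21, 3), (22, 5), (23, 2), (24, 6), (25, 5)]


def build_age_counts(rows):
    """Create an in-memory table that holds only the aggregated counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE age_counts (age INTEGER PRIMARY KEY, people INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO age_counts (age, people) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = build_age_counts(SAMPLE_ROWS)
for age, people in conn.execute("SELECT age, people FROM age_counts ORDER BY age"):
    print(age, people)
```

Five rows stand in for 21 people here. In a real system each row could stand in for millions.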

It might seem strange to store data this way, but there are cases where condensing it like this makes sense. Let’s say your team is storing billions of transactions per day and you want to run a median calculation over several years.

In this case, it makes more sense to keep the data aggregated because it will improve performance. You might not even be able to store the transactions properly if they aren’t aggregated.

It could be tempting to “unroll” the data, so to speak: to take the data and create a new table that has one row for each instance of a person at a specific age. This could cause your row count to grow significantly, to terabytes upon terabytes of data. It is also just another step that might not need to be done.
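For comparison, this is roughly what unrolling amounts to, sketched in plain Python on made-up (age, count) pairs. It is only practical for tiny inputs, which is the point: the expanded list has one element per person. `median_low` is used so that an even-sized population returns the lower of the two middle values.

```python
from itertools import chain, repeat
from statistics import median_low

# Hypothetical (age, count) pairs, not the original data.
SAMPLE_ROWS = [(21, 3), (22, 5), (23, 2), (24, 6), (25, 5)]


def unrolled_median(rows):
    """Expand every (age, count) pair into `count` copies of `age`, then take the median."""
    expanded = list(
        chain.from_iterable(repeat(age, count) for age, count in sorted(rows))
    )
    return median_low(expanded), len(expanded)


median_age, row_count = unrolled_median(SAMPLE_ROWS)
print(median_age, row_count)
```

The expanded list grows with the number of people rather than the number of distinct ages, which is what makes this approach expensive at scale.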

Instead, let’s look at a query that works directly on the aggregated data.

This works by using a rolling sum. Because the running total eventually adds up to the overall total, you can find the row where it crosses the halfway point.

Then, using the other analytic function, which returns the overall sum, you can divide that total in half, keep only the rows whose running total has reached the halfway point, and select the top one. That top value is the median.
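The steps above can be sketched as follows. This is a reconstruction under stated assumptions, not the original query: it uses SQLite syntax (run through Python's `sqlite3` module, which needs SQLite 3.25 or later for window functions), `LIMIT 1` stands in for selecting the top value, and the table name `age_counts`, the columns `age` and `people`, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical (age, count) pairs, not the original data.
SAMPLE_ROWS = [(21, 3), (22, 5), (23, 2), (24, 6), (25, 5)]

MEDIAN_QUERY = """
WITH running AS (
    SELECT
        age,
        SUM(people) OVER (ORDER BY age) AS running_total,  -- rolling sum by age
        SUM(people) OVER ()             AS grand_total     -- same total on every row
    FROM age_counts
)
SELECT age
FROM running
WHERE running_total >= grand_total / 2.0  -- rows at or past the halfway point
ORDER BY age
LIMIT 1;                                  -- the first such row is the median
"""


def bucketed_median(rows):
    """Load aggregated rows into SQLite and return the median age."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE age_counts (age INTEGER PRIMARY KEY, people INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO age_counts (age, people) VALUES (?, ?)", rows)
    result = conn.execute(MEDIAN_QUERY).fetchone()[0]
    conn.close()
    return result


print(bucketed_median(SAMPLE_ROWS))
```

With these sample rows the running totals are 3, 8, 10, 16, and 21, and half of the total is 10.5, so the first row at or past the halfway point is age 24. When the total count is even, this version returns the lower of the two middle values instead of averaging them. On engines that use `TOP` in place of `LIMIT`, the final `SELECT` would need to be adjusted accordingly.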

We would love to know your thoughts on this problem. Also, if you have any problems you would like to discuss, please feel free to contact us. Thanks for reading.
