Need a query to select aggregate data by buckets

Question

Need a query to select aggregate data by buckets

SQL Guy 1

SSCoach

Points: 15979
More actions
March 24, 2014 at 3:22 pm

#286909

I have a table which has a unique integer column. It's not contiguous, it has some gaps.
I need to find a count of records within amaginary buckets, for example [1-10],[11-20],[21-30].....
Thanks

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

SQL Guy 1 (3/24/2014)
I have a table which has a unique integer column. It's not contiguous, it has some gaps.
I need to find a count of records within amaginary buckets, for example [1-10],[11-20],[21-30].....
Thanks

Would help if you provided more details on what you are trying to accomplish. Just based on the info provided I know there isn't much I can do to help.

SQL Guy 1 SSCoach Points: 15979 More actions · Answer 2

That was rather simplified example. In reality I have a table with 150 mln records, and it is partitioned by integer column (it is some ID and clustered primary key) with buckets of 10 mln each. It also has some modify date. I need to find out min and max of that modify date by buckets of that 10 mln ID's. For example I need to find modifydate where ID in (1, 1000000). Next for ID in (1000001, 2000000), and so on. However, some ID's are skipped. so there is no record with ID=2000000. In such a case I need to find the closest one, like 1998412.

I can write a query with multiple unions, but want to find a neater solution.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

SQL Guy 1 (3/25/2014)
That was rather simplified example. In reality I have a table with 150 mln records, and it is partitioned by integer column (it is some ID and clustered primary key) with buckets of 10 mln each. It also has some modify date. I need to find out min and max of that modify date by buckets of that 10 mln ID's. For example I need to find modifydate where ID in (1, 1000000). Next for ID in (1000001, 2000000), and so on. However, some ID's are skipped. so there is no record with ID=2000000. In such a case I need to find the closest one, like 1998412.
I can write a query with multiple unions, but want to find a neater solution.

If "it is partitioned by integer column", can't you simply aggregate by that column and grab the min and max date?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

SQL Guy 1 (3/25/2014)
That was rather simplified example. In reality I have a table with 150 mln records, and it is partitioned by integer column (it is some ID and clustered primary key) with buckets of 10 mln each. It also has some modify date. I need to find out min and max of that modify date by buckets of that 10 mln ID's. For example I need to find modifydate where ID in (1, 1000000). Next for ID in (1000001, 2000000), and so on. However, some ID's are skipped. so there is no record with ID=2000000. In such a case I need to find the closest one, like 1998412.
I can write a query with multiple unions, but want to find a neater solution.

Please take the time to read the following article and follow its instructions regarding what you should post and how to do it.

http://www.sqlservercentral.com/articles/Best+Practices/61537/

Based on what you have posted so far, I really have no idea where to start. I can't see what you see. You need to help us help you.

ben.brugman SSChampion Points: 13369 More actions · Answer 5

SQL Guy 1 (3/24/2014)
I have a table which has a unique integer column. It's not contiguous, it has some gaps.
I need to find a count of records within amaginary buckets, for example [1-10],[11-20],[21-30].....
Thanks

I have two examples, first is where the 'bucket' size is constant.

-- Count with a bucket size of a 1000

-- CCCCCC is an integer column in the table TTTTTT

-- DDDDDD can be any field for example it can be a datetime field.

--

select COUNT(*) as Hello, MIN(DDDDDD), MAX(DDDDDD)

CCCCCC/1000 -- Line 1

from TTTTTT

GROUP BY

CCCCCC/1000 -- Line 2

ORDER BY

CCCCCC/1000 -- Line 3

-- The three marked lines each contain the same code.

-- If one is replaced the others should be replaced with the same construction.

--

Example 2,

Here I use a logaritmic scale, which is used on two times, the time difference is of importance. The buckets are logaritmic in scale for the time difference.

--

-- Difference in seconds between two events.

-- Shown in bucket of 'ever' increasing sizes.

--

select top 30 count(*),

round(4*log10((convert(float, nextevent)-convert(float, event))*24*60*60+.01),0)

from TTTTTT

group by

round(4*log10((convert(float, nextevent)-convert(float, event))*24*60*60+.01),0)

order by

round(4*log10((convert(float, nextevent)-convert(float, event))*24*60*60+.01),0)

The last example has a prevention against zero's and that is adding a .01.

Now a logaritmic scale of difference is shown and the number of events in within this.

Both examples show a line repeating three times. You can construct all sorts of 'grouping' replacing these three lines.

It could be substring(CCCCCC, 17, 3)

It could be replace(CCCCCC, 'x', 'y')

etc.

Please report if this is an anwser in the right direction.

For the logaritm scale you 'probably' need another 'field' which marks for example the max of each bucket. (Or the min or both). This is left to your own fantasy.

Ben