Group by Date

  • I am currently using SQL Server 2005. I have a table with over 100,000,000 rows in it. One of the columns is datetime and is stored like this - 2008-05-03 10:25:07.300

    I need to group some of the data by date, not datetime. I know that there are many different ways to do this, but I have been using the same solution (outlined below) for some tiem and would like to get opinions on a better and/or faster way to do this.

    --create table

    create table TimeGroup (

    column1 varchar(30),

    InsertDate datetime

    )

    --insert dummy data

    insert into TimeGroup

    values('value1', '2008-05-01 10:25:05.300')

    insert into TimeGroup

    values('value2', '2008-05-01 10:25:06.300')

    insert into TimeGroup

    values('value3', '2008-05-02 10:25:05.300')

    insert into TimeGroup

    values('value4', '2008-05-02 10:25:06.300')

    insert into TimeGroup

    values('value5', '2008-05-03 10:25:05.300')

    insert into TimeGroup

    values('value6', '2008-05-03 10:25:06.300')

    insert into TimeGroup

    values('value6', '2008-05-03 10:25:07.300')

    --query to group by

    select count(*), convert (char(8),InsertDate,1)

    from TimeGroup

    group by convert (char(8),InsertDate,1)

    Thanks for you help!

  • If this works, I'd do that. I have had issues at times with conversions and needed to go to the datepart function to get things to work.

    select datepart( yyyy, datefield), datepart( mm, datefield)...

    from xxx

    group by datepart( yyyy, datefield) ..

  • My favorite way to do this on tables where you query this kind of thing regularly, is to use indexed, computed columns.

    Alter table dbo.MyTable

    add DateYear as datepart(year, MyDateColumn),

    DateMonth as datepart(month, MyDateColumn),

    DateDay as datepart(day, MyDateColumn)

    go

    create index IDX_MyTable_DateValues on dbo.MyTable(DateYear, DateMonth, DateDay)

    include (MyOtherColumn1, MyOtherColumn2, ...)

    That allows you to group by DateYear, DateMonth, DateDay; or by DateYear, DateMonth; or by DateYear. Different reports/forms/whatever can use different combinations.

    If you just want the date, without the time, a computed column can do that too. Or, if you want to go more granular, you can add more datepart columns (hour, minute, second, millisecond). You can use Quarter, or Day of Year, as well. Anything that will work in datepart. Just keep in mind that indexing can get disk-heavy if you use columns you won't need.

    The advantage is you can index them, and if you include the right columns, you'll get very good peformance. Much better than using conversions and functions in your Where and Group By clauses.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks. I'm concerned about performance. Do you know if this solution would out-perform the convert function?

  • One suggestion I would have would be to possibly create a computed column on the table that does not have the time, well, has the time set to 00:00:00. If this is a query run regularly. You can check out this article on computed columns[/url] by Andy Warren.

  • GSquared (5/1/2008)


    My favorite way to do this on tables where you query this kind of thing regularly, is to use indexed, computed columns.

    Alter table dbo.MyTable

    add DateYear as datepart(year, MyDateColumn),

    DateMonth as datepart(month, MyDateColumn),

    DateDay as datepart(day, MyDateColumn)

    go

    create index IDX_MyTable_DateValues on dbo.MyTable(DateYear, DateMonth, DateDay)

    include (MyOtherColumn1, MyOtherColumn2, ...)

    That allows you to group by DateYear, DateMonth, DateDay; or by DateYear, DateMonth; or by DateYear. Different reports/forms/whatever can use different combinations.

    If you just want the date, without the time, a computed column can do that too. Or, if you want to go more granular, you can add more datepart columns (hour, minute, second, millisecond). You can use Quarter, or Day of Year, as well. Anything that will work in datepart. Just keep in mind that indexing can get disk-heavy if you use columns you won't need.

    The advantage is you can index them, and if you include the right columns, you'll get very good peformance. Much better than using conversions and functions in your Where and Group By clauses.

    I hate it when I get beat to the punch.:D

  • lfmn (5/1/2008)


    Thanks. I'm concerned about performance. Do you know if this solution would out-perform the convert function?

    Which solution are you asking about? Steve's use of datepart in the Select and Group By clauses, or my use of datepart in calculated columns?

    Datepart in the Select and Group By will have about the same performance as convert.

    Calculated columns, with the right index, can massively out-perform either of the other solutions.

    I created a test table with 1-million random dates in it, then ran 3 selects:

    create table dbo.Dates (

    ID int identity primary key,

    Date datetime)

    select convert(varchar(20), date, 1), count(*)

    from dbo.dates

    group by convert(varchar(20), date, 1)

    select datepart(year, date), datepart(month, date), datepart(day, date), count(*)

    from dbo.dates

    group by datepart(year, date), datepart(month, date), datepart(day, date)

    select dateyear, datemonth, dateday, count(*)

    from dbo.dates

    group by dateyear, datemonth, dateday

    The first one took 1400 ms, the second one 1300 ms.

    For the third one, I added the calculated columns and an index on them. The the select took 300 ms. (Less than 1/4 the time of the other two.)

    Interestingly, after I added the calculated columns and the index, the second query (based on what Steve proposed), used that index, even without referencing the column names. SQL Server was clever enough to realize it was the same data. Once they were in place, execution of the second and third queries took identical time and IO characteristics.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (5/1/2008)


    lfmn (5/1/2008)


    Thanks. I'm concerned about performance. Do you know if this solution would out-perform the convert function?

    Which solution are you asking about? Steve's use of datepart in the Select and Group By clauses, or my use of datepart in calculated columns?

    Thanks. I'm actually testing all of the solutions. I'm not sure how feasible it is to add to this table given it's size, but I'm looking into it.

  • Adding a computed column takes 0 ms and no extra disk space. If the column is precise and deterministic, adding an index to it will take about half an hour for 100,000,000 rows.

    I wouldn't use date format #1, though... maybe 112 (ISO), but not #1.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks to everyone for the good advice!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply