SQL gourping by multiple columns help

  • Hello,

    I have the following table:

    ID bigint,

    Name varchar(50),

    Address varchar(250),

    RecDate smalldatetime,

    PurPrice decimal(7,2),

    PaidBy varchar(50)

    I am trying to come up with a select statement that will group the results by ID and sums the PurPrice but also at the same time group the returned result by month extracted from RecDate.

    Is this possible in one statement?

    example of output would be something like this:

    ID Name SumPrice Month

    1 Test 1500 01

    1 Test 1000 02

    etc...

    Thanks in advance

  • tonytohme (11/25/2016)


    Hello,

    I have the following table:

    ID bigint,

    Name varchar(50),

    Address varchar(250),

    RecDate smalldatetime,

    PurPrice decimal(7,2),

    PaidBy varchar(50)

    I am trying to come up with a select statement that will group the results by ID and sums the PurPrice but also at the same time group the returned result by month extracted from RecDate.

    Is this possible in one statement?

    example of output would be something like this:

    ID Name SumPrice Month

    1 Test 1500 01

    1 Test 1000 02

    etc...

    Thanks in advance

    Here's a guess:SELECT

    ID

    ,Name

    ,SUM(CurPrice) AS SumPrice

    ,YEAR(RecDate) AS RecYear

    ,MONTH(RecDate) AS RecMonth

    FROM MyTable

    GROUP BY

    ID

    ,Name

    ,YEAR(RecDate

    ,MONTH(RecDate)

    John

  • Below query will give you the desired results:

    declare @table table (ID bigint,

    Name varchar(50),

    Address varchar(250),

    RecDate smalldatetime,

    PurPrice decimal(7,2),

    PaidBy varchar(50)

    )

    insert into @table

    values (1, 'Peter', 'Street 1', '2016-11-22', 100, 'PayPal')

    , (2, 'John', 'Lane 1', '2016-10-12', 225, 'PayPal')

    , (2, 'John', 'Lane 1', '2016-11-12', 75, 'CreditCard')

    , (1, 'Peter', 'Street 1', '2016-09-08', 80, 'PayPal')

    , (3, 'Stephen', 'Park 1', '2016-09-05', 90, 'CreditCard')

    , (3, 'Stephen', 'Park 1', '2016-11-15', 120, 'CreditCard')

    , (1, 'Peter', 'Street 1', '2016-09-17', 15, 'CreditCard')

    , (3, 'Stephen', 'Park 1', '2016-11-20', 60, 'PayPal')

    select ID

    , Name

    , MONTH(RecDate) as month

    , SUM(PurPrice) as Total_spend

    from @table

    group by ID

    , Name

    , MONTH(RecDate)

    order by ID

    , month

    Btw: your sample table is not normalized. You should seperate the person details from the purchace records. The below structure is a better approach because you only have to store the person/address once.

    declare @purchace table (ID bigint

    , RecDate smalldatetime

    , PurPrice decimal(7,2)

    , PaidBy varchar(50)

    )

    declare @name table (ID bigint

    , Name varchar(50)

    , Address varchar(250)

    )

    insert into @purchace

    values (1, '2016-11-22', 100, 'PayPal')

    , (2, '2016-10-12', 225, 'PayPal')

    , (2, '2016-11-12', 75, 'CreditCard')

    , (1, '2016-09-08', 80, 'PayPal')

    , (3, '2016-09-05', 90, 'CreditCard')

    , (3, '2016-11-15', 120, 'CreditCard')

    , (1, '2016-09-17', 15, 'CreditCard')

    , (3, '2016-11-20', 60, 'PayPal')

    insert into @name

    values (1, 'Peter', 'Street 1')

    , (2, 'John', 'Lane 1')

    , (3, 'Stephen', 'Park 1')

    select name.ID

    , Name

    , MONTH(RecDate) as month

    , SUM(PurPrice) as Total_spend

    from @purchace purchace

    inner join @name name

    on purchace.id = name.id

    group by name.ID

    , Name

    , MONTH(RecDate)

    order by ID

    , month

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Thank you so much guys for the quick reponce you are awesome.

    The tables I mentioned in my post are not actuals its just an example to get an understading of how I can use group by. Actual tables are normalized/

    Thanks again

  • tonytohme (11/25/2016)


    Hello,

    I have the following table:

    ID bigint,

    Name varchar(50),

    Address varchar(250),

    RecDate smalldatetime,

    PurPrice decimal(7,2),

    PaidBy varchar(50)

    I am trying to come up with a select statement that will group the results by ID and sums the PurPrice but also at the same time group the returned result by month extracted from RecDate.

    Is this possible in one statement?

    example of output would be something like this:

    ID Name SumPrice Month

    1 Test 1500 01

    1 Test 1000 02

    etc...

    Thanks in advance

    First, I knew a "Tony Tohme" way back when. Did you ever work for a company called "ACN"?

    Second, there is a bit of a danger in one of the posted solutions if there are multiple years of data involved because any given month number is NOT unique across years.

    So the question is, are you going to run into multi-year data with this query and, if so, what would you like to see in the output to differentiate the total of, say, January of 2016 from January of 2015? The solution posted by John Mitchell takes care of that problem but there's also the opportunity to generate Sub-Totals and maybe even "pivot" the data for reporting purposes by 1 row per ID per year with a column for each month, year totals, Sub-Totals for each ID for all months, and a Grand Total for all months/years.

    Also, to help us help you in the future and to make things a bit more clear, please see the article at the first link under "Helpful Links" in my signature line below.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Firstly no I never even heard of ACN.

    Second you are right I might be trying to get data from across multiple years in that case I would condition the select tot return only data where year would be equal to a parameter.

    Unless you have a better solution :-).

    Thanks

  • tonytohme (11/25/2016)


    Hi Jeff

    Firstly no I never even heard of ACN.

    Second you are right I might be trying to get data from across multiple years in that case I would condition the select tot return only data where year would be equal to a parameter.

    Unless you have a better solution :-).

    Thanks

    Ok... then let's open up some reporting "possibilities". 🙂

    First, let's simulate a more realistic bit of data. 1 Million transactions across 10,000 customers across 7 years should be fun. This takes about 12 seconds to generate if your test database, 4 seconds if your test database is TempDB.

    --===== If the test table exists, drop it to make reruns in SSMS easier.

    IF OBJECT_ID('dbo.AggTestTable','U') IS NOT NULL

    DROP TABLE dbo.AggTestTable

    ;

    --===== Create the test table.

    CREATE TABLE dbo.AggTestTable

    (

    ID BIGINT NOT NULL

    ,Name VARCHAR(50) NOT NULL

    ,Address VARCHAR(250) NOT NULL

    ,RecDate SMALLDATETIME NOT NULL

    ,PurPrice DECIMAL(7,2) NOT NULL

    ,PaidBy VARCHAR(50) NOT NULL

    )

    ;

    --===== Add the Clustered Index to support performance and to keep fragmentation at bay

    CREATE CLUSTERED INDEX IXC_TestTable ON dbo.AggTestTable (RecDate,ID)

    ;

    GO

    --===== Populate the test table with constrained random data.

    -- ID - 1 THRU 10,000 (10,000) "customers"

    -- Name - 'SomeName' + the ID left padded with zeros to 6 places.

    -- Address - 'SomeAddress' + the ID left padded with zeros to 6 places.

    -- RecDate - 2010-01-01 00:00 THRU 2016-12-31 23:59

    -- PurPrice - 1.00 THRU 500.00

    -- PaidBy - PayPal or CreditCard

    WITH cteGenBaseData AS

    (

    SELECT TOP 1000000

    ID = ABS(CHECKSUM(NEWID())%10000)+1

    ,RecDate = RAND(CHECKSUM(NEWID()))*DATEDIFF(DD,'2010','2017')+CONVERT(SMALLDATETIME,'2010')

    ,PurPrice = CONVERT(DECIMAL(7,2),RAND(CHECKSUM(NEWID()))*499+1)

    ,PaidBy = CASE ABS(CHECKSUM(NEWID())%2) WHEN 0 THEN 'PayPal' ELSE 'CreditCard' END

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    )

    INSERT INTO dbo.AggTestTable WITH(TABLOCK) -- Part of requirement for "Minimal Logging"

    (ID,Name,Address,RecDate,PurPrice,PaidBy)

    SELECT ID = ID

    ,Name = 'SomeName' + RIGHT('000000'+CONVERT(VARCHAR(10),ID),6)

    ,Address = 'SomeAddress' + RIGHT('000000'+CONVERT(VARCHAR(10),ID),6)

    ,RecDate = RecDate

    ,PurPrice = PurPrice

    ,PaidBy = PaidBy

    FROM cteGenBaseData

    ORDER BY RecDate,ID-- Part of requirement for "Minimal Logging" with Clustered Index in Place

    OPTION (RECOMPILE)

    ;

    --===== Let's see what the first 10,000 rows look like

    SELECT TOP 10000 * FROM dbo.AggTestTable

    ;

    Ok... So we have 7 years of random transactions with 10,000 customers. Let's see what most all of the possibilities are. We need rows for the following.

    For all the rows, we'll "pivot" the data to month columns for each year using pre-aggregation and an "ancient" but fast and easy to understand method known as a "CROSS TAB". The end of each row will also have a "Row Total".

    With that in mind, we need rows for each of the following things and each sub-total and the grand total will all have the correct totals for every month AND the RowTotal. That means that the RowTotal of the Grand Total row will be the total of all PurPrice's.

    1. Detail rows - basically, these show the "raw" data.

    2. Sub-Total User/Year

    3. Sub-Total User

    4. Sub-Total Year/PaidBy

    5. Sub-Total Year

    6. Sub-Total PaidBy

    7. Grand Total

    This whole shebang takes 13-15 seconds to run. If you add a WHERE clause for just one or a couple of IDs, it's nearly instantaneous. If you add a WHERE clause for, say, the years 2015 and 2006 (>=2015 and < 2017 so the index can be used), it takes about 3 seconds.

    I left the GID (GROUPING_ID) in the output just for this demonstration. It can be removed from the output. I also calculated and displayed a "LineType" column. Again, just for demonstration. It can be removed or changed to suppress (for example, the word "Detail").

    Here's the code. I would NOT use it as a view but a clever person could use it as an iTVF (Inline Table Valued Function)

    WITH ctePreAgg AS

    ( --=== This pre-aggregates the data and DRYs out some forumulas.

    SELECT ID --I'm assuming this is some sort of Customer ID.

    ,Name = MAX(Name) --Assumes there's only one Name per ID

    ,Address = MAX(ADDRESS) --Assumes there's only one Address per ID (could be very wrong here)

    ,RecYear = DATENAME(yy,RecDate)

    ,RecMonth = DATEPART(mm,RecDate)

    ,PurPrice = SUM(PurPrice)

    ,PaidBy

    FROM dbo.AggTestTable

    --WHERE ID IN (1,2000)

    --WHERE RecDate >= '2015' AND RecDate < '2017'

    GROUP BY ID,DATENAME(yy,RecDate),DATEPART(mm,RecDate),PaidBy

    )

    SELECT GID = GROUPING_ID(ID,RecYear,PaidBy)

    -- 4 2 1 --Each column grouping has a binary number assigned and are added together to for a GID.

    --A GID of 6 (for example) means that ID (4) and RecYear(2)

    --were **NOT** included in the grouping meaning it **WAS** grouped by PaidBy(1).

    --Conversely, a GID of 1 means that PaidBy (1) was **NOT** included

    --in the grouping meaning it **WAS** grouped by ID(4) and RecYear(2)

    ,LineType = CASE GROUPING_ID(ID,RecYear,PaidBy)

    WHEN 0 THEN 'Detail'

    WHEN 1 THEN 'Sub-Total User/Year'

    WHEN 3 THEN 'Sub-Total User'

    WHEN 4 THEN 'Sub-Total Year/PaidBy'

    WHEN 5 THEN 'Sub-Total Year'

    WHEN 6 THEN 'Sub-Total PaidBy'

    WHEN 7 THEN 'Grand_Total'

    END

    ,ID = ISNULL(CONVERT(VARCHAR(32),ID),'----------')

    ,Name = CASE WHEN GROUPING_ID(ID,RecYear,PaidBy) IN (0,1,3) THEN MAX(Name) ELSE '--------------------' END

    ,Address = CASE WHEN GROUPING_ID(ID,RecYear,PaidBy) IN (0,1,3) THEN MAX(ADDRESS) ELSE '--------------------' END

    ,[Year] = ISNULL(RecYear,'----')

    ,PaidBy = ISNULL(PaidBy ,'----------')

    ,Jan = SUM(CASE WHEN RecMonth = 1 THEN PurPrice ELSE 0 END)

    ,Feb = SUM(CASE WHEN RecMonth = 2 THEN PurPrice ELSE 0 END)

    ,Mar = SUM(CASE WHEN RecMonth = 3 THEN PurPrice ELSE 0 END)

    ,Apr = SUM(CASE WHEN RecMonth = 4 THEN PurPrice ELSE 0 END)

    ,May = SUM(CASE WHEN RecMonth = 5 THEN PurPrice ELSE 0 END)

    ,Jun = SUM(CASE WHEN RecMonth = 6 THEN PurPrice ELSE 0 END)

    ,Jul = SUM(CASE WHEN RecMonth = 7 THEN PurPrice ELSE 0 END)

    ,Aug = SUM(CASE WHEN RecMonth = 8 THEN PurPrice ELSE 0 END)

    ,Sep = SUM(CASE WHEN RecMonth = 9 THEN PurPrice ELSE 0 END)

    ,Oct = SUM(CASE WHEN RecMonth = 10 THEN PurPrice ELSE 0 END)

    ,Nov = SUM(CASE WHEN RecMonth = 11 THEN PurPrice ELSE 0 END)

    ,Dec = SUM(CASE WHEN RecMonth = 12 THEN PurPrice ELSE 0 END)

    ,RowTotal = SUM(PurPrice)

    FROM ctePreAgg

    GROUP BY GROUPING SETS

    ( --===== We're using GROUP SETS instead of a CUBE to shave off about 2/3rds the duration of the run.

    (ID,RecYear,PaidBy) --Detail

    ,(ID,RecYear) --Sub-Total User/Year

    ,(ID) --Sub-Total User

    ,(RecYear,PaidBy) --Sub-Total Year/PaidBy

    ,(RecYear) --Sub-Total Year

    ,(PaidBy) --Sub-Total PaidBy

    ,() --Grand Total

    )

    ORDER BY GROUPING(ID) ,ID --In order to preserve the order of where the subtotals appear,

    ,GROUPING(RecYear),RecYear --you have to sort by the GROUPING of each column and the column

    ,GROUPING(PaidBy) ,PaidBy --that appear in the grouping or GROUPING SETS.

    ;

    The imaginative person could use similar to do things like add on RowTotals by quarter and what percent a quarter was of each row and other "crazy" things that CFOs and CEOs like to know.

    The really cool part is that you don't need SSRS for this and, if you persist the result set as a table and get a little imaginative with a spreadsheet that reads from the table, the C-Level folks can do all sorts of pretty things without having to bug you for another report. 😉

    For more information on the "Black Arts" technique of "CROSSTABS" and "Pre-Aggregation" (thank you Peter "PESO" Larsson for the term), please see the following introductory articles on the subject. The first article also includes some performance tests against the relatively aweful PIVOT operator.

    [font="Arial Black"]Cross Tabs and Pivots, Part 1 – Converting Rows to Columns[/font][/url]

    [font="Arial Black"]Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs[/font][/url]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply