SqL query help

  • Helo friends, hope everyone is doing good. I am working on a project where I have a big fact table having data for 8 years and has around 8-10 dimensions. The requirement is that I have to get some data for a monthly basis (between 01st and last day of month) plus the YTD. The data is like this:

    -- example
    create table dbo.myfacttable
    (businessdate datetime, businessid varchar(20), business_attr1 varchar(10), business_attr2 varchar(10), business_attr3 varchar(10), amount numeric(28,8), createddate datetme, updateddate datetime)

    -- SAMPLE DATA: Actual Data will be from year 2015 to 2022
    insert into dbo.myfacttable
    select '2022-08-01', '1234', 'a1001', 'b1001', 'c1001', 25.2800, NULL, NULL UNION ALL
    select '2022-08-01', '4455', 'a1001', 'b1002', 'c1005', 108.7800, NULL, NULL UNION ALL
    select '2022-08-02', '1234', 'a1005', 'b1003', 'c1020', 58.8350, NULL, NULL UNION ALL
    select '2022-08-05', '1234', 'a1002', 'b1002', 'c1009', 62.0950, NULL, NULL UNION ALL
    select '2022-08-03', '1234', 'a1008', 'b1012', 'c1006', 10.1680, NULL, NULL UNION ALL
    select '2022-08-02', '4455', 'a1004', 'b1008', 'c1005', 9.5000, NULL, NULL UNION ALL
    select '2022-08-05', '4455', 'a1006', 'b1001', 'c1003', 460.1900, NULL, NULL UNION ALL
    select '2022-08-01', '6767', 'a1002', 'b1009', 'c1010', 203.6350, NULL, NULL

    -- I need a query like this and need solution to this "ytd"
    select a.businessid, a.businessdate, a.business_attr1, a.business_attr2,
    sum(case when business_attr3 = 'c1001' then a.amount else 0 end) as 'xyz_amount',
    sum(case when business_attr3 in('c1002', 'c1005') then a.amount else 0 end) as 'lmn_amount',
    sum(case when business_attr3 not in('c1001', 'c1002', 'c1005') then a.amount else 0 end) as 'rest_amount',
    ytd = sum of amount from 2022-01-01 till date for businessid+business_attr1+business_attr2
    from dbo.myfacttable a
    inner join dbo.dim_attr1 a1 on a.business_attr1 = a1.id and a1.is_active = 1
    where businessdate between '2022-08-01' and '2022-08-31'
    group by a.businessid, a.businessdate, a.business_attr1, a.business_attr2
    order by 2, 1
  • You have datetime columns - don't use between!

    where businessdate between '2022-08-01' and '2022-08-31'

    means return everything through '2022-08-31 00:00:00.000 (excludes that entire last day of the month except for anything precisely at midnight).

    You want

    where businessdate >= '2022-08-01' and businessdate < '2022-09-01'

    Not an answer to your question, but very important.

  • ratbak wrote:

    You have datetime columns - don't use between!

    where businessdate between '2022-08-01' and '2022-08-31'

    means return everything through '2022-08-31 00:00:00.000 (excludes that entire last day of the month except for anything precisely at midnight). You want

    where businessdate >= '2022-08-01' and businessdate < '2022-09-01'

    Not an answer to your question, but very important.

    I got your point but that column is only storing date and not time. Time is present in timestamp columns at the very end which i am not populating in the example.

  • Split the datetime column into two columns, one for the date, and another one for the time. Then your calendar table will work. And you can create a time dimension with only 1440 records. =) and divide that up however you want with attribute columns.

    The requirement is that I have to get some data for a monthly basis (between 01st and last day of month) plus the YTD.

    If you're doing your analysis in PowerBI, use DAX.

    CALCULATE ( SUM([amount]),

    FILTER('DimDate', 'DimDate'[Year] = MAX('DimDate'[Year]) && 'DimDate'[Date] <= MAX('DimDate'[Date]) )

    maybe the best place to start with DAX is Ferrari & Russo's DAXPatterns.com

    • This reply was modified 2 years, 4 months ago by  pietlinden.
  • Before calculating the ytd amounts you could first summarize to the month.  Then CROSS APPLY to get ytd amounts

    drop table if exists #myfacttable;
    go
    create table #myfacttable(
    businessdate datetime,
    businessid varchar(20),
    business_attr1 varchar(10),
    business_attr2 varchar(10),
    business_attr3 varchar(10),
    amount numeric(28,8),
    createddate datetime,
    updateddate datetime);

    -- SAMPLE DATA: Actual Data will be from year 2015 to 2022
    insert into #myfacttable
    select '2022-08-01', '1234', 'a1001', 'b1001', 'c1001', 25.2800, NULL, NULL UNION ALL
    select '2022-08-01', '4455', 'a1001', 'b1002', 'c1005', 108.7800, NULL, NULL UNION ALL
    select '2022-08-02', '1234', 'a1005', 'b1003', 'c1020', 58.8350, NULL, NULL UNION ALL
    select '2022-08-05', '1234', 'a1002', 'b1002', 'c1009', 62.0950, NULL, NULL UNION ALL
    select '2022-08-03', '1234', 'a1008', 'b1012', 'c1006', 10.1680, NULL, NULL UNION ALL
    select '2022-08-02', '4455', 'a1004', 'b1008', 'c1005', 9.5000, NULL, NULL UNION ALL
    select '2022-08-05', '4455', 'a1006', 'b1001', 'c1003', 460.1900, NULL, NULL UNION ALL
    select '2022-01-05', '4455', 'a1006', 'b1001', 'c1003', 210.2100, NULL, NULL UNION ALL /* add this row */
    select '2022-08-01', '6767', 'a1002', 'b1009', 'c1010', 203.6350, NULL, NULL;

    with month_cte as (
    select a.businessid, a.business_attr1, a.business_attr2,
    sum(case when business_attr3 = 'c1001' then a.amount else 0 end) as xyz_amount,
    sum(case when business_attr3 in('c1002', 'c1005') then a.amount else 0 end) as lmn_amount,
    sum(case when business_attr3 not in('c1001', 'c1002', 'c1005') then a.amount else 0 end) as rest_amount
    from #myfacttable a
    --join dbo.dim_attr1 a1 on a.business_attr1=a1.id
    -- and a1.is_active = 1
    where businessdate >= '2022-08-01'
    and businessdate < '2022-09-01'
    group by a.businessid, a.business_attr1, a.business_attr2)
    select m.*, v.sum_ytd sum_ytd
    from month_cte m
    cross apply (select sum(aa.amount)
    from #myfacttable aa
    where aa.businessid=m.businessid
    and aa.business_attr1=m.business_attr1
    and aa.business_attr2=m.business_attr2
    and aa.businessdate >= '2022-01-01'
    and aa.businessdate < '2022-09-01') v(sum_ytd)
    order by m.businessid;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • If there's a lot of data involved and you're likely to be running regular analyses on it, I'd consider setting this up in a Tabular Model and using DAX to get all of the date-based aggregations done for you.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • sqlenthu 89358 wrote:

    ratbak wrote:

    You have datetime columns - don't use between!

    where businessdate between '2022-08-01' and '2022-08-31'

    means return everything through '2022-08-31 00:00:00.000 (excludes that entire last day of the month except for anything precisely at midnight). You want

    where businessdate >= '2022-08-01' and businessdate < '2022-09-01'

    Not an answer to your question, but very important.

    I got your point but that column is only storing date and not time. Time is present in timestamp columns at the very end which i am not populating in the example.

    For now, that is.  I hope you never have to find out what happens when they change that on you in the future or you "forget" and use BETWEEN on something where it shouldn't have been used.  It only takes an extra couple of seconds to make your code bullet-prove against such possibilities.

    Just a little advice from an old dude that's seen it happen way too many times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sqlenthu 89358 wrote:

    Helo friends, hope everyone is doing good. I am working on a project where I have a big fact table having data for 8 years and has around 8-10 dimensions. The requirement is that I have to get some data for a monthly basis (between 01st and last day of month) plus the YTD. The data is like this:

    -- example
    create table dbo.myfacttable
    (businessdate datetime, businessid varchar(20), business_attr1 varchar(10), business_attr2 varchar(10), business_attr3 varchar(10), amount numeric(28,8), createddate datetme, updateddate datetime)

    -- SAMPLE DATA: Actual Data will be from year 2015 to 2022
    insert into dbo.myfacttable
    select '2022-08-01', '1234', 'a1001', 'b1001', 'c1001', 25.2800, NULL, NULL UNION ALL
    select '2022-08-01', '4455', 'a1001', 'b1002', 'c1005', 108.7800, NULL, NULL UNION ALL
    select '2022-08-02', '1234', 'a1005', 'b1003', 'c1020', 58.8350, NULL, NULL UNION ALL
    select '2022-08-05', '1234', 'a1002', 'b1002', 'c1009', 62.0950, NULL, NULL UNION ALL
    select '2022-08-03', '1234', 'a1008', 'b1012', 'c1006', 10.1680, NULL, NULL UNION ALL
    select '2022-08-02', '4455', 'a1004', 'b1008', 'c1005', 9.5000, NULL, NULL UNION ALL
    select '2022-08-05', '4455', 'a1006', 'b1001', 'c1003', 460.1900, NULL, NULL UNION ALL
    select '2022-08-01', '6767', 'a1002', 'b1009', 'c1010', 203.6350, NULL, NULL

    -- I need a query like this and need solution to this "ytd"
    select a.businessid, a.businessdate, a.business_attr1, a.business_attr2,
    sum(case when business_attr3 = 'c1001' then a.amount else 0 end) as 'xyz_amount',
    sum(case when business_attr3 in('c1002', 'c1005') then a.amount else 0 end) as 'lmn_amount',
    sum(case when business_attr3 not in('c1001', 'c1002', 'c1005') then a.amount else 0 end) as 'rest_amount',
    ytd = sum of amount from 2022-01-01 till date for businessid+business_attr1+business_attr2
    from dbo.myfacttable a
    inner join dbo.dim_attr1 a1 on a.business_attr1 = a1.id and a1.is_active = 1
    where businessdate between '2022-08-01' and '2022-08-31'
    group by a.businessid, a.businessdate, a.business_attr1, a.business_attr2
    order by 2, 1

    So, according to your GROUP BY, you want one row for each businessid/businessdate combo.  Is the YTD thing supposed to be based on that "business date?  It would have help answer that if you had posted some expected results that match the test data you provided.

    Also, it would be a real help to the people trying to help you if you broke up your long code either before or after a comma both in real life and on posts.

    Last but not least, you didn't even test your own test data code to see if it works... and it doesn't.  If it did, it wouldn't have any test data for two of the attributes.  We've been through this many times.  If you don't care, why should we?

    Unsubscribing from this thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What about the dimensions not in the fact table?

    SELECT          a1.businessid,
    a.businessdate,
    a.business_attr1,
    a.business_attr2,
    SUM(CASE WHEN a.businessdate >= '2022-08-01' AND a1.business_attr3 = 'c1001' THEN a.amount ELSE 0 END) AS xyz_amount,
    SUM(CASE WHEN a.businessdate >= '2022-08-01' AND a1.business_attr3 IN ('c1002', 'c1005') THEN a.amount ELSE 0 END) AS lmn_amount,
    SUM(CASE WHEN a.businessdate >= '2022-08-01' AND a1.business_attr3 NOT IN ('c1001', 'c1002', 'c1005') THEN a.amount ELSE 0 END) AS rest_amount,
    SUM(CASE WHEN a.businessdate <= SYSDATETIME() THEN a.amount ELSE 0 END) AS ytd
    FROM dbo.dim_attr1 AS a1
    LEFT JOIN dbo.myfacttable AS a ON a.business_attr1 = a1.id
    AND a.businessdate >= '2022-01-01'
    AND a.businessdate < '2022-09-01'
    WHERE a1.is_active = 1
    GROUP BY a1.businessid,
    a.businessdate,
    a.business_attr1,
    a.business_attr2
    ORDER BY a.businessdate,
    a1.businessid;

    N 56°04'39.16"
    E 12°55'05.25"

  • Jeff Moden wrote:

    sqlenthu 89358 wrote:

    ratbak wrote:

    You have datetime columns - don't use between!

    where businessdate between '2022-08-01' and '2022-08-31'

    means return everything through '2022-08-31 00:00:00.000 (excludes that entire last day of the month except for anything precisely at midnight). You want

    where businessdate >= '2022-08-01' and businessdate < '2022-09-01'

    Not an answer to your question, but very important.

    I got your point but that column is only storing date and not time. Time is present in timestamp columns at the very end which i am not populating in the example.

    For now, that is.  I hope you never have to find out what happens when they change that on you in the future or you "forget" and use BETWEEN on something where it shouldn't have been used.  It only takes an extra couple of seconds to make your code bullet-prove against such possibilities.

    Just a little advice from an old dude that's seen it happen way too many times.

    Thanks for emphasising on the point and i am totally following you. However i know that setup is based on entire project's architecture and thats how the business is running currently. That's why i am not taking it as a risk. I do make sure about the threat you mentioned while dealing with other date time columns.

  • Jeff Moden wrote:

    sqlenthu 89358 wrote:

    Helo friends, hope everyone is doing good. I am working on a project where I have a big fact table having data for 8 years and has around 8-10 dimensions. The requirement is that I have to get some data for a monthly basis (between 01st and last day of month) plus the YTD. The data is like this:

    -- example
    create table dbo.myfacttable
    (businessdate datetime, businessid varchar(20), business_attr1 varchar(10), business_attr2 varchar(10), business_attr3 varchar(10), amount numeric(28,8), createddate datetme, updateddate datetime)

    -- SAMPLE DATA: Actual Data will be from year 2015 to 2022
    insert into dbo.myfacttable
    select '2022-08-01', '1234', 'a1001', 'b1001', 'c1001', 25.2800, NULL, NULL UNION ALL
    select '2022-08-01', '4455', 'a1001', 'b1002', 'c1005', 108.7800, NULL, NULL UNION ALL
    select '2022-08-02', '1234', 'a1005', 'b1003', 'c1020', 58.8350, NULL, NULL UNION ALL
    select '2022-08-05', '1234', 'a1002', 'b1002', 'c1009', 62.0950, NULL, NULL UNION ALL
    select '2022-08-03', '1234', 'a1008', 'b1012', 'c1006', 10.1680, NULL, NULL UNION ALL
    select '2022-08-02', '4455', 'a1004', 'b1008', 'c1005', 9.5000, NULL, NULL UNION ALL
    select '2022-08-05', '4455', 'a1006', 'b1001', 'c1003', 460.1900, NULL, NULL UNION ALL
    select '2022-08-01', '6767', 'a1002', 'b1009', 'c1010', 203.6350, NULL, NULL

    -- I need a query like this and need solution to this "ytd"
    select a.businessid, a.businessdate, a.business_attr1, a.business_attr2,
    sum(case when business_attr3 = 'c1001' then a.amount else 0 end) as 'xyz_amount',
    sum(case when business_attr3 in('c1002', 'c1005') then a.amount else 0 end) as 'lmn_amount',
    sum(case when business_attr3 not in('c1001', 'c1002', 'c1005') then a.amount else 0 end) as 'rest_amount',
    ytd = sum of amount from 2022-01-01 till date for businessid+business_attr1+business_attr2
    from dbo.myfacttable a
    inner join dbo.dim_attr1 a1 on a.business_attr1 = a1.id and a1.is_active = 1
    where businessdate between '2022-08-01' and '2022-08-31'
    group by a.businessid, a.businessdate, a.business_attr1, a.business_attr2
    order by 2, 1

    So, according to your GROUP BY, you want one row for each businessid/businessdate combo.  Is the YTD thing supposed to be based on that "business date?  It would have help answer that if you had posted some expected results that match the test data you provided.

    Also, it would be a real help to the people trying to help you if you broke up your long code either before or after a comma both in real life and on posts.

    Last but not least, you didn't even test your own test data code to see if it works... and it doesn't.  If it did, it wouldn't have any test data for two of the attributes.  We've been through this many times.  If you don't care, why should we?

    Unsubscribing from this thread.

    Jeff, I am extremely sorry about the code not working. Actually there's a reason behind it. I can't post using my company's laptop and apart from that I have Mac and i have yet to find a way to install SSMS there (as per Microsoft site, there's no installation setup available for MacOS). That's the reason i couldn't test the code but tried to give the scenario with best of my capabilities.

    The output should be something like this:

    -- Suppose amount YTD till July 31st was
    -- For 1234 => 50
    -- For 4455 => 120
    -- For 6767 => 80
    businessid, businessdate, business_attr1, business_attr2, xyz_amount, lmn_amount, rest_amount, YTD
    1234, 2022-08-01, 'a1001', 'b1001', 10.0000, 5.0000, 12.0000, 77.0000
    4455, 2022-08-01, 'a1001', 'b1002', 8.0000, 10.0000, 25.0000, 163.0000
    6767, 2022-08-01, 'a1002', 'b1009', 2.0000, 6.0000, 1.0000, 89.0000
    1234, 2022-08-02, 'a1001', 'b1001', 2.0000, 10.0000, 10.0000, 99.0000
    ...
    -- The output will be for each business id for all business days in the current month plus
    -- the YTD calculated against each date.

     

  • Hello friends,

    Made the new query. Here is the dim & fact tables:

    Create table #dim1(attr1_id int, attr1_name varchar(100), createdate datetime default getdate())
    Create table #dim2(attr2_id int, attr2_name varchar(100), createdate datetime default getdate())
    Create table #dim3(attr3_id int, attr3_name varchar(100), createdate datetime default getdate())

    Create table #fact(businessdate datetime, businessid varchar(10), attr1_id int, attr2_id int, attr3_id int, amount numeric(28,8), createdate datetime default getdate())

    Insert into #dim1(attr1_id, attr_name)
    Select 1001, 'attr 1:type 1a' union all
    Select 1002, 'attr 1:type 1b' union all
    Select 1003, 'attr 1:type 1c' union all
    Select 1004, 'attr 1:type 1d'

    Insert into #dim2(attr2_id, attr_name)
    Select 2001, 'attr 2:type 2a' union all
    Select 2002, 'attr 2:type 2b' union all
    Select 2003, 'attr 2:type 2c' union all
    Select 2004, 'attr 2:type 2d'

    Insert into #dim3(attr3_id, attr_name)
    Select 3001, 'attr 3:type 3a' union all
    Select 3002, 'attr 3:type 3b' union all
    Select 3003, 'attr 3:type 3c' union all
    Select 3004, 'attr 3:type 3d'

    Insert into #fact(businessdate, businessid, attr1_id, attr2_id, attr3_id, amount)
    Select '2022-01-03', 'bus1001', 1001, 2004, 3002, 2.3500 union all
    Select '2022-01-03', 'bus1001', 1001, 2002, 3004, 7.8430 union all
    Select '2022-01-04', 'bus1002', 1001, 2002, 3004, 12.9475 union all
    Select '2022-01-06', 'bus1002', 1002, 2001, 3002, 5.6983 union all
    Select '2022-01-06', 'bus1002', 1001, 2002, 3001, 46.8936 union all
    Select '2022-01-06', 'bus1001', 1001, 2002, 3004, 16.2849 union all
    Select '2022-01-07', 'bus1001', 1003, 2003, 3001, 4.2846 union all
    Select '2022-01-07', 'bus1002', 1001, 2002, 3004, 8.4548 union all
    Select '2022-02-01', 'bus1001', 1001, 2004, 3002, 6.1147 union all
    Select '2022-02-01', 'bus1002', 1002, 2001, 3002, 9.1427 union all
    Select '2022-02-03', 'bus1001', 1001, 2002, 3004, 1.0562 union all
    Select '2022-02-03', 'bus1002', 1001, 2002, 3004, 5.9855 union all
    Select '2022-05-02', 'bus1001', 1001, 2004, 3002, 8.2186 union all
    Select '2022-05-02', 'bus1002', 1002, 2001, 3002, 11.7639 union all
    Select '2022-05-03', 'bus1001', 1001, 2002, 3004, 10.7094 union all
    Select '2022-05-03', 'bus1002', 1001, 2002, 3004, 9.6386 union all
    Select '2022-05-03', 'bus1002', 1001, 2002, 3001, 6.2387

    Here is the data which I am looking for:

    Businessdate, Businessid, attr1_name, attr2_name, amt_attr_3_type_3a, amt_attr_3_type_3b, amt_attr_3_type_3c, amt_attr_3_type_3d, amt_YTD_attr_3
    '2022-05-02', 'bus1001', 'attr 1:type 1a', 'attr 2:type 2d', 0.0000, 8.2186, 0.0000, 0.0000, 16.6833
    '2022-05-02', 'bus1002', 'attr 1:type 1b', 'attr 2:type 2a', 0.0000, 11.7639, 0.0000, 0.0000, 26.6049
    '2022-05-03', 'bus1001', 'attr 1:type 1a', 'attr 2:type 2b', 0.0000, 0.0000, 0.0000, 10.7094, 35.8935
    '2022-05-03', 'bus1002', 'attr 1:type 1a', 'attr 2:type 2b', 6.2387, 0.0000, 0.0000, 9.6386, 90.1587

    The last column is YTD amount of complete attr3 grouped on business Id, attr1, attr2 and per business day in May month. Please let me know what should be a good approach. I am scratching my head over this. (I am fine if it's not covered in single query, i.e. batches are also fine with me)

  • Is this for reporting? If you're using {SSAS Tabular, Excel, or PowerBI} then DAX is baked right in. "Use the force, Luke!"

    The first book I read on DAX was Rob Collie's book. Worth every penny of the $30 price tag. The reason I don't like doing this in T-SQL is that if your users are using it for reporting, it's easier in the long run to build a model, create some measures, and then let them use it.

  • pietlinden wrote:

    Is this for reporting? If you're using {SSAS Tabular, Excel, or PowerBI} then DAX is baked right in. "Use the force, Luke!"

    The first book I read on DAX was Rob Collie's book. Worth every penny of the $30 price tag. The reason I don't like doing this in T-SQL is that if your users are using it for reporting, it's easier in the long run to build a model, create some measures, and then let them use it.

    It's for reporting plus it has to be loaded every month end for that particular month during off hours, so even if it takes 5-10 mins to calculate it won't be an issue. However whatever I am trying is taking more than 30 mins plus doesn't look right. Thus looking for the help.

  • Let me rephrase my question... (Ask the wrong question, get a useless answer! LOL)

    If the users are going do their own analyses with the data, then I would go the SSAS/PowerBI route. If you have static reports where they need to answer the same questions with different filters, use a regular SSRS report. Either way, you could create views to use as your dimension and fact table sources, and update them periodically (using SSIS to run the refresh if you wanted).

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply