February 9, 2009 at 1:02 am
I have three tables:
CREATE TABLE [dbo].[Item] (
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[IDParent] [int] NOT NULL
)
CREATE TABLE [dbo].[Statistics](
[ID] [int] IDENTITY(1,1) NOT NULL,
[IDItem] [int] NOT NULL,
[Date] [smalldatetime] NOT NULL,
[stat1] [decimal](19, 2) NULL,
[stat2] [decimal](19, 2) NULL,
[stat3] [decimal](19, 2) NULL,
)
CREATE TABLE [dbo].[KPIs](
[ID] [int] IDENTITY(1,1) NOT NULL,
[IDItem] [int] NOT NULL,
[Date] [smalldatetime] NOT NULL,
[KPI1] [decimal](19, 2) NULL,
[KPI2] [decimal](19, 2) NULL,
[KPI3] [decimal](19, 2) NULL,
)
With a certain Item belong 0 or 1 Statistic and 0 or 1 KPI
What i want is a query that gives me summed results of the statistics & Kpis for each Item with a certain parent (grouped by Item).
This would look something like:
SELECT Item.Name
, SUM(Statistics.stat1)
, SUM(Statistics.stat2)
, SUM(Statistics.stat3)
, SUM(KPIs.kpi1)
, SUM(KPIs.kpi2)
, SUM(KPIs.kpi3)
FROM Items
LEFT JOIN Statistics ON statistics.IDItem = Items.ID AND statistics.Date BETWEEN '2009-01-01' AND '2009-01-31'
LEFT JOIN KPIs ON KPIs.IDItem = Items.ID AND KPIs.Date BETWEEN '2009-01-01' AND '2009-01-31'
WHERE Items.IDParent = 123
GROUP BY Item.Name
The problem with this query is that the Statistic and KPI are not "joined", so 1 kpi record can occur 31 times if there are also 31 days of statistics for the same item.
My next try was creating a full outer join index:
ALTER VIEW [dbo].[V_StatsKpis]
AS
ISNULL(statistics.IDItem, kpis.IDItem) IDItem,
, ISNULL(statistics.Date, kpis.Date) Date
, Statistics.stat1
, Statistics.stat2
, Statistics.stat3
, KPIs.kpi1
, KPIs.kpi2
, KPIs.kpi3
FROM dbo.Statistics
FULL OUTER JOIN KPIs ON Statistics.Date = KPIs.Date AND Statistics.IDItem = KPIs.IDItem
And than the query would look like :
SELECT Item.Name
, SUM(v.stat1)
, SUM(v.stat2)
, SUM(v.stat3)
, SUM(v.kpi1)
, SUM(v.kpi2)
, SUM(v.kpi3)
FROM Items
LEFT JOIN V_StatsKpis v ON v.IDItem = Items.ID AND v.Date BETWEEN '2009-01-01' AND '2009-01-31'
WHERE Items.IDParent = 123
GROUP BY Item.Name
But...... this one is EXTREMELY slow. And you are not allowed to put an index on a view with full outer joins.
So: How should i solve this problem, and the performance is EXTREMELY important. It is even so important that modifying the tablestructure is possible. But putting the statistics and kpi data in 1 table isn't possible because this data is imported from different sources.
Some extra info:
-in real world there are not just 3 statistics / kpis but 10 statistic columns and 15 kpi columns.
-The Item table contains 60.000.000 rows
-The Statistic table contains 200.000.000 rows and grows with 1.500.000 rows a day
-The KPI table contains 50.000.000 rows and grows with 100.000 rows a day
February 9, 2009 at 2:39 am
I think i have found the best solution:
SELECT Item.Name
, stat.Stat1
, stat.Stat2
, stat.Stat3
, kpi.kpi1
, kpi.kpi2
, kpi.kpi3
FROM Items
LEFT JOIN (
SELECT
IDItem
, SUM(Statistics.stat1) stat1
, SUM(Statistics.stat2) stat2
, SUM(Statistics.stat3) stat3
FROM Statistics
WHERE statistics.Date BETWEEN '2009-01-01' AND '2009-01-31'
GROUP BY IDItem
) stat ON stat.IDItem = Item.ID
LEFT JOIN (
SELECT
IDItem
, SUM(KPIs.kpi1) kpi1
, SUM(KPIs.kpi2) kpi2
, SUM(KPIs.kpi3) kpi3
FROM KPIs
WHERE KPIs.Date BETWEEN '2009-01-01' AND '2009-01-31'
GROUP BY IDItem
) kpi ON kpi.IDItem = Item.ID
WHERE Items.IDParent = 123
February 9, 2009 at 6:52 am
The solution provided here is somewhat similar to what Sander has provided, and I am not sure on whether this is going to be faster, but may be you can do a test & let us know, how these solutions performed.
SELECTI.Name,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI1 ELSE 0 END ) AS Stat1,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI2 ELSE 0 END ) AS Stat2,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI3 ELSE 0 END ) AS Stat3,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI1 ELSE 0 END ) AS KPI1,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI2 ELSE 0 END ) AS KPI2,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI3 ELSE 0 END ) AS KPI3
FROMItem I
LEFT JOIN
(
SELECT'S' AS RecordType, IDItem,
SUM( stat1 ) StatKPI1,
SUM( stat2 ) StatKPI2,
SUM( stat3 ) StatKPI3
FROM[Statistics]
WHERE[Date] >= '2009-01-01'
AND [Date] <= '2009-01-31'
GROUP BY IDItem
UNION all
SELECT'K' AS RecordType, IDItem,
SUM( kpi1 ) StatKPI1,
SUM( kpi2 ) StatKPI2,
SUM( kpi3 ) StatKPI3
FROMKPIs
WHERE[Date] >= '2009-01-01'
AND [Date] <= '2009-01-31'
GROUP BY IDItem
) SK ON I.ID = SK.IDItem
WHEREI.IDParent = 123
GROUP BY I.NAME
--Ramesh
February 9, 2009 at 7:00 am
Ramesh (2/9/2009)
The solution provided here is somewhat similar to what Sander has provided, and I am not sure on whether this is going to be faster, but may be you can do a test & let us know, how these solutions performed.
SELECTI.Name,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI1 ELSE 0 END ) AS Stat1,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI2 ELSE 0 END ) AS Stat2,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI3 ELSE 0 END ) AS Stat3,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI1 ELSE 0 END ) AS KPI1,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI2 ELSE 0 END ) AS KPI2,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI3 ELSE 0 END ) AS KPI3
FROMItem I
LEFT JOIN
(
SELECT'S' AS RecordType, IDItem,
SUM( stat1 ) StatKPI1,
SUM( stat2 ) StatKPI2,
SUM( stat3 ) StatKPI3
FROM[Statistics]
WHERE[Date] >= '2009-01-01'
AND [Date] <= '2009-01-31'
GROUP BY IDItem
UNION all
SELECT'K' AS RecordType, IDItem,
SUM( kpi1 ) StatKPI1,
SUM( kpi2 ) StatKPI2,
SUM( kpi3 ) StatKPI3
FROMKPIs
WHERE[Date] >= '2009-01-01'
AND [Date] <= '2009-01-31'
GROUP BY IDItem
) SK ON I.ID = SK.IDItem
WHEREI.IDParent = 123
GROUP BY I.NAME
Interesting solution. But i don't see this working in my situation. In real world the columns of the statistics table and the kpi table are very different. So union them to 1 table won't work, or i have to use lots of NULL columns.
I don't think the performance will be very different than the solution i came up with. Since the affected rows will be the same in both situations, the only difference is that i don't have to do 6 cases for every row (and there can be up to 1.000.000 rows)
But i thank you very much for thinking of a solution!!!
February 9, 2009 at 7:18 am
Is the Date column in the Stats table relevant to the sum you want to do? And does it contain time values, or just dates?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 9, 2009 at 7:21 am
The date is relevant because you have to select the sum of the statistics / kpis in a certain timerange.
Date is of type smalldatetime and contains just the datepart
February 9, 2009 at 7:39 am
Assuming (from the forum you posted in) that you're using SQL 2005, this should get you what you want:
;with
[Stats] (ItemID, Stat1, Stat2, Stat3) as
(select ItemID, sum(Stat1), sum(Stat2), sum(Stat3)
from dbo.Statistics
where ItemID in
(select ItemID
from dbo.Items
where IDParent = 123)
and Date between '2009-01-01' and '2009-01-31'
group by ItemID),
Ks (ItemID, KPI1, KPI2, KPI3) as
(select ItemID, sum(KPI1), sum(KPI2), sum(KPI3)
from dbo.KPIs
where ItemID in
(select ItemID
from dbo.Items
where IDParent = 123)
and Date between '2009-01-01' and '2009-01-31'
group by ItemID)
select ItemName, Stat1, Stat2, Stat3, KPI1, KPI2, KPI3
from dbo.Items
left outer join [Stats]
on Items.ItemID = [Stats].ItemID
left outer join Ks
on Items.ItemsID = Ks.ItemID
where IDParent = 123;
If it's actually SQL 2000, you'll need to modify the CTEs into derived tables in the From clause.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 9, 2009 at 7:40 am
Sander A. (2/9/2009)
Interesting solution. But i don't see this working in my situation. In real world the columns of the statistics table and the kpi table are very different. So union them to 1 table won't work, or i have to use lots of NULL columns.
Ramesh has selected a key and aggregate columns which match in the two UNIONed queries. It doesn't matter how different the two (or more) source tables are - provided that you find some common ground between them, you can UNION them.
Sander A. (2/9/2009)
I don't think the performance will be very different than the solution i came up with. Since the affected rows will be the same in both situations, the only difference is that i don't have to do 6 cases for every row (and there can be up to 1.000.000 rows)
He isn't - he's performing six CASEs on a filtered, preaggregated subset - exactly where it should be for top performance π
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 9, 2009 at 7:47 am
Sander A. (2/9/2009)
Interesting solution. But i don't see this working in my situation. In real world the columns of the statistics table and the kpi table are very different. So union them to 1 table won't work, or i have to use lots of NULL columns.
...I agree with the real world situation, but I am also sure (somewhat) that the no. of. statistics/KPIs are limited (say 50-70 max, that I am thinking..), which can be converted to UNION ALL query. And also you are working with large amount data, So its worth doing a test.
I don't think the performance will be very different than the solution i came up with. Since the affected rows will be the same in both situations, the only difference is that i don't have to do 6 cases for every row (and there can be up to 1.000.000 rows)
I'd better like to see a performance test:)..., And I know if it fares well, you don't mind typing too many case statements...:D
BTW, I was thinking you & Sander are two different people:hehe:, don't know why??
Edit:
I was doing some other stuff while replying to it, so I didn't see the earlier replies....
Thanks Chris for the explanation...
--Ramesh
February 9, 2009 at 7:51 am
Ramesh (2/9/2009)I'd better like to see a performance test:)..., And I know if it fares well, you don't mind typing too many case statements...:D
BTW, I was thinking you & Sander are two different people:hehe:, don't know why??
I'm doing a performance test right now. But first i have to find a good subset of data and rewrite the query 3 times to test all solutions that where given in this topic. When i'm done (probably tomorrow, since my workingday is done in a half hour π ) i will post the results here.
Me & Sander being the same person : well, it was indeed a bit confusing because i posted the solution to my own question. So it isn't that strange π
February 9, 2009 at 8:36 am
Oke, the results of the performance test. I tested this on 40.000 items with a difference of 30 days between the start and enddate (so that will be around 1.000.000 statistics and 100.000 kpis)
During the test i found a possible improvement of my own query, in stead of:
LEFT JOIN (
SELECT
IDItem
, SUM(Statistics.stat1) stat1
FROM Statistics
WHERE statistics.Date BETWEEN '2009-01-01' AND '2009-01-31'
GROUP BY IDItem
) stat ON stat.IDItem = Item.ID
it's faster to use:
LEFT JOIN (
SELECT
IDItem
, SUM(Statistics.stat1) stat1
FROM Statistics
WHERE statistics.Date BETWEEN '2009-01-01' AND '2009-01-31'
AND ItemID in (select ItemID from dbo.Items where IDParent = 123)
GROUP BY IDItem
) stat ON stat.IDItem = Item.ID
I ran the query several times and used the average execution time:
My own solution : 2330 ms
My improved solution : 2120 ms
Ramesh solution : 2090 ms
GSquared solution : 1890 ms
So GSquare is the winner !! Now i "only" have to refactor my complete java-Querybuilder .....:w00t:
thnx for your solutions
February 9, 2009 at 8:50 am
Can you re-do the test with the same update applying on my version of the query & post back the results? 'Cause I've a strong feelin' that it should work far better than you have mentioned.
And I know, Gus, you can prove me wrong:), if you wanted to....
--Ramesh
February 9, 2009 at 9:11 am
Ramesh, it's easy enough to test performance on a simplified version of this.
create table #Stats (
ID int identity primary key,
ItemID int,
Date smalldatetime,
Stat1 float,
Stat2 float,
Stat3 float);
create table #KPIs (
ID int identity primary key,
ItemID int,
Date smalldatetime,
KPI1 float,
KPI2 float,
KPI3 float);
create table #Items (
ID int identity primary key,
Name char(10));
insert into #Items (Name)
select number
from dbo.Numbers
where number between 1 and 10;
;with Dates (Date) as
(select dateadd(day, number-1, '1/1/09')
from dbo.numbers
where number between 0 and 31)
insert into #Stats (ItemID, Date, Stat1, Stat2, Stat3)
select number, date, checksum(newid()), checksum(newid()), checksum(newid())
from dbo.numbers
cross join dates
where number between 1 and 10;
;with Dates (Date) as
(select dateadd(day, number-1, '1/1/09')
from dbo.numbers
where number between 0 and 31)
insert into #KPIs (ItemID, Date, KPI1, KPI2, KPI3)
select number, date, checksum(newid()), checksum(newid()), checksum(newid())
from dbo.numbers
cross join dates
where number between 1 and 10;
set nocount on;
set statistics io on;
set statistics time on;
SELECT I.Name,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI1 ELSE 0 END ) AS Stat1,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI2 ELSE 0 END ) AS Stat2,
SUM( CASE WHEN SK.RecordType = 'S' THEN SK.StatKPI3 ELSE 0 END ) AS Stat3,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI1 ELSE 0 END ) AS KPI1,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI2 ELSE 0 END ) AS KPI2,
SUM( CASE WHEN SK.RecordType = 'K' THEN SK.StatKPI3 ELSE 0 END ) AS KPI3
FROM #Items I
LEFT JOIN
(
SELECT 'S' AS RecordType, ItemID,
SUM( stat1 ) StatKPI1,
SUM( stat2 ) StatKPI2,
SUM( stat3 ) StatKPI3
FROM #Stats [Statistics]
GROUP BY ItemID
UNION all
SELECT 'K' AS RecordType, ItemID,
SUM( kpi1 ) StatKPI1,
SUM( kpi2 ) StatKPI2,
SUM( kpi3 ) StatKPI3
FROM #KPIs KPIs
GROUP BY ItemID
) SK ON I.ID = SK.ItemID
GROUP BY I.NAME;
;with
[Stats] (ItemID, Stat1, Stat2, Stat3) as
(select ItemID, sum(Stat1), sum(Stat2), sum(Stat3)
from #Stats
group by ItemID),
Ks (ItemID, KPI1, KPI2, KPI3) as
(select ItemID, sum(KPI1), sum(KPI2), sum(KPI3)
from #KPIs
group by ItemID)
select Name, Stat1, Stat2, Stat3, KPI1, KPI2, KPI3
from #Items
left outer join [Stats]
on #Items.ID = [Stats].ItemID
left outer join Ks
on #Items.ID = Ks.ItemID;
If you run the above and look at the actual execution plans for both of the final query versions, you'll find that the CTE version has less steps and less work.
On my computer, the first query took 152 milliseconds total, 140 milliseconds of CPU time, while the second took 13 milliseconds total and 16 milliseconds of CPU time (better parallelism). They had identical IO stats.
Of course, with more rows, the possibilities of various indexes being involved, etc., the execution plans will change, but the complexity of them will still have about the same ratio.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 18, 2009 at 7:45 am
The total calculation works now, but now i'm comming to the history screens and again i'm running into a problem (i'm feeling quite dumb that i can't figure this out but after trying several thins i'm runnig out of well performing options):
First of all, the tables again
CREATE TABLE [dbo].[Item] (
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[IDParent] [int] NOT NULL
)
CREATE TABLE [dbo].[Statistics](
[ID] [int] IDENTITY(1,1) NOT NULL,
[IDItem] [int] NOT NULL,
[Date] [smalldatetime] NOT NULL,
[stat1] [decimal](19, 2) NULL,
[stat2] [decimal](19, 2) NULL,
[stat3] [decimal](19, 2) NULL,
)
CREATE TABLE [dbo].[KPIs](
[ID] [int] IDENTITY(1,1) NOT NULL,
[IDItem] [int] NOT NULL,
[Date] [smalldatetime] NOT NULL,
[KPI1] [decimal](19, 2) NULL,
[KPI2] [decimal](19, 2) NULL,
[KPI3] [decimal](19, 2) NULL,
)
Now i don't want a total query, but i want a history for a certain item. The query i wrote looks something like this:
SELECT i.Name
, s.stat1
, s.stat2
, s.stat3
, k.kpi1
, k.kpi2
, k.kpi3
FROM Items i
LEFT JOIN Statistics s ON s.IDItem = i.ID
LEFT JOIN KPIs k ON k.IDItem = i.ID
WHERE i.ID = 100
AND s.Date = k.Date
The problem in this case is that there hasn't to be a statistic or kpi record for every day. But because of the s.Date = k.Date only the days which have both statistics and kpis are shown in the result.
How can i fix this problem??
February 18, 2009 at 8:12 am
SELECTi.Name, s.stat1, s.stat2, s.stat3, k.kpi1, k.kpi2, k.kpi3
FROMItems i
INNER JOIN
(
SELECTISNULL( s.IDItem, k.IDItem ) AS IDItem,
s.stat1, s.stat2, s.stat3, k.kpi1, k.kpi2, k.kpi3
FROM[STATISTICS] s
FULL OUTER JOIN KPIs k ON s.IDItem = k.IDItem AND s.Date = k.Date
WHEREs.IDItem = 100 OR k.IDItem = 100
) ks ON i.ID = ks.IDItem
WHERE i.ID = 100
--Ramesh
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply