January 21, 2019 at 9:27 am
Hi all,
I have a stored procedure that returns a table of 1 million+ rows that I then need to pivot (I can't modify the stored procedure, as it makes a call to a proprietary .exe). I first tried using the PIVOT operator to do what I need, but that takes about 7 minutes to generate a 1440-row table with 900+ columns. I managed to reduce the execution time to a little over 2 minutes by using conditional aggregation with dynamically generated column names. I'm still looking for a way to reduce the execution time even further so it's acceptable for the user, but I'm stumped as to any further optimizations I could do.
INSERT INTO Exampledb.dbo.DestinationTable_EF278
SELECT Timestamp,
       [NC_S\Column 1] = SUM(CASE WHEN TagName='NC_S\Column 1' THEN value END),
       [NC_S\Column 2] = SUM(CASE WHEN TagName='NC_S\Column 2' THEN value END),
       [NC_S\Column 3] = SUM(CASE WHEN TagName='NC_S\Column 3' THEN value END),
       [...]Repeat 900 times[...]
       [NC_S\Column 901] = SUM(CASE WHEN TagName='NC_S\Column 901' THEN value END)
FROM Exampledb.dbo.SourceTable_EF278
GROUP BY Timestamp
ORDER BY Timestamp
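The 900+ CASE lines are generated dynamically; a rough sketch of that generation (the DISTINCT TagName scan and FOR XML PATH concatenation here are only an approximation of the actual generation code) looks like this:
-- Rough sketch: build the CASE list from the distinct tag names and run the pivot
-- as dynamic SQL. Assumes the unpivoted rows are already in SourceTable_EF278.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STUFF((
    SELECT ',' + CHAR(10)
         + QUOTENAME(TagName)
         + ' = SUM(CASE WHEN TagName = ' + QUOTENAME(TagName, '''') + ' THEN value END)'
    FROM (SELECT DISTINCT TagName FROM Exampledb.dbo.SourceTable_EF278) t
    ORDER BY TagName
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '');

SET @sql = N'INSERT INTO Exampledb.dbo.DestinationTable_EF278'
         + N' SELECT Timestamp, ' + @cols
         + N' FROM Exampledb.dbo.SourceTable_EF278'
         + N' GROUP BY Timestamp ORDER BY Timestamp;';

EXEC sys.sp_executesql @sql;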
Any ideas on how to make this faster?
Thanks!
prefet
January 21, 2019 at 10:29 am
prefet - Monday, January 21, 2019 9:27 AM
Don't return 900+ columns? I mean, seriously, how useful can a query be with that many columns? Maybe find a completely different approach to whatever you are trying to do.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 21, 2019 at 11:52 am
That statement begs for more explanation.
How is this proc calling a .exe, and what does the .exe do?
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
January 21, 2019 at 12:47 pm
@micheal-2, the .exe is part of the historian, which is meant to handle large amounts of time-series data. The sproc sits further down the line for ingestion into our other tools, hence the 900 columns.
January 21, 2019 at 12:54 pm
Do you have 1 row per Timestamp, TagName combination? If so, I think your query is probably about as fast as it will go. If you have many rows per Timestamp, TagName combination, then you would get better performance if you aggregate first:
;WITH CTE AS
(
    SELECT Timestamp,
           TagName,
           SUM(value) AS value
    FROM Exampledb.dbo.SourceTable_EF278
    GROUP BY Timestamp, TagName
)
SELECT Timestamp,
       [NC_S\Column 1] = SUM(CASE WHEN TagName='NC_S\Column 1' THEN value END),
       [NC_S\Column 2] = SUM(CASE WHEN TagName='NC_S\Column 2' THEN value END),
       [NC_S\Column 3] = SUM(CASE WHEN TagName='NC_S\Column 3' THEN value END),
       [...]Repeat 900 times[...]
       [NC_S\Column 901] = SUM(CASE WHEN TagName='NC_S\Column 901' THEN value END)
FROM CTE
GROUP BY Timestamp
ORDER BY Timestamp
January 22, 2019 at 8:27 am
Thanks. I tried with the additional CTE but it didn't reduce execution time.
January 22, 2019 at 10:42 am
You need to either:
Cluster the table by timestamp (if that's how you (almost) always query against the table)
Or
Create a nonclustered index on (Timestamp) INCLUDE (TagName, Value)
Either one of those will avoid having to sort all the data every time the query runs.
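In DDL terms, those two options would look roughly like this (just a sketch; the index names are invented, and the table/column names are taken from the query above):
-- Option 1: clustered index on Timestamp (assumes the table has no clustered
-- index yet, or that the existing one can be replaced).
CREATE CLUSTERED INDEX CIX_SourceTable_EF278_Timestamp
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp);

-- Option 2: covering nonclustered index keyed on Timestamp only.
CREATE NONCLUSTERED INDEX IX_SourceTable_EF278_Timestamp
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp)
    INCLUDE (TagName, value);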
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 24, 2019 at 6:12 am
A query without any kind of filtering, such as this one, is frequently only helped by upgrading hardware. Yes, possibly indexing differently could help... maybe (I can't see the execution plan, so I have no clue). However, that's only going to help a little. Then it's just down to hardware. When you're moving everything, you need the hardware to support that.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 24, 2019 at 6:36 am
ScottPletcher - Tuesday, January 22, 2019 10:42 AM
A streaming aggregate, which is often an order of magnitude faster than a hash aggregate, won't be facilitated by either of these suggestions; it requires a clustered index on (Timestamp, TagName) or an ordinary index on (Timestamp, TagName) INCLUDE (Value).
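As a sketch (the index name is invented), the nonclustered version would be:
-- Keys in (Timestamp, TagName) order so rows arrive pre-sorted for a stream
-- aggregate; value is included so the index covers the query.
CREATE NONCLUSTERED INDEX IX_SourceTable_EF278_Timestamp_TagName
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp, TagName)
    INCLUDE (value);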
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 24, 2019 at 7:50 am
We're on 2008, so this won't help; however, a possible Hail Mary on this would be to look at columnstore indexes.
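For anyone on 2012 or later, a nonclustered columnstore on the source table would look something like this (sketch only; the index name is invented, and on 2012/2014 the index makes the table read-only while it exists):
-- Requires SQL Server 2012+; batch-mode scans can speed up large aggregations.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SourceTable_EF278
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp, TagName, value);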
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 24, 2019 at 7:56 am
Grant Fritchey - Thursday, January 24, 2019 7:50 AM
No streaming aggregates in 2008?
I think columnstore indexes were introduced in 2012 with limited functionality - but that's the version we're using, and columnstore indexes have proven themselves to be very useful for some of our forecasting scripts.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 24, 2019 at 7:59 am
I was wondering how much time is spent evaluating 900+ CASE expressions against each of the 1 million+ rows.
It would certainly contribute to the runtime of the query.
January 24, 2019 at 8:12 am
ChrisM@Work - Thursday, January 24, 2019 6:36 AM
But the overall effect on performance might be worse, possibly much worse, with TagName in the keys, depending on the order the rows are inserted. I don't know that a hash match aggregate will be that slow here, and it will certainly be much faster than having to sort all the rows, which is the real performance killer.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 24, 2019 at 8:22 am
ScottPletcher - Thursday, January 24, 2019 8:12 AM
Timestamp and TagName are *required* as the leading keys in any index to support a streaming aggregate in this case (a third column as the first column in the index _might_ also work provided that the value is the same for all qualifying rows). If either column is missing or both columns are present but there's a third column between them, then a sort operator would almost certainly appear in the plan - if the optimizer chose a streaming aggregate over a hash aggregate. And yes, that sort would be slow.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 24, 2019 at 10:10 am
ChrisM@Work - Thursday, January 24, 2019 8:22 AM
I'd be amazed if the SQL optimizer were sophisticated enough to analyze all the CASE expressions, determine they rely only on TagName in addition to Timestamp, and then do a more efficient aggregation. I'm not saying SQL can't do it, but I'd like to see it actually do it before I was sure it would be capable of that.
Also, for each new value of Timestamp, SQL would clear the hash buckets, right? As long as all the buckets for a given timestamp fit into memory, a hash agg isn't going to have perf issues, is it?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".