November 1, 2016 at 6:12 am
Hi guys
Wondering if someone can help me with performance tuning a query I've written.
My query
1. pulls a dataset from a log table into a temp table
2. applies a row number for each record set to determine the order that the data arrived
3. uses a while loop to iterate through each field that is different from the next row (this includes a nested while loop as there could be more than one difference within each row)
4. using dynamic sql, gets the value of each field that was different from the previous and stores these values in another temp table
I'm just wondering if I'm missing a much simpler way to run this using windowing functions as opposed to the dynamic sql. This code was working fine for small datasets but it's currently crawling when it's hitting anything over 30k records.
The only other option I can think of is to change the format of the log table to store the data as Old Value and New Value for each of the fields we want to check on (5 fields altogether).
November 1, 2016 at 6:24 am
Are you familiar with the LAG and LEAD functions? I think you could use them to avoid the loops.
John
November 1, 2016 at 6:26 am
First I've heard of them john, any good articles you'd recommend or would a google suffice?
November 1, 2016 at 6:46 am
The examples in Books Online are a good place to start. Here's an illustration (not tested) that I cobbled together. You might also want to bear in mind that storing the old values will double your disk requirements and probably affect performance as well. You can get the old value from the previous row in the table. Write a view if you need to display old and new values side by side.
WITH OldandNew AS (
SELECT
RowNo AS RowNumber
,LAG(RowNo,1) OVER (ORDER BY RowNo) AS PreviousRowNumber
,Col1 AS Newcol1
,LAG(col1,1) OVER (ORDER BY RowNo) AS OldCol1
,Col2 AS Newcol2
,LAG(col2,1) OVER (ORDER BY RowNo) AS OldCol2
FROM MyTable
)
SELECT
RowNumber
,PreviousRowNumber
,Newcol1
,OldCol1
,Newcol2
,OldCol2
FROM OldandNew
WHERE NewCol1 <> OldCol1
OR NewCol2 <> OldCol2
John
November 1, 2016 at 8:57 am
1) Get a copy of Itzik Ben-Gan's Windowing Function book.
2) Fabiano Amorim did a great blog series on Simpletalk.com on WFs
3) You shouldn't need to put the data into a temp table. Huge waste of effort. Just do your work on the actual table, which has to have either an identity or a datetime added (or both) that you can use to process the records in proper order.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 1, 2016 at 10:24 am
Thanks guys for the replies
John - I got it working using the method you outlined, I used LEAD instead of LAG
Massive improvement on the performance too
Now I just need to check the data....oh the joys
SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things
November 1, 2016 at 10:45 am
mitzyturbo (11/1/2016)
Thanks guys for the repliesJohn - I got it working using the method you outlined, I used LEAD instead of LAG
Massive improvement on the performance too
Now I just need to check the data....oh the joys
SQLGuru - Great minds think alike, you were the second person to suggest Itzik Ben-Gan's stuff on this, I hadn't done my research but he looks like the authority on all such things
This part of John's suggested code does not handle NULLs:
WHERE NewCol1 <> OldCol1
OR NewCol2 <> OldCol2
If your data may contain NULLs, I suggest that you consider using this construct instead, which does:
WHERE EXISTS
(
select NewCol1, NewCol2
except
select OldCol1, OldCol2
)
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 1, 2016 at 10:51 am
Cluster the log table on the relevant arrival datetime that you are comparing by (rather than, presumably, a meaningless identity value). Then query directly from the table, joining to itself using the clustering key to do a seek to find the row to join to.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 1, 2016 at 11:03 am
Thanks Phil
I've tweaked the code slightly so that there are no NULL values as I was thinking the exact same thing.
I'm currently working on getting rid of the temp tables altogether and wrapping it all into one select.
Unfortunately the original developers of the database, in their wisdom, split one entity into two different log tables so I need to join / merge my log data set and apply a row number per modified date
For now, using the temp tables isn't too bad, 60k rows being returned in less than 20 seconds which is a huge improvement
November 1, 2016 at 11:17 am
Properly clustering should still allow a merge join, which should be much faster than anything you're having to do now.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 1, 2016 at 11:22 am
As I have 2 log tables to pull data from, how would this work?
do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?
November 1, 2016 at 12:05 pm
mitzyturbo (11/1/2016)
As I have 2 log tables to pull data from, how would this work?do you mean by just having a clustered index by the modified date on both log tables that the select will still run quicker?
Yes, exactly. I would think the "date added" would be sequential on each table, so fragmentation would not be an issue. [But some minor fragmentation is not that big a deal anyway.] Be sure to review the fillfactor on the table also, and make sure it is at least 98 or 99, assuming since this is a history table that you don't modify rows.
But by being clustered on the date, SQL would "know" the rows were in order, and it could do a very efficient join on the two tables, including to a prior row, if needed.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply