July 19, 2018 at 8:39 am
I've a history that got populated through a trigger when some change happened. Trigger had an issue and it was storing records in the history table whether there was an actual upate or not. Trigger is fixed now. Challange is how to clean up this massive table. We need to identify the records that got changed and remove others. Any better way to do this? Since this is a massive table, query is taking exteremly longer to return results.
I'm coming up with query like this
SELECT * FROM tbl_history a
INNER JOIN tbl_history b ON a.ID = b.ID
WHERE a.col1 <> b.col1
OR a.col2 <> b.col2
OR a.col3 <> b.col3
OR a.col3 <> b.col3
OR a.col4 <> b.col4
OR a.col5 <> b.col5
OR a.col6<> b.col6
OR a.col7 <> b.col7
OR a.col8 <> b.col8
OR a.col9 <> b.col9
OR a.col10 <> b.col10
OR a.col11 <> b.col11
July 19, 2018 at 8:55 am
SQL_Surfer - Thursday, July 19, 2018 8:39 AMI've a history that got populated through a trigger when some change happened. Trigger had an issue and it was storing records in the history table whether there was an actual upate or not. Trigger is fixed now. Challange is how to clean up this massive table. We need to identify the records that got changed and remove others. Any better way to do this? Since this is a massive table, query is taking exteremly longer to return results.
I'm coming up with query like thisSELECT * FROM tbl_history a
INNER JOIN tbl_history b ON a.ID = b.ID
WHERE a.col1 <> b.col1
OR a.col2 <> b.col2
OR a.col3 <> b.col3
OR a.col3 <> b.col3
OR a.col4 <> b.col4
OR a.col5 <> b.col5
OR a.col6<> b.col6
OR a.col7 <> b.col7
OR a.col8 <> b.col8
OR a.col9 <> b.col9
OR a.col10 <> b.col10
OR a.col11 <> b.col11
This query will not correctly handle the case where one of the columns being checked is a NULL. A method which does is as follows:SELECT ...
FROM ...
WHERE EXISTS (SELECT a.col1, a.col2, ..., a.col11 EXCEPT SELECT b.col1, b.col2, ..., b.col11)
But it's not going to run fast.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 19, 2018 at 9:24 am
Thanks, any idea to speed it up? May be do it in a batch?
July 20, 2018 at 7:54 am
Wouldn't you want to do that the other way? That is, find rows where the previous row matches the next row -- i.e. no column changed -- so you could delete the older of those rows? Thus something like this (?):
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 <> b.col3 ...
As to NULL issues, check the column constraints first: if the column is defined as NOT NULL, obviously you don't have to worry about NULLs for that particular column.
You didn't state it, but presumably ID is the clustering key on this table (not really the best idea of an audit table, but extremely common). If so, you'd update by going in batches thru the table and deleting rows that were dups of the prior row.
As to efficiency, perhaps a PARTITION BY col1, col2, col3, .... would work best, and would automatically take care of the NULL issue. Perhaps with some type of "gaps and islands" logic to determine which rows to delete. I/we would need more details about the table and its data to be sure on that one.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply