November 2, 2015 at 4:49 am
Hi All,
I am having a main table and temptable.
Every sunday, new data will be loaded from temptable to main table. I have to make sure that, duplicates does not get loaded from temptable to maintable.
For example, if last sunday a record gets loaded from temp to main. If this sunday also the same record is present then it means that is a duplicate.
The duplicate is decided on below scenario
select 'CodeChanges: ', count(*) from CodeChanges a, CodeChanges_Temp b
where a.AccountNumber = b.AccountNumber and
a.HexaNumber = b.HexaNumber and
a.HexaEffDate = b.HexaEffDate and
a.HexaId = b.HexaId and
a.TransactionEffDate = b.TransactionEffDate and
isnull(a.TransactionExpDate,getdate()) = isnull(b.TransactionExpDate,getdate()) and
a.HexaCode = b.HexaCode and
and a.CompanyCode = b.CompanyCode
a.CategoryCode = b.CategoryCode and
a.AdjustmentUserId = b.AdjustmentUserId
Yesterday (Sunday) , data from temp got loaded onto maintable but with duplicates.
There is a log which just displays number of duplicates.
Yesterday the log displayed 8 duplicates found. I need to find out the 8 duplicates which got loaded yesterday and delete it off from main table.
There is a column in both tables which is 'creation date and time'. Every Sunday when the load happens this column will have that day's date .
Now i need to find out what are all the duplicates which got loaded on this sunday.
The total rows in temp table is : 363
No of duplicates present is : 8
I used below query to find out the duplicates but it is returning all the 363 rows from the maintable instead of the 8 duplicates.
Select 'CodeChanges: ', * from CodeChanges a where
exists
( Select 1 from CodeChanges_Temp b where
a.HexaNumber = b.HexaNumber and
a.HexaEffDate = b.HexaEffDate and
a.HexaId = b.HexaId and
a.TransactionEffDate = b.TransactionEffDate and
isnull(a.TransactionExpDate,getdate()) = isnull(b.TransactionExpDate,getdate()) and
a.HexaCode = b.HexaCode and
and a.CompanyCode = b.CompanyCode
a.CategoryCode = b.CategoryCode and
a.AdjustmentUserId = b.AdjustmentUserId
b.CreationDateTime = '2015-11-01 00:00:00.000'
)
and a.CreationDateTime = '2015-11-01 00:00:00.000'
Please help me in finding the duplicate records which has creation date time as '2015-11-01 00:00:00.000'
and all the above columns mentioned in the query matches.
Example
Few colums only metioned below
creationdateandtime HexaNumber HexaCode
1. 1987-10-01 00:00:00.000 5 3
2. 2015-11-01 00:00:00.000 5 3
So here the second record is duplicate. This is what I am trying to find.
November 2, 2015 at 5:04 am
A query like this
SELECT
rn = ROW_NUMBER() OVER(PARTITION BY AccountNumber, HexaNumber, HexaEffDate, HexaId, TransactionEffDate, TransactionExpDate, HexaCode, CompanyCode, CategoryCode, AdjustmentUserId
ORDER BY (SELECT NULL)),
*
FROM CodeChanges
should identify your dupes. You might have to adjust the column list in the PARTITION BY clause. When you are sure that the column list is correct, "primary" values have rn = 1 and "duplicate" values have rn > 1.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 2, 2015 at 5:50 am
Hi Chris,
Thanks very much for the advise !
I have one doubt, please clarify on the below quotes,
You might have to adjust the column list in the PARTITION BY clause
You mean if I have anymore columns to add in the condition, then I need to add, right ?
When you are sure that the column list is correct, "primary" values have rn = 1 and "duplicate" values have rn > 1.
you mean the output will show all the records which is unique as well as duplicate ?
I just need the duplicate records.
The count of CodeChanges table is 588067. I think its very high.
So just wanted to confirm before running the query to avoid any database issue.
Please advise .
November 2, 2015 at 5:54 am
muralikrishna2489 (11/2/2015)
Hi Chris,Thanks very much for the advise !
I have one doubt, please clarify on the below quotes,
You might have to adjust the column list in the PARTITION BY clause
You mean if I have anymore columns to add in the condition, then I need to add, right ?
When you are sure that the column list is correct, "primary" values have rn = 1 and "duplicate" values have rn > 1.
you mean the output will show all the records which is unique as well as duplicate ?
I just need the duplicate records.
The count of CodeChanges table is 588067. I think its very high.
So just wanted to confirm before running the query to avoid any database issue.
Please advise .
The column list in the PARTITION BY clause should exactly match the column list which defines a duplicate in your table.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 2, 2015 at 7:18 am
muralikrishna2489 (11/2/2015)
you mean the output will show all the records which is unique as well as duplicate ?I just need the duplicate records.
The count of CodeChanges table is 588067. I think its very high.
So just wanted to confirm before running the query to avoid any database issue.
Please advise .
Take a look at Chris's note below his query. The primary rows will have rn = 1. The duplicates (as defined by the column list in the PARTITION BY) will have rn > 1. To select only your duplicates, simply add the WHERE clause to the outer query:
WHERE rn > 1
FYI: If you're going to delete the duplicate rows, you can delete from a CTE. Instead of SELECTing from the CTE, change it to a DELETE from the CTE. This gives you the opportunity to see what you're going to delete before you delete it. Example:
WITH cte AS (
SELECT rn = ROW_NUMBER() OVER(PARTITION BY AccountNumber, HexaNumber, HexaEffDate, HexaId, TransactionEffDate, TransactionExpDate, HexaCode, CompanyCode, CategoryCode, AdjustmentUserId ORDER BY (SELECT NULL)), *
FROM CodeChanges
)
SELECT *
FROM cte
WHERE rn > 1;
In this code, if you change the "SELECT * FROM" to a "DELETE FROM" you will delete the duplicates.
Please make sure you understand this before running it in production.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply