January 6, 2015 at 10:56 pm
Hi All,
I am new to Database technology, I have come up with the below SQL query with the help of this Fourm Members.
This SQL query finds duplicate records present in dbo.Changes_DupCk table by comparing it with dbo.Changes and Changes_Frze tables. The three tables are as below.
1. dbo.Changes_DupCk
2. dbo.Changes
3. dbo.Changes_Frze
The work involves is every week, data from Changes_DupCk table has to be loaded into dbo.Changes and Changes_Frze table. But before loading we have to make sure that dbo.Changes_DupCk table has no dupicate record which is already present in one of the other two tables.
A Record in Changes_DupCk becomes duplicate if the requestnumber,suffix,AmountTypeCode, EffDate,ExpDate,Amount,AdjustmentType,AdjustmentStatus,AdjustmentUserId matches with a record in dbo.Changes or dbo.Changes_Frze table
All three have same table structure mentioned below
CREATE TABLE [dbo].[Changes_Frze](
[requestNumber] [char](12) NOT NULL,
[suffix] [char](3) NOT NULL,
[AmountTypeCode] [char](2) NOT NULL,
[EffDate] [datetime2](3) NOT NULL,
[CreationDateTime] [datetime2](3) NOT NULL,
[ExpDate] [datetime2](3) NULL,
[Amount] [money] NULL,
[AdjustmentUserId] [varchar](10) NULL,
[AdjustmentReason] [varchar](255) NULL,
[AdjustmentManagerId] [varchar](10) NULL,
[AdjustmentStatus] [char](1) NOT NULL,
[AdjustmentType] [char](1) NOT NULL,
[AdjustmentReasonId] [smallint] NULL
) ON [PRIMARY]
GO
The SQL query I came up is
WITH
cteGetAllRows AS
( --=== Combine all the rows as if they were from one table using UNION ALL, marking each row with a table #.
SELECT *,TableNo = 1 FROM dbo.Changes_DupCk UNION ALL
SELECT *,TableNo = 2 FROM dbo.Changes UNION ALL
SELECT *,TableNo = 3 FROM dbo.Changes_Frze
)
,
cteMarkDupes AS
( --=== Number all duplicate rows starting the count over for each row according to the PARTITION BY
SELECT DupeCheck = ROW_NUMBER() OVER (PARTITION BY [requestNumber],AmountTypeCode,TransactionEffDate,TransactionExpDate,Amount,AdjustmentType,AdjustmentStatus,AdjustmentUserId ORDER BY TransactionEffDate)
,*
FROM cteGetAllRows
) --=== Return only the duplicates not including the original of each duplicate.
SELECT *
FROM cteMarkDupes
WHERE DupeCheck > 1
;
This query assumes that all three tables might have duplicates. But in reality, only the Changes_DupCk table may contain duplicate record. Duplicate record in this case means that record is already present in one of other two tables. And dbo.Changes has millions of Records which all are unique. So making an UNION ALL will make my database slow, please suggest me how to optimise this query for better performance.
January 7, 2015 at 12:04 am
Hi,
Instead of using UNION ALL you put can use Join and select out the common data from your table and after that you can add a column as a flag which will gets updated as 1 when commom values when matched...
then you can processed with other values whose flag is 0 so that common data will not come into the scenario of duplication
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
January 7, 2015 at 12:28 am
Hi Kapil,
Common data means Duplicate Data or Unique data . please explain How to set Flag over it.
Could you please provide an example on this scenario.
January 7, 2015 at 12:54 am
I have written this query for checking duplicates by matching request no . But its not checking the third inner join for Duplicates, It only checks the first inner join for duplicates.
SELECT * FROM dbo.Changes_DupCk INNER JOIN dbo.Changes ON dbo.FinancialChangesClaims_DupCk.requestNumber=dbo.lChanges.requestNumber INNER JOIN dbo.Changes_ClmFrze ON dbo.Changes_DupCk.requestNumber=dbo.Changes_ClmFrze.requestNumber
Please advise what is the mistake.
January 7, 2015 at 1:45 am
For performance questions, we really need to see table definitions, all index definitions and the actual execution plan (saved as a .sqlplan file and attached)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 7, 2015 at 3:03 am
Hi Gila,
Please check the attached doc containing the table definition and indexes for all the three tables
and advise.
The table "changes" will have millions of data and so is "Changes_DupCk" . Out of the million data in "Changes_DupCk " only few will be actually duplicate, I mean only few records will be already present in "Changes" or "Changes_Frze".
Please optimize my code to tune the database performance as I am using Union ALL in my query which will impact database as it has more unique datas.
January 7, 2015 at 3:16 am
Execution plan?
muralikrishna2489 (1/7/2015)
Please optimize my code to tune the database performance as I am using Union ALL in my query which will impact database as it has more unique datas.
Does it? You've tested and proven that union all is slow on larger resultsets and it's due to the union all?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 7, 2015 at 3:48 am
Hi Gila,
I dont know how to put a execution plan in SQL. I have created in Doc and attached.
Regarding UNION ALL, I was advised that Since both Changes and Changes_DupCK table has millions of data , Giving UNION ALL will impact the database speed.
That is why I am looking for optimizing the code.
Please advise.
January 7, 2015 at 4:07 am
muralikrishna2489 (1/7/2015)
Hi Gila,I dont know how to put a execution plan in SQL. I have created in Doc and attached.
Regarding UNION ALL, I was advised that Since both Changes and Changes_DupCK table has millions of data , Giving UNION ALL will impact the database speed.
That is why I am looking for optimizing the code.
Please advise.
That's not quite an execution plan. You use the key command CTL-M in a query window in SQL SErver Management Studio to enable the capture of an execution plan. Run the query. You'll see a new tab available with a bunch of icons and lines. That's the plan. Right click on it, save as, creating a file with a .sqlplan extension as Gail says.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 7, 2015 at 4:12 am
muralikrishna2489 (1/7/2015)
I was advised that Since both Changes and Changes_DupCK table has millions of data , Giving UNION ALL will impact the database speed.
May I suggest you read through this: http://www.sqlservercentral.com/articles/Performance+Tuning/115825/
Is the query slow?
Has it been tested with production volumes of data? If so, is it unacceptably slow?
Unless you can answer both questions with 'Yes', this line of inquiry may be a waste of time.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 7, 2015 at 4:38 am
Hi Gila,
I have not tested with Production data, I will test it and let you know the status shortly.
Thanks Again.
January 8, 2015 at 8:33 am
I wonder if MERGE statement can be used in this case. would something like this improves the performance
Merge [dbo].[Changes] T1
using
[dbo].[Changes_DupCk] S
on
T1.requestnumber = S.requestnumber AND
T1.suffix = S.suffix AND
T1.AmountTypeCode = S.AmountTypeCode AND
T1.EffDate = S.EffDate AND
T1.ExpDate = S.ExpDate AND
T1.Amount= S.Amount AND
T1.AdjustmentType= S.AdjustmentType AND
T1.AdjustmentStatus = S.AdjustmentStatus AND
T1.AdjustmentUserId = S.AdjustmentUserId
WHEN NOT MATCHED THEN
INSERT (requestnumber,suffix,AmountTypeCode, EffDate,CreationDateTime,ExpDate,Amount,AdjustmentUserId,
AdjustmentReason,AdjustmentManagerId,AdjustmentStatus,AdjustmentType,AdjustmentReasonId)
VALUES(S.requestnumber,S.suffix,S.AmountTypeCode,S.EffDate,S.CreationDateTime,S.ExpDate,S.Amount,S.AdjustmentUserId,S.AdjustmentReason,S.AdjustmentManagerId,S.AdjustmentStatus,S.AdjustmentType,S.AdjustmentReasonId)
January 8, 2015 at 8:36 am
I wonder if MERGE statement can be used in this case. This script is only for [Changes] table . If it is performance effective we can implement the same for [Changes_Frze].
Merge [dbo].[Changes] T1
using
[dbo].[Changes_DupCk] S
on
T1.requestnumber = S.requestnumber AND
T1.suffix = S.suffix AND
T1.AmountTypeCode = S.AmountTypeCode AND
T1.EffDate = S.EffDate AND
T1.ExpDate = S.ExpDate AND
T1.Amount= S.Amount AND
T1.AdjustmentType= S.AdjustmentType AND
T1.AdjustmentStatus = S.AdjustmentStatus AND
T1.AdjustmentUserId = S.AdjustmentUserId
WHEN NOT MATCHED THEN
INSERT (requestnumber,suffix,AmountTypeCode, EffDate,CreationDateTime,ExpDate,Amount,AdjustmentUserId,
AdjustmentReason,AdjustmentManagerId,AdjustmentStatus,AdjustmentType,AdjustmentReasonId)
VALUES(S.requestnumber,S.suffix,S.AmountTypeCode,S.EffDate,S.CreationDateTime,S.ExpDate,S.Amount,S.AdjustmentUserId,S.AdjustmentReason,S.AdjustmentManagerId,S.AdjustmentStatus,S.AdjustmentType,S.AdjustmentReasonId)
January 8, 2015 at 11:57 am
Hi Laxmi,
Thanks for the suggestion.
I am not inserting anything into the table, I just need to Find the count of duplicates. Please tell why I need to use insert statements.
January 9, 2015 at 7:34 am
If it is only about finding the count of duplicate records then this query would work.
CREATE TABLE #DuplicateRecordFinder(
[requestNumCer] [char](12) NOT NULL,
[suffix] [char](3) NOT NULL,
[AmountTypeCode] [char](2) NOT NULL,
[EffDate] [datetime2](3) NOT NULL,
[CreationDateTime] [datetime2](3) NOT NULL,
[ExpDate] [datetime2](3) NULL,
[Amount] [money] NULL,
[AdjustmentUserId] [varchar](10) NULL,
[AdjustmentReason] [varchar](255) NULL,
[AdjustmentManagerId] [varchar](10) NULL,
[AdjustmentStatus] [char](1) NOT NULL,
[AdjustmentType] [char](1) NOT NULL,
[AdjustmentReasonId] [smallint] NULL
)
INSERT INTO #DuplicateRecordFinder
(
select Changes_DupCk.* from Changes_DupCk A join Changes_Frze B
on A.requestnumBer=B.requestnumBer and A.suffix =B.suffix and
A.AmountTypeBode =B.AmountTypeBode and A.EffDate =B.EffDate and
A.ExpDate =B.ExpDate and A.Amount =B.Amount and A.AdjustmentType =B.AdjustmentType and
A.AdjustmentStatus =B.AdjustmentStatus and A.AdjustmentUserId =B.AdjustmentUserId
union all
select Changes_DupCk.* from Changes_DupCk A join Changes C
on A.requestnumCer=C.requestnumCer and A.suffix =C.suffix and
A.AmountTypeCode =C.AmountTypeCode and A.EffDate =C.EffDate and
A.ExpDate =C.ExpDate and A.Amount =C.Amount and A.AdjustmentType =C.AdjustmentType and
A.AdjustmentStatus =C.AdjustmentStatus and A.AdjustmentUserId =C.AdjustmentUserId
)
select count(*) from #DuplicateRecordFinder
drop table #DuplicateRecordFinder
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply