September 19, 2013 at 7:31 pm
I have inherited this query that is comparing data from one table to another. It is using a combination of columns to "build" a unique column that gets inserted into the other table should it not exist. The query is below INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
WHERE CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])
NOT IN (SELECT DISTINCT [id] from dbo.ReportRecipients)I realize the WHERE clause poses an issue due to the non-sargable concatenated columns. The problem with this query is that in both tables, there are millions of rows. In it's current state it takes about 2 minutes to run, generating 4 million reads.
The "CONVERT(VARCHAR(10),[ATID]) + CONVERT(VARCHAR(10),[AID]) + CONVERT(VARCHAR(10),[ReportID]) + CONVERT(VARCHAR(10),[Frequency]) + CONVERT(VARCHAR(10),a.[RecipientId]) + CONVERT(VARCHAR(10),a.[DeliveryMethod])" business creates the "ID" column in the other table...which when new rows are added into the main table, it uses the combination of the CONVERTS to match against the "ID" column to determine if it's a new record or not.
Here's some sample data and DDL:
/* Create Main source table */
CREATE TABLE [dbo].[ReportSchedules](
[ScheduleId] [int] NOT NULL,
[ReportID] [int] NULL,
[ATID] [int] NULL,
[AID] [int] NULL,
[RecipientId] [int] NULL,
[Source] [varchar](5) NULL,
[Frequency] [tinyint] NULL,
[DeliveryMethod] [tinyint] NULL,
[FaxTime] [varchar](50) NULL,
[FeeAmount] [money] NULL,
[FeeTranCode] [char](5) NULL,
CONSTRAINT [csPk_ReportSchedules] PRIMARY KEY CLUSTERED
( [ScheduleId] ASC ) ON [PRIMARY]
) ON [PRIMARY]
GO
/* Populate Main source table with some data */
INSERT INTO ReportSchedules
SELECT 1,34,51,7301,7481,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 2,34,51,7271,7451,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 3,24,51,7302,7482,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 4,24,51,7273,7453,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 5,24,51,9905,10395,'CI',2,1,'Not Specified',NULL,NULL
/* Create table to compare against source table */
CREATE TABLE [dbo].[ReportRecipients](
[ScheduleID] [bigint] IDENTITY(1,1) NOT NULL,
[ReportID] [int] NOT NULL,
[ATID] [int] NOT NULL,
[AID] [int] NOT NULL,
[RecipientId] [int] NOT NULL,
[Source] [varchar](5) NOT NULL,
[Frequency] [tinyint] NOT NULL,
[DeliveryMethod] [tinyint] NOT NULL,
[FaxTime] [varchar](40) NULL,
[ID] [varchar](60) NOT NULL,
[CCScheduleID] [varchar](10) NULL,
[FeeAmount] [money] NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[ReportRecipients] ADD [FeeTranCode] [char](6) NULL
ALTER TABLE [dbo].[ReportRecipients] ADD [FeeFired] [bit] NULL
PRIMARY KEY CLUSTERED
(
[ScheduleID] ASC
) ON [PRIMARY]
GO
/* Insert sample data */
INSERT INTO ReportRecipients
SELECT 34,51,7301,7481,'CI',2,1,'Not Specified',51730134274811,6229,NULL,NULL,0 UNION ALL
SELECT 34,51,7271,7451,'CI',2,1,'Not Specified',51727134274511,6231,NULL,NULL,0 UNION ALL
SELECT 24,51,7302,7482,'CI',2,1,'Not Specified',51730224274821,6234,NULL,NULL,0 UNION ALL
SELECT 24,51,7273,7453,'CI',2,1,'Not Specified',51727324274531,6235,NULL,NULL,0
This will set up the example so there is a new record to insert into the ReportRecipients table when the first query (the first SQL above) is run
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
September 23, 2013 at 4:28 am
WITH cte_insert_report_recipients
AS
(
select a.*,CONVERT(VARCHAR(10),a.[ATID])
+ CONVERT(VARCHAR(10),[AID])
+ CONVERT(VARCHAR(10),a.[ReportID])
+ CONVERT(VARCHAR(10),a.[Frequency])
+ CONVERT(VARCHAR(10),a.[RecipientId])
+ CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a
)
insert into ReportRecipients
select
b.ReportID,
b.ATID,
b.AID,
b.RecipientId,
b.Source,
b.Frequency,
b.DeliveryMethod,
'NEW RECORD',
CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])
+ CONVERT(VARCHAR(10),b.[ReportID])
+ CONVERT(VARCHAR(10),b.[Frequency])
+ CONVERT(VARCHAR(10),b.[RecipientId])
+ CONVERT(VARCHAR(10),b.[DeliveryMethod]),
b.ScheduleId,b.FeeAmount,b.FeeTranCode,0
from cte_insert_report_recipients b
where b.ID not in (SELECT [id] from dbo.ReportRecipients)
This should help.
September 23, 2013 at 4:40 am
WITH cte_insert_report_recipients
AS
(
select a.*,CONVERT(VARCHAR(10),a.[ATID])
+ CONVERT(VARCHAR(10),[AID])
+ CONVERT(VARCHAR(10),a.[ReportID])
+ CONVERT(VARCHAR(10),a.[Frequency])
+ CONVERT(VARCHAR(10),a.[RecipientId])
+ CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a
)
insert into ReportRecipients
select
b.ReportID,
b.ATID,
b.AID,
b.RecipientId,
b.Source,
b.Frequency,
b.DeliveryMethod,
'NEW RECORD',
CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])
+ CONVERT(VARCHAR(10),b.[ReportID])
+ CONVERT(VARCHAR(10),b.[Frequency])
+ CONVERT(VARCHAR(10),b.[RecipientId])
+ CONVERT(VARCHAR(10),b.[DeliveryMethod]),
b.ScheduleId,b.FeeAmount,b.FeeTranCode,0
from cte_insert_report_recipients b
left outer join dbo.ReportRecipients c
on b.ID = c.ID
where c.ID is null
This should perform even better.
September 23, 2013 at 5:14 am
2 minutes doesn't sound too bad considering it's performing so many scans on your ReportRecipients table!
The only alternative I can think of (and I'm not even convinced it will speed things up) is to create an index on the ID column in your target table, create a temp table for your source which includes the composite key, index it then perform the join.
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
September 23, 2013 at 8:26 am
2 possible solutions:
1. Computed Column in dbo.ReportSchedules with an index that you the use in your comparison. With the sample data it doesn't get used but with a larger set of data it might.
2. Similar to above create an indexed view that has the expression as a column with an index. I didn't try this.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
September 23, 2013 at 12:29 pm
ksatpute123 (9/23/2013)
WITH cte_insert_report_recipients
AS
(
select a.*,CONVERT(VARCHAR(10),a.[ATID])
+ CONVERT(VARCHAR(10),[AID])
+ CONVERT(VARCHAR(10),a.[ReportID])
+ CONVERT(VARCHAR(10),a.[Frequency])
+ CONVERT(VARCHAR(10),a.[RecipientId])
+ CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a
)
insert into ReportRecipients
select
b.ReportID,
b.ATID,
b.AID,
b.RecipientId,
b.Source,
b.Frequency,
b.DeliveryMethod,
'NEW RECORD',
CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])
+ CONVERT(VARCHAR(10),b.[ReportID])
+ CONVERT(VARCHAR(10),b.[Frequency])
+ CONVERT(VARCHAR(10),b.[RecipientId])
+ CONVERT(VARCHAR(10),b.[DeliveryMethod]),
b.ScheduleId,b.FeeAmount,b.FeeTranCode,0
from cte_insert_report_recipients b
left outer join dbo.ReportRecipients c
on b.ID = c.ID
where c.ID is null
This should perform even better.
I design my code much like this and has performed very well for me, especially in high volume, high traffic processing.
When seeing a sub-select in a where clause it just makes the hair on the back of my neck stand up.... :w00t:
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
September 23, 2013 at 2:15 pm
I tested the result for both the queries. The result was with not much difference taking the same time. This was due to the volume of data. At high volumes the joins perform better in most cases so, I will choose that. Of course in this case there is need for indexing as mentioned in previous comments.
September 23, 2013 at 6:32 pm
How about one of these 2 options? At least they don't do the CONVERTs twice.
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey
WHERE c.[id] IS NULL;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 23, 2013 at 9:20 pm
Thanks for all the GREAT responses everyone. I had actually tweaked an existing index to get the query down to a few seconds, and at one point attempted the CROSS APPLY but wasn't getting the results that I was expecting (but I am no TSQL guru) - I have been absolutely overwhelmed with some production issues since late last night but will get around to trying out some of these suggestions on the "large" tables to see which produces the best result.
Much appreciate it, I'll post back when I can determine which method works best!
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
September 24, 2013 at 1:43 am
dwain.c (9/23/2013)
How about one of these 2 options? At least they don't do the CONVERTs twice.
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey
WHERE c.[id] IS NULL;
Nice. +1
September 24, 2013 at 7:43 am
I like door #2 better with the Left Join.... 😉
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply