Better way to write this query

Question

Post reply

Better way to write this query

MyDoggieJessie

SSC-Forever

Points: 44486
More actions
September 19, 2013 at 7:31 pm

#283434

I have inherited this query that is comparing data from one table to another. It is using a combination of columns to "build" a unique column that gets inserted into the other table should it not exist. The query is below INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
WHERE CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])
NOT IN (SELECT DISTINCT [id] from dbo.ReportRecipients)I realize the WHERE clause poses an issue due to the non-sargable concatenated columns. The problem with this query is that in both tables, there are millions of rows. In it's current state it takes about 2 minutes to run, generating 4 million reads.
The "CONVERT(VARCHAR(10),[ATID]) + CONVERT(VARCHAR(10),[AID]) + CONVERT(VARCHAR(10),[ReportID]) + CONVERT(VARCHAR(10),[Frequency]) + CONVERT(VARCHAR(10),a.[RecipientId]) + CONVERT(VARCHAR(10),a.[DeliveryMethod])" business creates the "ID" column in the other table...which when new rows are added into the main table, it uses the combination of the CONVERTS to match against the "ID" column to determine if it's a new record or not.
Here's some sample data and DDL:
/* Create Main source table */
CREATE TABLE [dbo].[ReportSchedules](
[ScheduleId] [int] NOT NULL,
[ReportID] [int] NULL,
[ATID] [int] NULL,
[AID] [int] NULL,
[RecipientId] [int] NULL,
[Source] [varchar](5) NULL,
[Frequency] [tinyint] NULL,
[DeliveryMethod] [tinyint] NULL,
[FaxTime] [varchar](50) NULL,
[FeeAmount] [money] NULL,
[FeeTranCode] [char](5) NULL,
CONSTRAINT [csPk_ReportSchedules] PRIMARY KEY CLUSTERED
( [ScheduleId] ASC ) ON [PRIMARY]
) ON [PRIMARY]
GO
/* Populate Main source table with some data */
INSERT INTO ReportSchedules
SELECT 1,34,51,7301,7481,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 2,34,51,7271,7451,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 3,24,51,7302,7482,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 4,24,51,7273,7453,'CI',2,1,'Not Specified',NULL,NULL UNION ALL
SELECT 5,24,51,9905,10395,'CI',2,1,'Not Specified',NULL,NULL
/* Create table to compare against source table */
CREATE TABLE [dbo].[ReportRecipients](
[ScheduleID] [bigint] IDENTITY(1,1) NOT NULL,
[ReportID] [int] NOT NULL,
[ATID] [int] NOT NULL,
[AID] [int] NOT NULL,
[RecipientId] [int] NOT NULL,
[Source] [varchar](5) NOT NULL,
[Frequency] [tinyint] NOT NULL,
[DeliveryMethod] [tinyint] NOT NULL,
[FaxTime] [varchar](40) NULL,
[ID] [varchar](60) NOT NULL,
[CCScheduleID] [varchar](10) NULL,
[FeeAmount] [money] NULL
) ON [PRIMARY]
ALTER TABLE [dbo].[ReportRecipients] ADD [FeeTranCode] [char](6) NULL
ALTER TABLE [dbo].[ReportRecipients] ADD [FeeFired] [bit] NULL
PRIMARY KEY CLUSTERED
(
[ScheduleID] ASC
) ON [PRIMARY]
GO
/* Insert sample data */
INSERT INTO ReportRecipients
SELECT 34,51,7301,7481,'CI',2,1,'Not Specified',51730134274811,6229,NULL,NULL,0 UNION ALL
SELECT 34,51,7271,7451,'CI',2,1,'Not Specified',51727134274511,6231,NULL,NULL,0 UNION ALL
SELECT 24,51,7302,7482,'CI',2,1,'Not Specified',51730224274821,6234,NULL,NULL,0 UNION ALL
SELECT 24,51,7273,7453,'CI',2,1,'Not Specified',51727324274531,6235,NULL,NULL,0
This will set up the example so there is a new record to insert into the ReportRecipients table when the first query (the first SQL above) is run
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 1

WITH cte_insert_report_recipients

AS

(

select a.*,CONVERT(VARCHAR(10),a.[ATID])

+ CONVERT(VARCHAR(10),[AID])

+ CONVERT(VARCHAR(10),a.[ReportID])

+ CONVERT(VARCHAR(10),a.[Frequency])

+ CONVERT(VARCHAR(10),a.[RecipientId])

+ CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a

)

insert into ReportRecipients

select

b.ReportID,

b.ATID,

b.AID,

b.RecipientId,

b.Source,

b.Frequency,

b.DeliveryMethod,

'NEW RECORD',

CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])

+ CONVERT(VARCHAR(10),b.[ReportID])

+ CONVERT(VARCHAR(10),b.[Frequency])

+ CONVERT(VARCHAR(10),b.[RecipientId])

+ CONVERT(VARCHAR(10),b.[DeliveryMethod]),

b.ScheduleId,b.FeeAmount,b.FeeTranCode,0

from cte_insert_report_recipients b

where b.ID not in (SELECT [id] from dbo.ReportRecipients)

This should help.

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 2

WITH cte_insert_report_recipients

AS

(

select a.*,CONVERT(VARCHAR(10),a.[ATID])

+ CONVERT(VARCHAR(10),[AID])

+ CONVERT(VARCHAR(10),a.[ReportID])

+ CONVERT(VARCHAR(10),a.[Frequency])

+ CONVERT(VARCHAR(10),a.[RecipientId])

+ CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a

)

insert into ReportRecipients

select

b.ReportID,

b.ATID,

b.AID,

b.RecipientId,

b.Source,

b.Frequency,

b.DeliveryMethod,

'NEW RECORD',

CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID])

+ CONVERT(VARCHAR(10),b.[ReportID])

+ CONVERT(VARCHAR(10),b.[Frequency])

+ CONVERT(VARCHAR(10),b.[RecipientId])

+ CONVERT(VARCHAR(10),b.[DeliveryMethod]),

b.ScheduleId,b.FeeAmount,b.FeeTranCode,0

from cte_insert_report_recipients b

left outer join dbo.ReportRecipients c

on b.ID = c.ID

where c.ID is null

This should perform even better.

Abu Dina SSChampion Points: 14155 More actions · Answer 3

2 minutes doesn't sound too bad considering it's performing so many scans on your ReportRecipients table!

The only alternative I can think of (and I'm not even convinced it will speed things up) is to create an index on the ID column in your target table, create a temp table for your source which includes the composite key, index it then perform the join.

---------------------------------------------------------

It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn

Jack Corbett SSC Guru Points: 184394 More actions · Answer 4

2 possible solutions:

1. Computed Column in dbo.ReportSchedules with an index that you the use in your comparison. With the sample data it doesn't get used but with a larger set of data it might.

2. Similar to above create an indexed view that has the expression as a column with an index. I didn't try this.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Kurt W. Zimmerman SSCrazy Eights Points: 8676 More actions · Answer 5

ksatpute123 (9/23/2013)
WITH cte_insert_report_recipients AS ( select a.*,CONVERT(VARCHAR(10),a.[ATID]) + CONVERT(VARCHAR(10),[AID]) + CONVERT(VARCHAR(10),a.[ReportID]) + CONVERT(VARCHAR(10),a.[Frequency]) + CONVERT(VARCHAR(10),a.[RecipientId]) + CONVERT(VARCHAR(10),a.[DeliveryMethod]) [ID] from ReportSchedules a ) insert into ReportRecipients select b.ReportID, b.ATID, b.AID, b.RecipientId, b.Source, b.Frequency, b.DeliveryMethod, 'NEW RECORD', CONVERT(VARCHAR(10),b.[ATID]) + CONVERT(VARCHAR(10),b.[AID]) + CONVERT(VARCHAR(10),b.[ReportID]) + CONVERT(VARCHAR(10),b.[Frequency]) + CONVERT(VARCHAR(10),b.[RecipientId]) + CONVERT(VARCHAR(10),b.[DeliveryMethod]), b.ScheduleId,b.FeeAmount,b.FeeTranCode,0 from cte_insert_report_recipients b left outer join dbo.ReportRecipients c on b.ID = c.ID where c.ID is null
This should perform even better.

I design my code much like this and has performed very well for me, especially in high volume, high traffic processing.

When seeing a sub-select in a where clause it just makes the hair on the back of my neck stand up.... :w00t:

Kurt

Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY

http://www.linkedin.com/in/kurtwzimmerman

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 6

I tested the result for both the queries. The result was with not much difference taking the same time. This was due to the volume of data. At high volumes the joins perform better in most cases so, I will choose that. Of course in this case there is need for indexing as mentioned in previous comments.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

How about one of these 2 options? At least they don't do the CONVERTs twice.

INSERT INTO ReportRecipients

SELECT rs.[ReportID]

,rs.[ATID]

,rs.[AID]

,rs.[RecipientId]

,rs.[Source]

,rs.[Frequency]

,rs.[DeliveryMethod]

,'NEW RECORD'

,newkey

,rs.[scheduleid]

,rs.FeeAmount

,rs.FeeTranCode

,0

FROM [dbo].[ReportSchedules] rs

CROSS APPLY (

SELECT CONVERT(VARCHAR(10),rs.[ATID])

+ CONVERT(VARCHAR(10),rs.[AID])

+ CONVERT(VARCHAR(10),rs.[ReportID])

+ CONVERT(VARCHAR(10),rs.[Frequency])

+ CONVERT(VARCHAR(10),rs.[RecipientId])

+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);

INSERT INTO ReportRecipients

SELECT rs.[ReportID]

,rs.[ATID]

,rs.[AID]

,rs.[RecipientId]

,rs.[Source]

,rs.[Frequency]

,rs.[DeliveryMethod]

,'NEW RECORD'

,newkey

,rs.[scheduleid]

,rs.FeeAmount

,rs.FeeTranCode

,0

FROM [dbo].[ReportSchedules] rs

CROSS APPLY (

SELECT CONVERT(VARCHAR(10),rs.[ATID])

+ CONVERT(VARCHAR(10),rs.[AID])

+ CONVERT(VARCHAR(10),rs.[ReportID])

+ CONVERT(VARCHAR(10),rs.[Frequency])

+ CONVERT(VARCHAR(10),rs.[RecipientId])

+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)

LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey

WHERE c.[id] IS NULL;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 8

Thanks for all the GREAT responses everyone. I had actually tweaked an existing index to get the query down to a few seconds, and at one point attempted the CROSS APPLY but wasn't getting the results that I was expecting (but I am no TSQL guru) - I have been absolutely overwhelmed with some production issues since late last night but will get around to trying out some of these suggestions on the "large" tables to see which produces the best result.

Much appreciate it, I'll post back when I can determine which method works best!

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

ksatpute123 Hall of Fame Points: 3325 More actions · Answer 9

dwain.c (9/23/2013)
How about one of these 2 options? At least they don't do the CONVERTs twice.
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
WHERE a.newkey NOT IN (SELECT [id] from dbo.ReportRecipients);
INSERT INTO ReportRecipients
SELECT rs.[ReportID]
,rs.[ATID]
,rs.[AID]
,rs.[RecipientId]
,rs.[Source]
,rs.[Frequency]
,rs.[DeliveryMethod]
,'NEW RECORD'
,newkey
,rs.[scheduleid]
,rs.FeeAmount
,rs.FeeTranCode
,0
FROM [dbo].[ReportSchedules] rs
CROSS APPLY (
SELECT CONVERT(VARCHAR(10),rs.[ATID])
+ CONVERT(VARCHAR(10),rs.[AID])
+ CONVERT(VARCHAR(10),rs.[ReportID])
+ CONVERT(VARCHAR(10),rs.[Frequency])
+ CONVERT(VARCHAR(10),rs.[RecipientId])
+ CONVERT(VARCHAR(10),rs.[DeliveryMethod])) a(newkey)
LEFT JOIN dbo.ReportRecipients c ON c.[id] = a.newkey
WHERE c.[id] IS NULL;

Nice. +1

Kurt W. Zimmerman SSCrazy Eights Points: 8676 More actions · Answer 10

I like door #2 better with the Left Join.... 😉

Kurt

Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY

http://www.linkedin.com/in/kurtwzimmerman