July 7, 2014 at 8:29 am
Well you can change the column definition in the #contacts temp table to allow NULLs, but this does again question your data quality.
July 7, 2014 at 8:32 am
cbrammer1219 (7/7/2014)
wBob (7/3/2014)
The main problem with this proc is that you reference the CTE twice. SQL Server will process that twice because CTEs behave like views rather than temp tables. So that expensive remote query is seriously costing you. You can see this in the execution plan with the two near-identical looking branches. All the implicit conversion and data-type advice is good, you should sort that out, but you will transform performance of this query by only getting that data once.Here's how I would approach something like this with a sample rewrite below:
1) Pull the data once into a temp table
- get all required columns, add the one WHERE clause (on Default_list_no); maybe even don't add the WHERE clause as these can cause problems with linked servers.
- experiment with applying the WHERE clause (on Default_list_no) after getting the data
ie if most records are default_list_no = 00 then you might as well pull the whole table, then remove the few records
2) Clean it
- You see all that duplicate code for data cleaning, apply it once on data load. Or do it on your temp table once.
3) then remove the records you don't want
--!!TODO Add error handling
--!!TODO Split @ext into a temp table and join to that instead
CREATE PROCEDURE [dbo].[usp_TraceCallReport2]
@startdate as date,
@endDate as date,
@ext as nvarchar(50)
AS
BEGIN
SET NOCOUNT ON
------------------------------------------------------------------------------------------------
-- Setup START
-- Create temp tables required for the proc
------------------------------------------------------------------------------------------------
IF OBJECT_ID('#contacts') IS NOT NULL DROP TABLE #contacts
CREATE TABLE #contacts
(
AL_Contact_NameVARCHAR(50) NOT NULL,
CompanyNameVARCHAR(50) NOT NULL,
phoneVARCHAR(50) NOT NULL,
cleanPhoneNumberBIGINT NULL
)
-- Setup END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Get data START
------------------------------------------------------------------------------------------------
INSERT INTO #contacts ( AL_Contact_Name, companyName, phone )
SELECT DISTINCT
a.Contact_Name as AL_Contact_Name
, c.AR_COMPANY_NAME AS CompanyName --00 is the default list number
, CONTACT_PHONE AS phone
FROM ARLSQ01.dsg.dbo.DICE_ALCONCT AS a
LEFT JOIN ARLSQ01.dsg.dbo.DICE_ARSUBHD AS b ON a.ACCOUNT_NUMBER = b.ACCOUNT_NUMBER
LEFT JOIN ARLSQ01.dsg.dbo.DICE_ARCUSTMN AS c ON b.AR_NUMBER = c.AR_NUMBER
WHERE a.Default_list_no = '00'
INSERT INTO #contacts ( AL_Contact_Name, companyName, phone )
SELECT DISTINCT
d.FIRST_NAME + ' ' + d.LAST_NAME as ar_CONTACT_NAME
, e.AR_COMPANY_NAME --001 is the default list number
, REPLACE(REPLACE(REPLACE(REPLACE(d.PHONE_NUMBER, '(', ''), ')', ''), ' ', ''), '-', '') AS arPhone
FROM ARLSQ01.dsg.dbo.DICE_ARCONTCT AS d
INNER JOIN ARLSQ01.dsg.dbo.DICE_ARCUSTMN AS e ON d.AR_NUMBER = e.AR_NUMBER
WHERE d.list_number = '0001'
-- Get data END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Process data START
-- Clean and process the data
------------------------------------------------------------------------------------------------
-- Remove records you don't want now
DELETE #contacts WHERE phone LIKE '0%'
DELETE #contacts WHERE phone LIKE '1111%'
-- Now do the rank?
;WITH cte AS
(
select *, rnk = RANK() OVER(PARTITION BY companyname ORDER BY companyName)
from #contacts
)
DELETE cte
WHERE rnk > 1
-- Clean the number if you haven't done so already
UPDATE #contacts
SET cleanPhoneNumber = REPLACE(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), '-', '')
-- Process data END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Resultset START
------------------------------------------------------------------------------------------------
-- Resultset
SELECT DISTINCT cStaging.ID
,cStaging.cDate
,cStaging.cStartTime
,cStaging.cDuration
,cStaging.cTimetoAnswer
,cStaging.callingparty
,cStaging.cDigitsDialed
,cStaging.cOrigCall
,cStaging.Cdestination
,cStaging.cTransfer1
,cStaging.cTransfer2
,cStaging.cCustPhone
,cStaging.cDirection
,cStaging.calledparty
,cStaging.cSystemID
,cStaging.cANI
,cStaging.cDNIS
,cStaging.cCallID
,cStaging.cCallIDSeq
,(
CASE
WHEN aj1.CompanyName IS NULL
THEN aj2.CompanyName
ELSE aj1.CompanyName
END
) AS CompanyName, aj1.*,aj2.*
FROM CallTrace AS cStaging
LEFT JOIN #contacts AS aj1 ON LTRIM(RTRIM(aj1.cleanPhoneNumber)) = cStaging.Cdestination
LEFT JOIN #contacts AS aj2 ON LTRIM(RTRIM(aj2.cleanPhoneNumber)) = cStaging.callingparty
WHERE (CONVERT(DATE, cStaging.cDate) BETWEEN @startdate AND @endDate) AND (CONVERT(DATE, cStaging.cDate) <> '')
AND (cStaging.cOrigCall IN (SELECT NewParameter FROM dbo.fnSplit(@ext, ',') AS fnSplit_1))
-- Resultset END
------------------------------------------------------------------------------------------------
RETURN
GO
On my local test rig, for 10,000 rows, this reduced the performance from 2 minutes to 1 second.
I have tried to eliminate the remote call by inserting the contact information into a table on the same server the other data is on, I'll do a daily run of the insert to make sure I get any new accounts added, or old accounts reactivated.
Remove the NOT NULL constraints from the CREATE TABLE until the insert succeeds.
IF OBJECT_ID('#contacts') IS NOT NULL DROP TABLE #contacts
CREATE TABLE #contacts
(
AL_Contact_NameVARCHAR(50) NOT NULL,
CompanyNameVARCHAR(50) NOT NULL,
phoneVARCHAR(50) NOT NULL,
cleanPhoneNumberBIGINT NULL
)
July 7, 2014 at 9:15 am
I did that and it worked I got a lot of dups, so I did a MAX(cStaging.ID), and grouped that query of course, to avoid error, but then I got the error,
Column '#contacts.AL_Contact_Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. So I did a group by on the select of the #contact, but not sure where this error is occurring now.
July 7, 2014 at 10:19 am
As I can't see your data, it's hard to tell where the duplicates are coming in. Can you post an example of the duplicate data? Anonymise your actual customer names and numbers.
Have you altered the code much from what was provided? The logic of the rewrite was intended to be the same, however I may have missed something. For example have you already implemented the local contacts table you described earlier? Why do you think the duplicates are appearing?
Just out if interest, what was the performance like? It should be easier to remove the duplicates than it was to fix the performance.
Just from the type of questions you are now asking I would do some research in the following areas:
Understanding NULLs
Understanding Aggregates
Consider purchasing Itzik Ben Gan's T-SQL Fundamentals book.
July 7, 2014 at 12:15 pm
It ran in 31 seconds, WAY FASTER! Here's the data. I have not implemented the Master Contacts table.
July 7, 2014 at 12:40 pm
You can, and definitely should, avoid converting cStaging.cDate if you can.
--if "cDate" is a datetime column:
--instead of this:
--WHERE (CONVERT(DATE, cStaging.cDate) BETWEEN @startdate AND @endDate) AND (CONVERT(DATE, cStaging.cDate) <> '')
--use this:
WHERE cStaging.cDate >= @startdate AND cStaging.cDate < DATEADD(DAY, 1, @endDate)
--you don't need to check a date for ''
--if "cDate" is a varchar format that can be accurately directly compared to a
--datetime string, convert the variable to the matching format
--rather than the column itself; examples:
--if cDate is in format "120":
WHERE cStaging.cDate >= CONVERT(varchar(19), @start_date, 120) AND
cStaging.cDate < CONVERT(varchar(19), DATEADD(DAY, 1, @end_date), 120)
--if cDate is in format 101:
--<no valid character comparison format>
--if cDate is in format 102/111:
WHERE cStaging.cDate >= CONVERT(varchar(10), @start_date, 102) AND
cStaging.cDate < CONVERT(varchar(10), DATEADD(DAY, 1, @end_date), 102)
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 8, 2014 at 12:32 am
Hmm I still can't quite tell (I'm away from keyboard this week an only have phone!). The two LEFT JOINs might be an issue, remove them one by one to see where the duplicates come in. Otherwise add some debugging statements to the proc eg SELECT * FROM #contacts after it's populated, see if you can work it out.
Are you sure you didn't have duplicates originally? You could try restoring the original join logic
July 10, 2014 at 2:38 pm
I created the master contacts table to eliminate the remote call, It does have dups, only if the Company Name or Contact name is spelled differently. I am however still getting a lot more dups than is expected.
Here is my code I have modified your existing stored procedure.
USE [AA_Helper]
GO
/****** Object: StoredProcedure [dbo].[usp_TraceCallReport2] Script Date: 7/3/2014 10:48:01 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_TraceCallReport2]
@startdate as date,
@endDate as date,
@ext as nvarchar(60)
AS
BEGIN
SET NOCOUNT ON
------------------------------------------------------------------------------------------------
-- Setup START
-- Create temp tables required for the proc
------------------------------------------------------------------------------------------------
IF OBJECT_ID('#contacts') IS NOT NULL DROP TABLE [#contacts]
CREATE TABLE [dbo].[#contacts](
[AR_COMPANY_NAME] [nvarchar](100) NOT NULL,
[LIST_NO] [nvarchar](max) NULL,
[CONTACT_NAME] [nvarchar](max) NULL,
[PHONE] [nvarchar](35) NOT NULL
)ON [PRIMARY]
IF OBJECT_ID('ContactMaster') IS NOT NULL DROP TABLE [dbo].[ContactMaster]
CREATE TABLE [dbo].[ContactMaster](
[AR_COMPANY_NAME] [nvarchar](100) NOT NULL,
[LIST_NO] [nvarchar](4) NULL,
[CONTACT_NAME] [nvarchar](100) NOT NULL,
[PHONE] [nvarchar](35) NOT NULL,
PRIMARY KEY CLUSTERED
(
[AR_COMPANY_NAME] ASC,
[PHONE] ASC,
[CONTACT_NAME] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
-- Setup END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Get data START
------------------------------------------------------------------------------------------------
INSERT INTO [#contacts] (AR_COMPANY_NAME, list_no ,CONTACT_NAME, phone )
SELECT DISTINCT
LTRIM(RTRIM(a.AR_COMPANY_NAME)) as COMPANY_NAME, c.default_list_no as list_no
, c.Contact_Name AS CONTACT_NAME --00 is the default list number
, case
when ltrim(rtrim(c.CONTACT_PHONE))='( ) -' then ''
when ltrim(rtrim(c.CONTACT_PHONE))='()' then ''
when ltrim(rtrim(c.CONTACT_PHONE)) like '([0-9] ) -' then ''
when ltrim(rtrim(c.CONTACT_PHONE)) like '([0-9][0-9] ) -' then ''
else c.CONTACT_PHONE
end as PHONE
FROM ARLSQ01.dsg.dbo.DICE_ARCUSTMN AS a
LEFT JOIN ARLSQ01.dsg.dbo.DICE_ARSUBHD AS b ON a.AR_NUMBER = b.AR_NUMBER
LEFT JOIN ARLSQ01.dsg.dbo.DICE_ALCONCT AS c ON b.ACCOUNT_NUMBER = c.ACCOUNT_NUMBER
WHERE c.CONTACT_PHONE is not NULL
INSERT INTO [#contacts](AR_COMPANY_NAME, list_no ,CONTACT_NAME, phone)
SELECT DISTINCT
LTRIM(RTRIM(d.AR_COMPANY_NAME)) as COMPANY_NAME , e.list_Number as list_no--001 is the default list number
, rtrim(ltrim(rtrim(e.FIRST_NAME) + ' ' + e.LAST_NAME)) as CONTACT_NAME
, case
when ltrim(rtrim(e.PHONE_NUMBER))='( ) -' then ''
when ltrim(rtrim(e.PHONE_NUMBER))='()' then ''
when ltrim(rtrim(e.PHONE_NUMBER)) like '([0-9] ) -' then ''
when ltrim(rtrim(e.PHONE_NUMBER)) like '([0-9][0-9] ) -' then ''
when ltrim(rtrim(e.PHONE_NUMBER)) like '1111%' then ''
when ltrim(rtrim(e.PHONE_NUMBER)) like '0%' then ''
else e.PHONE_NUMBER
end as PHONE
FROM ARLSQ01.dsg.dbo.DICE_ARCUSTMN AS d
INNER JOIN ARLSQ01.dsg.dbo.DICE_ARCONTCT AS e ON d.AR_NUMBER = e.AR_NUMBER
WHERE e.PHONE_NUMBER is not NULL
-- Get data END
------------------------------------------------------------------------------------------------
DELETE FROM #contacts where LIST_NO = 'XXXX'
;WITH cte AS
(
select AR_COMPANY_NAME,LIST_NO,contact_name,phone, rnk = RANK() OVER(PARTITION BY AR_COMPANY_NAME,phone ORDER BY list_no)
from #contacts
)
INSERT INTO [dbo].[ContactMaster]([AR_COMPANY_NAME],[LIST_NO],[CONTACT_NAME] ,[PHONE])
SELECT AR_COMPANY_NAME,LIST_NO,contact_name,phone FROM cte
WHERE rnk = 1
-- Clean the number if you haven't done so already
UPDATE #contacts
SET phone = REPLACE(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), '-', '') --remove clean portion of number above
-- Process data END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Resultset START
------------------------------------------------------------------------------------------------
-- Resultset
SELECT DISTINCT cStaging.ID
,cStaging.cDate
,cStaging.cStartTime
,cStaging.cDuration
,cStaging.cTimetoAnswer
,cStaging.callingparty
,cStaging.cDigitsDialed
,cStaging.cOrigCall
,cStaging.Cdestination
,cStaging.cTransfer1
,cStaging.cTransfer2
,cStaging.cCustPhone
,cStaging.cDirection
,cStaging.calledparty
,cStaging.cSystemID
,cStaging.cANI
,cStaging.cDNIS
,cStaging.cCallID
,cStaging.cCallIDSeq
,(
CASE
WHEN aj1.AR_COMPANY_NAME IS NULL
THEN aj2.CONTACT_NAME
ELSE aj1.AR_COMPANY_NAME
END
) AS CompanyName, aj1.*,aj2.*
FROM CallTrace AS cStaging
LEFT OUTER JOIN ContactMaster aj1 on REPLACE(REPLACE(REPLACE(REPLACE(aj1.PHONE, '(', ''), ')', ''), ' ', ''), '-', '') = cStaging.Cdestination
LEFT OUTER JOIN ContactMaster aj2 on REPLACE(REPLACE(REPLACE(REPLACE(aj2.PHONE, '(', ''), ')', ''), ' ', ''), '-', '') = cStaging.Cdestination
WHERE cStaging.cDate BETWEEN '2014-06-24' AND '2014-06-24'
AND (cStaging.cOrigCall IN (SELECT NewParameter FROM dbo.fnSplit('2005', ',') AS fnSplit_1))
-- Resultset END
------------------------------------------------------------------------------------------------
RETURN
END
GO
--select * FROM CallTrace
--execute [dbo].[usp_TraceCallReport2] @startdate = '6/24/2014', @enddate = '6/24/2014', @ext = '2005'
July 10, 2014 at 3:08 pm
There are a number of problems with that piece of code (and this is just what I can see via my phone):
- the date parameters are not used in the proc
- ContactMaster should probably be a temp table (#)
- you join twice to ContactMaster on the same colum, Cdestination <-- MAJOR PROBLEM
- you appear to have missed my point about duplicating code; clean the data once and once only. Try to avoid joins with functions.
- you appear to have got rid of the cleanPhone column. Why?
Make these corrections, see if you still get the duplicates.
Viewing 9 posts - 46 through 53 (of 53 total)
You must be logged in to reply to this topic. Login to reply