June 29, 2008 at 1:33 pm
We have a workflow management app, which tracks the status of insurance applications, and records the step each case currently is at. Most of the case data is stored in the audit_trail table below. I need to collect a snapshot of the data (which constantly changes) at hourly intervals. A new record is added for each step the case goes through. Besides the 'audit_date' field which stores the timestamp it enters the stepname, the 'at_id' field indicates the most recent step for each case, since the number is incremental. We are only interested in certain stepnames, which can be seen in the query. In addition, I need to find the Case's StartDate, so we can calculate how old the Case is. The Query I have returns accurate information but is terribly slow. Hopefully, some of our experts here can show me a better technique to make this query fly. I tried to follow Jeff Moden's forum etiquette, but if i missed any important details, just let me know. Thanks!
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL
DROP TABLE #audit_trail
--===== Create the test table with
CREATE TABLE #audit_trail
(
at_id numeric(20, 0) NOT NULL,
node_id int NOT NULL,
proc_id int NOT NULL,
casenum numeric(10, 0) NOT NULL,
type_id int NOT NULL,
audit_date datetime NOT NULL,
stepdesc varchar(24) NULL,
user_name varchar(255) NULL,
stepname varchar(8) NULL,
CONSTRAINT pk_audit_trail PRIMARY KEY CLUSTERED
(
at_id ASC,
node_id ASC
)
)
--===== Create Non-Clustered Index
CREATE NONCLUSTERED INDEX [i_audit_trail] ON #audit_trail
(
casenum ASC,
proc_id ASC
)
--===== Insert the test data into the test table
INSERT INTO #audit_trail
(at_id,node_id,proc_id,casenum,type_id,audit_date,stepdesc,user_name,stepname)
SELECT '28600708','1','10','399716','0','Jun 23 2008 2:34PM','','swadmin@STAFFVIEW','' UNION ALL
SELECT '28600709','1','10','399716','1','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28600710','1','10','399716','2','Jun 23 2008 2:34PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28600711','1','10','399716','1','Jun 23 2008 2:34PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL
SELECT '28650700','1','10','399716','2','Jun 27 2008 12:35PM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL
SELECT '28650701','1','10','399716','1','Jun 27 2008 12:35PM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY' UNION ALL
SELECT '28664580','1','10','399716','2','Jun 30 2008 11:10AM','DATA ENTRY STEP','melzer@STAFFVIEW','DATAENTY' UNION ALL
SELECT '28664581','1','10','399716','1','Jun 30 2008 11:10AM','Send Back Queue','SENDBACK@STAFFVIEW','SENDBACK' UNION ALL
SELECT '28631292','1','10','400455','0','Jun 25 2008 3:27PM','','swadmin@STAFFVIEW','' UNION ALL
SELECT '28631293','1','10','400455','1','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28631294','1','10','400455','2','Jun 25 2008 3:27PM','*ignore','swadmin@STAFFVIEW','START' UNION ALL
SELECT '28631295','1','10','400455','1','Jun 25 2008 3:27PM','Data Review Step','REVIEW@STAFFVIEW','REVIEW' UNION ALL
SELECT '28682496','1','10','400455','2','Jul 1 2008 11:08AM','Data Review Step','simmons@STAFFVIEW','REVIEW' UNION ALL
SELECT '28682497','1','10','400455','1','Jul 1 2008 11:09AM','DATA ENTRY STEP','DATAENTRY@STAFFVIEW','DATAENTY'
SELECT * FROM #audit_trail
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#audit_trail','U') IS NOT NULL
DROP TABLE #audit_trail
Here is the Query I am using. It returns accurate results, but very long to do so in the 12 million record DB.
SELECT getdate() as RefreshDate
,SUM(CASE WHEN stepname = 'CALLBACK' THEN 1 ELSE 0 END) AS CALLBACK
,SUM(CASE WHEN stepname = 'DATAENTY' THEN 1 ELSE 0 END) AS DATAENTY
,SUM(CASE WHEN stepname = 'DATAREV' THEN 1 ELSE 0 END) AS DATAREV
,SUM(CASE WHEN stepname = 'PARTIAL' THEN 1 ELSE 0 END) AS PARTIAL
,SUM(CASE WHEN stepname = 'REVIEW' THEN 1 ELSE 0 END) AS REVIEW
,SUM(CASE WHEN stepname = 'SENDBACK' THEN 1 ELSE 0 END) AS SENDBACK
,SUM(CASE WHEN stepname = 'SUPREV' THEN 1 ELSE 0 END) AS SUPREV
,SUM(CASE WHEN stepname = 'WBREVIEW' THEN 1 ELSE 0 END) AS WBREVIEW
,SUM(Over7Days) AS Over7Days
,SUM(Over10Days) AS Over10Days
,MIN(StartDate) AS OldestDate
FROM
(
SELECTa.casenum,
b.StartDate,
a.audit_date,
case when datediff(d,b.StartDate,a.audit_date) >= 7 then 1 end as Over7Days,
case when datediff(d,b.StartDate,a.audit_date) >= 10 then 1 end as Over10Days,
a.stepname
FROM
(
SELECTa.casenum,
a.audit_date,
a.stepname
FROM#audit_trail a
WHEREa.at_id =
(
SELECT MAX(at_id)
FROM#audit_trail b
WHEREa.casenum = b.casenum
)
AND a.type_id = 1
AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
) a
INNER JOIN
(
SELECTcasenum
,audit_date as StartDate
FROM#audit_trail
WHEREstepname = 'START'
AND type_id = 1
) b
ON a.casenum = b.casenum
) A
June 29, 2008 at 11:58 pm
Create indexes on
CaseNUm
at_ID
If it is still slow, add anothert 2 index on TypeID
and StepName
June 30, 2008 at 5:42 am
It's a 3rd party vendor app, unfortunately, so I can't modify it.
July 1, 2008 at 4:09 am
Pop
Your sample data doesn't return any rows. This chunk...
SELECT a.casenum,
a.audit_date,
a.stepname
FROM #audit_trail a
WHERE a.at_id =
(
SELECT MAX(at_id)
FROM #audit_trail b
WHERE a.casenum = b.casenum
)
AND a.type_id = 1
AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
... comes back empty. Any chance of rejigging the data please?
I reckon there's scope to halve the run time by playing with the query.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 1, 2008 at 7:32 am
I take it the task names you have given (and are reporting) are a subset of all of the available ones.
Hence the IN clause in the query.
It might be better to have another temporary table and have an inner join rather than that IN.
That way you will get better index hits rather than scans.
July 1, 2008 at 9:00 am
Thanks Chris, I changed the sample data to return results. (can't figure out how to get that one 'wink' out of the code though.)
John, you are correct, I am only interested in a subset of the possible steps. Haven't tried an additional temp table yet.
Thanks guys.
July 1, 2008 at 9:16 am
Cool, many thanks Pop.
Got a question for you:
SELECT a.casenum,
a.audit_date,
a.stepname
FROM #audit_trail a
WHERE a.at_id =
(
SELECT MAX(at_id)
FROM #audit_trail b
WHERE a.casenum = b.casenum
)
AND a.type_id = 1
AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
Are you sure about this bit:
SELECT MAX(at_id)
FROM #audit_trail b
WHERE a.casenum = b.casenum
Because it translates as "pick up MAX(at_id) for each casenum. Rows which don't match our where clause are then discarded". This means that for a casenum which has matches on the where clause, unless the most recent row is a matching row, it will be omitted.
Are you sure you don't mean "pick up MAX(at_id) for each casenum where rows match our where clause"?
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 1, 2008 at 10:37 am
for each casenum, the 'at_id' field increments. So the record with the highest 'at_id' value for each casenum will contain the most recent step.
What I wanted to do was find out what is the most recent step for each case when I run the Query... and only return those that match my Search Condition (PARTIAL,CALLBACK,etc.), since those are the only steps I'm interested in measuring.
Hope this makes sense. Feel free to correct me, if I'm wrong.:)
July 1, 2008 at 10:41 am
So you want the cases having their most recent step in your list of stepnames? Sorry to be a pain!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 1, 2008 at 10:45 am
yep, you're exactly right.
it's measuring how many cases are in each of those listed steps, at a point in time when I run the query.
July 1, 2008 at 10:57 am
Hi Pop
SELECT a.casenum, a.audit_date, c.audit_date as StartDate, a.stepname
FROM #audit_trail a
INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b
ON b.casenum = a.casenum and b.at_id = a.at_id
INNER JOIN #audit_trail c
ON c.casenum = a.casenum AND c.type_id = 1 AND c.stepname = 'START'
WHERE a.type_id = 1
AND a.stepname IN ('CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
This is doing things a little differently, worth a try.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 1, 2008 at 2:07 pm
Interesting, def a little shorter code-wise....no performance increase though...
July 2, 2008 at 9:05 am
pop (7/1/2008)
Interesting, def a little shorter code-wise....no performance increase though...
How many rows are returned by this...
SELECT a.casenum, a.at_id
FROM #audit_trail a
INNER JOIN (SELECT MAX(at_id) AS at_id, casenum FROM #audit_trail GROUP BY casenum) b
ON b.casenum = a.casenum AND b.at_id = a.at_id
WHERE a.type_id = 1
AND a.stepname IN ('START','CALLBACK','DATAENTY','DATAREV','PARTIAL','REVIEW','SENDBACK','SUPREV','WBREVIEW')
...and how long does it take to run?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply