September 28, 2016 at 11:17 am
Hi I have a query that when I add in a first_value windows function goes from returning in a few seconds to last night I killed it after 7 hours. I'm trying to figure out what is going on. I have complete control over the tables so I can make any changes necessary. I always have great performance from Windows Functions generally just cannot figure out this issue. Query size is in the millions of rows when not using TOP 100.
I have the two execution plans attached. SELECT TOP 100
Dd.Databaseid
, Dp.Patientid
, Pp.Patientprofileid
, DATEPART(YEAR, Prx.Clinicaldate) AS Periodyear
, Periodmonth = TRY_CONVERT( CHAR(7), CAST(Prx.Clinicaldate AS DATETIME), 102)
, Eventdatekey = CAST(TRY_CONVERT(CHAR(8), Prx.Clinicaldate, 112) AS INT)
--, Intitalrx = First_Value(Prx.Clinicaldate) OVER(PARTITION BY Dp.Patientid , Vm.Name, DATEPART(YEAR, Prx.Clinicaldate) ORDER BY Prx.Clinicaldate ROWS UNBOUNDED PRECEDING)
, Vm.Name
, Vm.Genericname
, M.Instructions
, Quantity = IIF(LEFT(SUBSTRING(Prx.Quantity, PATINDEX('%[0-9]%', Prx.Quantity), 20), PATINDEX('%[^0-9]%', SUBSTRING(Prx.Quantity, PATINDEX('%[0-9]%', Prx.Quantity), 20)+'X')-1) > 1000, 1000, ISNULL(LEFT(SUBSTRING(Prx.Quantity, PATINDEX('%[0-9]%', Prx.Quantity), 20), PATINDEX('%[^0-9]%', SUBSTRING(Prx.Quantity, PATINDEX('%[0-9]%', Prx.Quantity), 20)+'X')-1), 0))
, Refills = IIF(LEFT(SUBSTRING(Prx.Refills, PATINDEX('%[0-9]%', Prx.Refills), 20), PATINDEX('%[^0-9]%', SUBSTRING(Prx.Refills, PATINDEX('%[0-9]%', Prx.Refills), 20)+'X')-1) > 12, 12, ISNULL(LEFT(SUBSTRING(Prx.Refills, PATINDEX('%[0-9]%', Prx.Refills), 20), PATINDEX('%[^0-9]%', SUBSTRING(Prx.Refills, PATINDEX('%[0-9]%', Prx.Refills), 20)+'X')-1), 0))
, Daysduration = 1
, NULL
--, Therapythreshold = DATEDIFF(DAY, First_Value(Prx.Clinicaldate) OVER(PARTITION BY Dp.Patientid , Vm.Name, DATEPART(YEAR, Prx.Clinicaldate) ORDER BY Prx.Clinicaldate ROWS UNBOUNDED PRECEDING), '12/31/'+TRY_CONVERT( CHAR(4), DATEPART(YEAR, Prx.Clinicaldate)))
, 0.0
, Rxstart = Prx.Clinicaldate
, Rxend = M.Stopdate
, Periodage = DATEDIFF(YEAR, Pp.Birthdate, '01/01/'+TRY_CONVERT( VARCHAR, DATEPART(YEAR, Prx.Clinicaldate)))-CASE WHEN DATEADD(Dd, DATEDIFF(Dd, 0, '12/31/'+TRY_CONVERT(VARCHAR, DATEPART(YEAR, Prx.Clinicaldate))), 0) < DATEADD(YEAR, DATEDIFF(YEAR, Pp.Birthdate, '12/31/'+TRY_CONVERT(VARCHAR, DATEPART(YEAR, Prx.Clinicaldate))), Pp.Birthdate) THEN 1 ELSE 0 END
, Eventage = (TRY_CONVERT( CHAR(10), CAST(Prx.Clinicaldate AS DATE), 112) - 0 - TRY_CONVERT(CHAR(10), Pp.Birthdate, 112)) / 10000
FROM
dbo.Medicate M
INNER JOIN dbo.Prescrib Prx ON Prx.Mid = M.Mid AND M.Databasekey = Prx.Databasekey AND M.Pid = Prx.Pid
INNER JOIN dbo.Medinfo Mf ON Mf.Ddid = M.Ddid AND Mf.Databasekey = M.Databasekey AND Mf.Gpi = M.Gpi
INNER JOIN VisanalyticsStaging.Clinical.Medication Vm ON Vm.Ndccode = Mf.Ndcnum
INNER JOIN dbo.Patientprofile Pp ON Pp.Pid = M.Pid AND Pp.Databasekey = M.Databasekey
INNER JOIN VisanalyticsHouse.Dimension.[Database] Dd ON Dd.Databasekey = M.Databasekey
INNER JOIN VisanalyticsHouse.Dimension.Patient Dp ON Dp.Databaseid = Dd.Databaseid AND Dp.Patientsourceid = Pp.Patientprofileid
WHERE Prx.Clinicaldate <= GETDATE()
***SQL born on date Spring 2013:-)
September 28, 2016 at 11:30 am
Hi Thomas,
from what I can see at the first glance is that the First_Value function does not have a POC (Partition, Order, Covering) index to work on and the cardinality goes way beyond the 10K rows which is the constraints for the Window Function's internal worktable before hitting the disk. If you can make certain that all the relevant data needed for the Window Function is in a single index, including the output columns then it should drastically improve the performance. One way of achieving this would be to use a temp table and create an index on that table or import the relevant columns into a single table and then add the index. In order to avoid a huge include on the index you can do the work in a CTE and then join it to the main query.
😎
September 28, 2016 at 11:34 am
Hi Erik,
Thanks for the suggestion going to try both a CTE and a temp table. I'll report back my results as soon as i'm done.
Thanks
***SQL born on date Spring 2013:-)
September 28, 2016 at 11:58 am
Eirikur beat me to it on the POC index - that's always the first thing I look for when a window function makes a query much slower.
I would add (and this is not a big of a problem) that I'm seeing RID lookups in your plans. That means that your nonclustered index does not include all of the columns needed for it to satisfy that part of the query. This, BTW, is the "C" part of the POC index concept. RID lookups can really slow things down.
-- Itzik Ben-Gan 2001
September 28, 2016 at 3:03 pm
Thanks gentlemen I'm still tuning this query but I got it from 7 hours to a couple minutes now which is acceptable. I started doing the temp table idea by Erik but as I kept checking the execution plans and focusing in on the scans and the RID lookups and Table Scans Allan mentioned I was able to get indexing in place that seemed to get 5.6 million rows processed in a decent time or at least for my needs. Attached is the updated execution plan.
One other thing I noticed was on one of my joins before on one key it was VARCHAR(50) and on the other side it was NVARCHAR(50) I made both VARCHAR(50) and no longer received a warning. I do however still have a warning on the my age calculations for Cardinality estimates and I'm wondering if it is really effecting my query ?
I'm on Amazon looking at Query Tuning books. I currently have Grants SQL Server Execution plans open on the desk and about 2/3 rds through it.
Thanks !
***SQL born on date Spring 2013:-)
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply