Recently I was asked to alter a report to pull different more meaningful data. The particular report was pulling data from the datawarehouse. It had been recently modified to use a stored procedure (and subsequently improved performance 10 fold). Due to this change, the report started showing differently. The report was now paginating on the different matrices. Due to this minor display change, the creative wheels started turning and new requirements started developing from the real needs of the users of the report.
Through the quick review process, the requirements evolved from first displaying the current week and previous week data to being able to display data grouped by pay period. If possible, then the goal would be to correlate the data to specific pay periods. If the process was going to be too complex, or take too long – then sorting by weeks would be adequate. Correlating the data to current and previous weeks would be rather simple. The data was already present. The code was also conducive to making this correlation. So as a milestone, this task was completed first – as quickly as possible. Doing this also made it possible to more quickly jump onto the more puzzling requirement – which was more of a desirable, yet unnecessary requirement requested by the business.
Based on this, I came up with some enhanced requirements that would better define this request.
- The payperiods need to be accessible to the query
- No Physical Lookup Table
- Do not hard-code the data in the code.
When reviewing these requirements, I was puzzled as to how to accomplish the task. I did not want physical structures that required continued maintenance and pay period updates. I would do that if it was absolutely necessary. I was hoping to achieve something that required little-to-no maintenance, was fast, accurate, and provided the end-user the desired results.
Thinking about it for a bit, I came across a few ideas but each evaporated when I found a flaw with it. It soon dawned on me a quick way to do it. I could use a recursive CTE, one known good pay period start date, and then some date logic. Date logic by itself did not seem useful enough for me since the pay periods were for specific ranges.
Thus I came up with the following CTE:
With periodstarts (StartID, StartDate,EndDate)As ( Select 1 As StartID,@StartDate,dateadd(wk, datediff(wk, 0, @StartDate) + 2, 0) -1 Union All Select StartID + 1,DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) As StartDate,dateadd(wk, datediff(wk, 0, EndDate) + 2, 0)-1 as EndDate From PeriodStarts Where DATEADD(wk, DATEDIFF(wk, 0, StartDate) + 2, 0) > StartDate And Startid < 105 )
This CTE gives me the ability to create 4 years worth of pay periods on the fly. There are likely other ways of accomplishing the same task, this one suits me very well. To build the table, I start with a known valid pay period start date. From there, I can create the start and end dates of each of the pay periods over the next four years. To calculate the dates for the start and end for each period beyond the initial seed date, I used a method shown by Lynn Pettis in one of his Blog posts. In my scenario, each pay period is two weeks. To calculate the ending day of the pay period, I just subtract 1 day from the Result of adding two weeks to the start date. Verifying the data, I can see that I have start and end dates that correlate correctly to the pay periods.
The next step was to integrate the above into the query, and thus be able to correctly assign data to either the previous pay period or the current pay period. I was able to accomplish this through two left joins. This was the tricky part. I initially only created the CTE to have Start Dates and no end dates. This proved to be more difficult than I desired. The Table Joins started getting a little too complex and convoluted for what I had envisioned. I decided it would be much simpler to also include the EndDate in the CTE, thus drastically improving readability and ease of design for the query. That tricky part was now overcome to a degree, and I was able to associate some of the records. However, I was getting stumped on the Previous PayPeriod records. After trying a few things, I realized how easy the fix was to retrieve those records. A simple change to use Isnull in the Second Left join resolved this issue.
So now, my Join code is something like this:
INNER JOIN clc.dbo.employees e
ON e.ntloginname = MIS.NTLoginName
AND e.lc_flag = 1
AND e.activeflag = 1
AND e.AdminRights = 0
LEFT OUTER JOIN PeriodStarts PS
ON MIS.RecordDate BETWEEN PS.StartDate AND PS.EndDate
AND @CurrentDate BETWEEN PS.StartDate AND PS.EndDate
LEFT OUTER JOIN PeriodStarts Prev
ON MIS.RecordDate BETWEEN Prev.StartDate AND Prev.EndDate
AND IsNull(PS.StartDate,0) = 0
Since the only tying factor between my data is a recorddate and the payperiod range, I needed to be able to compare the recorddate to the startdate and enddate range. This works better than I had expected. As was expected, I would incur some cost to create the “PayDay” table on the fly as in the CTE. I also take a hit for the date comparisons, since I can only compare on a Range and not do an actual equality. The query is executing across two databases (1 is SQL 2005 and the other is SQL 2ooo) and returns in about 200ms, without any index tuning.
If I tune indexes in one of the tables (93% of total cost to the query comes from this table), I expect to see some improvement. Since the table only has a clustered Index, I started by creating an Index on NTLoginName, RecordDate, Product, LVCRequested and ProspectID. I know, I didn’t divulge the entire query, so some of this is coming out of the blue. However, those fields were in the SQL 2000 database and were required outputs for this query. By adding a new NC Index, I was able to reduce the Clustered Index Scan to an Index Seek. For this part of the query, it reduced overall cost from 93% to 33%. Logical reads on the table reduced from about 5000 to 76 – another substantial savings. Total execution time is down to about 140ms.
All in all, this is a good solution for the requirements at hand. Revisiting the self-defined requirements:
- The payperiods need to be accessible to the query
- No Physical Lookup Table
- Do not hard-code the data in the code.
The only requirement that may be questionable is #3. I do need to pass a date into the proc to make this whole thing work. That date must be a known good payperiod start date. However, I have also set a default so that the date will populate to one that I know is good. #2 is a achieved since I did not create a permanent physical lookup table. The nice takeaway from this exercise has been the improvement in the query once again. Though the query is doing something a bit more difficult than previously, performance is better. It was also a nice exercise in thinking outside the box.
Here is the full execution plan of the revised query.