October 21, 2017 at 4:55 pm
Jeff Moden - Saturday, October 21, 2017 3:58 PM
Jason A. Long - Saturday, October 21, 2017 2:07 AM
Well... I really am a big dumb animal... I forgot to unplug the table's RN column after running the times for the standalone table... And, we're back to previous numbers... :blush:
What does that mean? What's the run time that we're actually talking about now?
I had run a few baseline tests with just the temp table on its own, with the RN value being pushed into the variable...
When I plugged the functions back in, I overlooked the X2 version... So, while it was cross joined to the #TestData table, it wasn't actually doing anything...
Of course, I also wanted to build another version that included Louis's weekend code to see if it would yield even better results... So when the X2 that I had been "testing" 🙁 kept getting 154ms times and the Louis X3 version was almost 4 seconds... Well, let's just say I was confused.
The plans were identical... from shape to estimated cost. The only difference I could see, from the optimizer's perspective, was the size of the cached plan: the X2 was 15 KB and the X3 was 500 KB.
I looked and looked... I even considered that maybe the engine didn't like the SIGN() functions... I finally got the bright idea to run them with SET SHOWPLAN_ALL ON; to see if I could spot something there.
Sure enough... The OutputList for X3, showed an expression and the OutputList for X2 showed td.RN...
After making the correction, they were both back to running in the 3.5 - 4 second range.
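For anyone who hasn't used it, SET SHOWPLAN_ALL returns the estimated plan as a rowset, one row per operator, without executing the query, which is what exposed the OutputList difference. A minimal sketch (the SELECT is just a stand-in):
-- SET SHOWPLAN_ALL must be the only statement in its batch, hence the GOs.
SET SHOWPLAN_ALL ON;
GO
SELECT td.RN FROM #TestData td; -- not executed; each plan operator comes back
GO                              -- as a row, with its OutputList column visible
SET SHOWPLAN_ALL OFF;
GO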
At this point I'm more disappointed with myself than I am any of the functions...
I knew yesterday that my weekend was going to be blown on a search optimization project for work...
Basically, rather than going to bed like a normal human, I kept at it long after I'd lost focus and I let myself get rushed and sloppy.
October 21, 2017 at 6:52 pm
Jeff Moden - Saturday, October 21, 2017 3:56 PM
Jason A. Long - Friday, October 20, 2017 11:27 PM
Jeff Moden - Friday, October 20, 2017 7:47 PM
While we're at it... when all this first started and while I was waiting for your Tally function, I cranked this out to generate test data. I normally resort to the Cross Join type of pseudo-cursor for forum work so that people don't have to worry about having a Tally function or table. It's what I've been using to test with.
--===== Test data control variables and presets
DECLARE @LoDT DATETIME = '2000' --Inclusive
,@HiDT DATETIME = '2030' --Exclusive
,@MaxSpan INT = 90 --Max number of days in period
,@Rows INT = 1000000
,@Days INT;
SELECT @Days = DATEDIFF(dd,@LoDT,@HiDT)-@MaxSpan
;
--===== Create the test table (added a column just to see)
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData
;
CREATE TABLE #TestData
(
RN INT NOT NULL
,beg_dt DATETIME NOT NULL
,end_dt DATETIME NOT NULL
,Span AS CONVERT(FLOAT,end_dt-beg_dt) PERSISTED
PRIMARY KEY CLUSTERED (beg_dt, end_dt)
WITH (IGNORE_DUP_KEY = ON)
)
;
--===== Use "Minimal Logging" to populate the table with random "begin" dates
-- and a random span to create the random "end" dates.
WITH
cteGenDate AS
(
SELECT TOP (@Rows)
beg_dt = RAND(CHECKSUM(NEWID()))*@Days+@LoDT --Gotta love direct date math.
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)
INSERT INTO #TestData WITH(TABLOCK) --For Minimal Logging
(RN, beg_dt, end_dt)
SELECT RN = ROW_NUMBER() OVER (ORDER BY beg_dt)
,beg_dt
,end_dt = RAND(CHECKSUM(NEWID()))*@MaxSpan+beg_dt
FROM cteGenDate
ORDER BY beg_dt, end_dt --For Minimal Logging with Clustered Index
OPTION (RECOMPILE) --For Minimal Logging
;
Now this has some interesting stuff going on... You definitely took a more measured approach than I did.
I just picked a date close to the middle of the Calendar table, added two "CHECKSUM(NEWID()) % 999999" values, and simply added or subtracted 9's until it produced legitimate-looking ranges.
This... "beg_dt = RAND(CHECKSUM(NEWID()))*@Days+@LoDT" ... took me a few minutes... I don't ever use RAND() and either didn't know or completely forgot that it generates a float >= 0 and < 1.
For a second I thought you were using voodoo to constrain the output range... I'll have to spend some time using it to see if I like it as much as the % method (thank you for that one too, BTW).
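For anyone comparing the two side by side, here's a minimal sketch of both techniques (the 0 to 89 range is just an example):
-- Two ways to generate a random integer from 0 to 89, one value per row.
SELECT TOP (10)
       ModuloMethod = ABS(CHECKSUM(NEWID())) % 90 -- integer math
      ,RandMethod   = CAST(RAND(CHECKSUM(NEWID())) * 90 AS INT) -- RAND() returns a float >= 0 and < 1
  FROM sys.all_columns;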
Normally, if I need "believable" test data, I usually do something like this...
SELECT TOP 1000
ROW_NUMBER() OVER (ORDER BY c.object_id),
DATEADD(DAY, ld.low_day, '20100101'),
DATEADD(DAY, hd.high_day, '20100101')
FROM
sys.columns c --< this is a lie... IRL it would be a tally...
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 1460) ) ld (low_day) -- 0-4 years
CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % (1825-ld.low_day) ) ) hd (high_day); -- low_day to 5 years
You don't have to sell me on the direct date math or the INT math... there have been a few occasions when the INT math had me feeling like a 4 yr old at a magic show.
Sadly, for whatever reason, they aren't solutions that usually hit me right off the bat.
While on the topic of cool tricks... in the other test harness, I noticed you have a CHECKPOINT...
It makes me suspect that you're grabbing test results from the transaction logs... If so, what's your verdict?
I've been kicking around the idea of using extended events to capture test results, hopefully with less "observer effect" than plan capture & SET STATISTICS IO,TIME ON;
https://docs.microsoft.com/en-us/sql/relational-databases/extended-events/quick-start-extended-events-in-sql-server
Once again, thank you!
Thanks for the feedback, Jason. Always a pleasure to "talk" with you.
I don't know if you've ever seen the following two articles on the subject of generating random but constrained test data. Some of the things you've pointed out are covered in these articles.
Generating Test Data: Part 1 - Generating Random Integers and Floats
Generating Test Data: Part 2 - Generating Sequential and Random Dates
For the integer stuff in those two articles, I need to go back someday and make a correction, because there is a 1 in 4 billion chance (and I should have played the lottery on the days that I have hit that chance) that the largest negative number will appear, and the ABS causes a failure on that. The fix is to do the modulo before you do the ABS.
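For anyone who wants to apply that correction now, here's a minimal sketch of the idea (the range of 1460 is borrowed from Jason's example above):
-- The fix described above: do the modulo BEFORE the ABS. If CHECKSUM() returns
-- the largest negative INT, ABS() would throw an arithmetic overflow error, but
-- taking the modulo first keeps the value in a range that ABS() can handle.
SELECT RandomDay = ABS(CHECKSUM(NEWID()) % 1460); -- 0 to 1459, overflow-safe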
Dwain Camps (miss you a lot, ol' friend) wrote some pretty good articles on the non-linear generation of random constrained data at the following link.
Generating Non-uniform Random Numbers with SQL
None of the graphics he posted in the article have survived because they were on another web site but it's a great article.
As for the CHECKPOINT thing... I actually left those in that particular harness by mistake. I wanted to make sure that having stuff in write-cache waiting to be written to disk wasn't the cause of something I was seeing (and can't remember what that was, now) in early testing.
As for Extended Events... that might be fine for personal testing just like using SQL Profiler or some third party tool might be fine but I generally try to make the test harnesses that I post on the forum 100% stand-alone so that no special skills or software are required on the part of the reader to duplicate the tests. I do make sure that the "observer effect" is minimal where I use SET STATISTICS or I'll use something else. The case of performance testing non-inline user defined functions is a good case where I won't use SET STATISTICS because of the extreme observer effect on those.
There's a good chance that I came across them while looking for something else, but I don't recall reading either of them, or the one from Dwain Camps.
All 3 are now added to my reading list. Awkward question... Did he pass on or simply find other things to do? I think his most recently dated article was sometime back in 2015.
He was actually my first introduction to the Quirky Update. I had a project that required taking 4 or 5 big history/journal tables, all tracking various kinds of statuses for referrals. Shuffling them together on date was easy enough, but since each column came from a different table, the result had more NULLs than actual values. It made a neat pattern, actually... but I had no clue how I was going to bring the previous non-NULL value down to fill the holes, then swap values as soon as it encountered the next non-NULL, without overwriting anything. His article demonstrating how to use the Quirky Update method to smear data saved my backside...
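For anyone who hasn't seen that pattern, the "smear" looks something like this (a minimal sketch with made-up table and column names; the articles cover the safety rules that make it reliable):
-- Carry the last non-NULL StatusCode down into the NULL "holes" below it.
-- The technique depends on the clustered index order, the TABLOCKX, and
-- MAXDOP 1; read up on the full set of safety rules before using it in anger.
DECLARE @PrevStatus VARCHAR(10);
UPDATE tgt
   SET @PrevStatus = tgt.StatusCode = ISNULL(tgt.StatusCode, @PrevStatus)
  FROM dbo.ReferralHistory tgt WITH (TABLOCKX) -- hypothetical table, clustered on the date column
OPTION (MAXDOP 1);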
I think I stand on odd ground in the SQL world. I started as an SSRS developer who got hooked on SQL. The job title of "DBA" got thrown on me out of the blue... The boss said, "We're happy with you and want to give you a raise, but you're maxed out in your current role... So, now you're a DBA and here's your pay bump". Typing that took longer than the conversation... I've never actually considered myself an actual DBA... Since that time the company has changed hands (and names) and the new boss is basically saying, "If you want it, it's yours. Get yourself up to speed..." So, I've been consuming as much information as I can, as fast as I can... Most of it I never looked twice at when I was doing nothing but SQL code dev. As a developer, I couldn't tell you the underlying reason WHY you don't run Profiler traces (at least not from the GUI), but I knew enough not to do it because it would kill performance...
Wow... I just realized that was the long-winded way of saying that the move into new reading material led me to some really interesting articles and a few videos about DMVs and Extended Events. The one thing that struck me about Extended Events was that they supposedly don't have the impact that traces do. Whatever information I've gathered is swimming around with gobs of other new information on a bunch of different topics, but the parts about being able to collect very detailed & accurate execution data without impacting results got the wheels turning. It's also why I got curious when I saw the CHECKPOINT...
I'm trying to force myself to get proficient with the basics... Reworking how indexing is being done... understanding how backups & restores are supposed to work... being able to verify THAT they're working... that stuff... That said, there's a part of me that would rather be shot than become a professional backup babysitter... that's the part that REALLY wants to play with these things and see if they can provide a better source of information. Of course, that's the same part that stays up until 4 am testing a function that wasn't even joined correctly... so there's that...
It's always a pleasure "talking" to you too. I learn something new every time, and I really do appreciate it.
October 21, 2017 at 7:03 pm
Jason A. Long - Saturday, October 21, 2017 4:55 PM
At this point I'm more disappointed with myself than I am any of the functions...
Heh... None of the rest of us has ever done such a thing before. 😉
You're a good man, Jason. Staying up late to help others even in the face of real work. And then, doing the ultimate and admitting that a mistake has been made. You can't ask for more than that. My hat's off to you, good Sir.
Sure enough... The OutputList for X3, showed an expression and the OutputList for X2 showed td.RN...
After making the correction, they were both back to running in the 3.5 - 4 second range.
On that note, I've made a function that I've used against the test data created by the generator that I posted just a bit back. I resurrected an old Calendar table that I made for such a task with a "Business Day Number", and it's clocking in at about 1.3 seconds for a million rows of test data over the 2000 thru 2029 period you've been testing with... I don't want to post it yet because I'm double checking some of the edge cases, and because there's a piece of code that I've been playing with that has the potential of 0.3 seconds, but I've not worked out the bugs yet. Not sure I can.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2017 at 8:08 pm
Jeff Moden - Saturday, October 21, 2017 7:03 PM
You're a good man, Jason. Staying up late to help others even in the face of real work. And then, doing the ultimate and admitting that a mistake has been made. ...
Yea... I can laugh about it now... At 4:00 am... not so much. The thing is, I was holding off on posting until AFTER, for that very reason... It always irks me when OPs abandon their posts, w/o so much as a thank you, after someone took the time to answer their question. So I try to make a point not to do it myself... So when people started offering solutions, I did a me... Since I can't pull the same bonehead move again tonight... I will go ahead and grab anything you post as soon as I'm aware of it, but it's unlikely I'll be able to do anything with it until Monday at the earliest.
You'll love this... Imagine a table with 177 columns, 8,267,5574 rows & 37 indexes... An app developer asks if I can look at something with him; he wants to know if I have any "go fast" for a search screen that billing uses... It gets called 1000+ times a day, and there are complaints that it's way too slow...
Here's what they want to go faster... Almost forgot... there are 15 more just like it, with only minor variations...
That's right... a search... not the main data that employees work off of... Nope, it goes out and comes back and says, "yea... I found some rows that appear to match your vague description. Would you like to have them retrieved?"
And that... is how I lost my weekend... 😀
October 21, 2017 at 8:28 pm
You need to use Gail's dynamic "Catch All Query" method on that one.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2017 at 9:17 pm
Jeff Moden - Saturday, October 21, 2017 8:28 PM
You need to use Gail's dynamic "Catch All Query" method on that one.
That is actually part of the plan... The search criteria is limited (no matter what other filters are chosen) by two equality predicates that limit the result to between 150 & 200K rows.
My plan is to move the search away from that table altogether. I'm building a single "main table" that holds primarily IDs. From there I'll create condensed "satellite tables" that are distinct/normalized copies that will join back to the main ID table.
Actually, I'd value your input on an idea that I'm waffling on with regard to names...
I had kicked around the idea of blowing out the first few letters of each name so that it could speed up partial name searches... F_1 F_2 F_3 FirstName L_1 L_2 L_3 LastName
----------------------------------------------------------------------
J JO JOH JOHN S SM SMI SMITH
But now that I've convinced them that the users want pages to load fast, and would gladly give up the ability to find "JASON" by searching on "%SON" to make that happen...
I don't think it's necessary... with the leading wildcard requirement gone, the trailing wildcard is SARGable...
I'm 99% sure I just answered my own question, but I've also been looking at it for way too long... I'm open to persuasion...
To your original comment... the plan is to use dynamic SQL to piece together queries based on the filled parameters, and keep everything parameterized in order to get plan reuse. I'm fine with 20 or 30 plans floating out there... I just don't want a new plan every time a user clicks a button. A rough sketch of what I have in mind follows.
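Something along these lines (a sketch only; the table and column names are placeholders, not the real schema):
-- A stripped-down catch-all search. Predicates are appended only for the
-- parameters that were actually supplied, but the values stay parameterized
-- so each distinct combination of filters reuses a single cached plan.
DECLARE @LastName  NVARCHAR(50) = N'SMI' -- sample inputs; NULL = not filtered
       ,@FirstName NVARCHAR(50) = NULL
       ,@SQL       NVARCHAR(MAX);
SELECT @SQL = N'SELECT PersonID, LastName, FirstName FROM dbo.SearchMain WHERE 1 = 1'
            + CASE WHEN @LastName IS NOT NULL
                   THEN N' AND LastName LIKE @pLastName + ''%''' -- trailing wildcard stays SARGable
                   ELSE N'' END
            + CASE WHEN @FirstName IS NOT NULL
                   THEN N' AND FirstName LIKE @pFirstName + ''%'''
                   ELSE N'' END;
EXEC sys.sp_executesql @SQL
    ,N'@pLastName NVARCHAR(50), @pFirstName NVARCHAR(50)'
    ,@pLastName  = @LastName
    ,@pFirstName = @FirstName;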
October 22, 2017 at 11:44 am
Here's my shot at this problem. Just to make life easy, I'll keep all the code together for this post. The code is attached as .txt files, which is a bit annoying... why a forum for SQL Server wouldn't allow someone to attach .SQL files, I'll never know. :blink:
Build the Calendar Table
First, see the attached "Calender Rev 02 Small.txt" file. It's the old code I resurrected to make a Calendar table with some "special" columns along with some of the more traditional columns. It does not have any "holidays" associated with Easter and doesn't contain any ISO Week/Year. If you need any of that, post back. The "special" columns are...
1. WorkdayNumber - An ascending count of workdays in the calendar table. If a day is a weekend or holiday, the previous workday number is "smeared" down into the weekend or holiday. This allows for some incredibly simple code (simple subtraction between two integers) to determine the number of workdays between two dates. It also allows for other easy and fast calculations, such as determining lead times and what things like "in 5 business days" actually mean, all using a simple lookup and integer addition or subtraction. This column is the reason why I wanted to resurrect this skinny and quite old table. It's magic. This is the same kind of column that Joe Celko spoke of in his post, except the code I've included actually does such a thing and, using the proprietary 3 part UPDATE (also known as a "Quirky Update"), is nasty fast and even runs in SQL Server 2000 (haven't tested it in 6.5 or 7) because it doesn't use RBAR and it doesn't use Lead/Lag/Preceding, etc.
2. DTSerial - The PK of this table is the DT column, which actually stands for "DateTime" and not just "Date". DTSerial is the underlying integer value of the number of whole days that have passed since the 1st of January, 1900. It doesn't have a huge impact for performance but, as you'll see, it can make code simpler and so I recently added it for this problem.
3. DWOccurance - Although not used for the problem on this thread, this column is very useful for greatly simplifying questions/queries like "What is the date of the 3rd Monday of June" or "Create a list of the dates for every Tuesday of the month for the next two years". And, yes, "DW" stands for "Day of the Week" and is the same notation as what is offered in the SQL Server date/time functions. Please get used to that or change it to "Weekday" if you can't memorize the simple and obvious 2 character names for the common date parts. In any case, quitchurbitchin about it because I'll never use the long names for the date parts. "Keep your eye upon the donut and not upon the hole". 😉
Those columns and all the others are described in the header of the code that makes the table. I didn't take the time to add those descriptions to each column as extended properties but will in the future. This might make a good article.
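To make the first and third of those concrete, here are two quick examples of the kinds of queries those columns enable (a sketch that assumes the column names described above):
-- "What date is 5 business days out?" Find the target WorkdayNumber and take
-- the first calendar day that carries it (weekends and holidays repeat the
-- previous workday's number, so MIN() lands on the workday itself).
DECLARE @SomeDate DATETIME = '20171023';
SELECT FiveBusinessDaysOut = MIN(c2.DT)
  FROM dbo.Calendar c1
  JOIN dbo.Calendar c2 ON c2.WorkdayNumber = c1.WorkdayNumber + 5
 WHERE c1.DT = @SomeDate;
-- "What is the date of the 3rd Monday of June 2018?" via DWOccurance.
SELECT ThirdMonday = DT
  FROM dbo.Calendar
 WHERE DT >= '20180601' AND DT < '20180701'
   AND DATENAME(dw, DT) = 'Monday'
   AND DWOccurance = 3;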
To use the code, open it in SSMS, look for "TODO" to find where you need to change the range of dates to be included, and make any changes you need (it's preset for the requirements Jason posted earlier). Then, after doing your safety review of the code, run it.
If you ever need to Move, Add, or Delete (MAD) a holiday, do your thing to the Calendar table, making sure to set the IsHoliday column and add the description. This can be done using a "bulk" bit of code for the whole table (as in the original code for a given holiday) or as a one-off for a given year (I have to do this for a lot of years at work), and then run the last two "paragraphs" of the code to renumber the WorkdayNumber column according to the changes in holidays.
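A one-off change might look something like this (only the IsHoliday column is named above; the description column name here is my assumption):
-- Hypothetical one-off: flag a new holiday, then re-run the last two
-- "paragraphs" of the build script to renumber the WorkdayNumber column.
UPDATE dbo.Calendar
   SET IsHoliday   = 1
      ,HolidayName = 'Company Founders Day' -- assumed name for the description column
 WHERE DT = '20180702';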
The code that builds the Calendar table also includes ONLY those indexes necessary to solve this problem.
Build the GetBusinessDays Function
Open the attached "GetBusinessDays Function Rev 00 .txt" in SSMS. After doing a safety check, execute the code in the database where you will use it. Rather than explaining it in detail here, please see the documentation embedded in the function. To summarize its action, it simply looks up the WorkdayNumber value for the start and end dates and takes the difference between the two to return the number of business days. Make sure that you read the "Performance" section in the header of that function, which talks about the non-troublesome but insane number of reads it causes, and offers an alternative.
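For those reading along without the attachment, the lookup-and-subtract shape being described would look something like this (a rough sketch that assumes the Calendar columns above; the attached file is the authoritative version):
-- Sketch of the lookup-and-subtract idea: one WorkdayNumber lookup per date,
-- then simple integer subtraction. Written as an inline TVF so it can be
-- CROSS APPLYed, as in the test harness below.
CREATE FUNCTION dbo.GetBusinessDays
        (@beg_dt DATETIME, @end_dt DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT BusinessDays =
        (SELECT c.WorkdayNumber FROM dbo.Calendar c WHERE c.DTSerial = DATEDIFF(dd, 0, @end_dt))
      - (SELECT c.WorkdayNumber FROM dbo.Calendar c WHERE c.DTSerial = DATEDIFF(dd, 0, @beg_dt));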
Build the Test Data
I didn't use Jason's rendition of the test data table, although I've used the same columns. I've also built it as a "real" table instead of a temp table, just to simulate real life. I previously posted what I used, but I've attached it here because of the "real life" change so that you don't have to use the "seek and ye shall find" method to find it and then make the modification. Load the "Test Data Generator.txt" file into SSMS and, after your safety check, execute it to create the dbo.TestData table in whatever database you're using for testing.
Testing the Function
Here's my simple test harness just for testing this function. I've lost track of which of the other functions work and which don't, so feel free to add whatever other functions you wish to test in a similar manner. Don't forget that the table is now a real table called dbo.TestData. Part of my testing includes the possibly real-life requirement of having to return the values for beg_dt and end_dt. As with all the other testing, dummy variables are used to remove parasitic delays caused by displaying the data or writing to disk. We just want to know what the function will do.
Here's the code that tests the function that uses the Calendar table I made.
DECLARE @beg_dt DATETIME
,@end_dt DATETIME
,@BusinessDays INT
;
PRINT '========== Function uses Calendar Table =================================================='
SET STATISTICS TIME,IO ON;
SELECT @beg_dt = td.beg_dt
,@end_dt = td.end_dt
,@BusinessDays = bd.BusinessDays
FROM dbo.TestData td
CROSS APPLY dbo.GetBusinessDays(td.beg_dt,td.end_dt) bd;
SET STATISTICS TIME,IO OFF
;
Here are the results from that test. If you read the "Performance" section of the header for the function, you'll understand why the number of reads is so insane. They don't really hurt here, but they'll drive your DBA or any "Monitoring Software" absolutely nuts.
========== Function uses Calendar Table ==================================================
Table 'Calendar'. Scan count 0, logical reads 4000000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3961 ms, elapsed time = 1131 ms.
With that in mind, I'll suggest that the function should be relegated to single-row use only. If you have to resolve business days for a whole table, the following approach (which is very similar to the code in the function) runs in about half the time and uses 3 orders of magnitude fewer reads, because there is no implied CROSS JOIN despite appearances.
DECLARE @beg_dt DATETIME
,@end_dt DATETIME
,@BusinessDays INT
;
PRINT '========== Direct Code uses Calendar Table ==============================================='
SET STATISTICS TIME,IO ON;
SELECT @beg_dt = td.beg_dt
,@end_dt = td.end_dt
,@BusinessDays =
(--==== Get the WorkDayNumber for the 2nd date, which should be >= the 1st
SELECT WorkDayNumber
FROM dbo.Calendar
WHERE DTSerial = DATEDIFF(dd,0,td.end_dt)
)
- --Yes... this is simple subtraction of two correlated subqueries in the SELECT list.
(--==== Get the WorkDayNumber for the 1st date, which should be <= the 2nd
SELECT WorkDayNumber
FROM dbo.Calendar
WHERE DTSerial = DATEDIFF(dd,0,td.beg_dt)
)
FROM dbo.TestData td;
SET STATISTICS TIME,IO OFF
;
Here are the nasty fast and much more efficient resource usage stats from that code.
========== Direct Code uses Calendar Table ===============================================
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestData'. Scan count 1, logical reads 3599, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Calendar'. Scan count 2, logical reads 64, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 608 ms, elapsed time = 608 ms.
Notice that not only did we decrease the number of reads by 3 orders of magnitude, but we also cut the CPU usage down to 1/6th of what was consumed by the function, and the code didn't need to "Go Parallel" to do any of it.
For those interested, this testing was done on an old VAIO E-Series laptop running Windows 7 and SQL Server 2008. 4GB of RAM is allocated to SQL Server and the box has a dual-core i5 processor hyper-threaded to 4.
Plus, I'm really curious to see what Jeff comes up with.
Right or wrong, you just had to know that it was going to take me awhile to 'splain it all. 😉 My goal was to make it so that you didn't actually need to modify code to make a change for holidays. Just update the Calendar table (read the notes there for how to MAD holidays... it really is simple), which also has some other uses concerning Business Days.
Last but most certainly not least, thank you Jason for posting this question, and thank every one of you who participated. There was some damned good input and some good thought processes going on by everyone. Lordy, I love this community!!!
And with that in mind, I'm not the fastest gun in town and I look forward to some of the other ideas in this thread to be developed and tested.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2017 at 2:17 pm
Jeff Moden - Sunday, October 22, 2017 11:44 AM
Here's my shot at this problem. Just to make life easy, I'll keep all the code together for this post. ...
Ahhhhhh you're killing me!!! I was really hoping you'd hold off until tomorrow, just so I wouldn't have to battle the temptation.
But I have to... I'm still stomping my way through that "gazillion ways to search" thing... My checklist still has at least three more tables, ETL(ish) load procs, indexes, DRI and, last but certainly not least, the D-SQL to bring it all together.
But, make no mistake... Getting into whatever it is that you posted is triggering full on ADD.
As far as, "I'm not the fastest gun in town"... You may get one upped by someone using your code as a base and tweaking the knobs... I'll simply say this, when I hit a a wall I can't around or over on my own, I look to see if you, Itzik Ben-Gan, Adam Machanic or Peter Larrson (SwePeso) have published solutions to similar problems.
I'm not saying, that anyone else gets ignored... but if one of you four put it out, there a very high likelihood that it's going that it fast, efficient and orders of magnitude better than the blog post above or below in the search results. I'll also add this... Both you and Itzik have the ability to explain what you did and why you did it in manner that even I can usually get my head around (no easy feat)... So... Be modest if you want. I'm still dying to drop everything a dive in to whatever it is you posted... And no... I'm not even going to shim through it... because I know me and I know what will happen. I also have a feeling the races are about to begin so I have to duck out least I get sucked in.
As always, thank you! I can't say, I owe you one" because it's in the dozens by now... So, I can ever return the favor, please don't hesitate to let me know.
October 22, 2017 at 4:07 pm
jcelko212 32090 - Friday, October 20, 2017 10:28 AM
Easter alone requires too much math
{EDIT} Apologies for the lame formatting. The forum software crushed me again.
CREATE FUNCTION dbo.GetEasterDate
/**********************************************************************************************************************
Purpose:
Given a 4 digit year, returns the date for Easter Sunday of that year.
CREATE FUNCTION dbo.EasterRelatedDatesForYear
/**********************************************************************************************************************
Purpose:
Given a date, calculate when Easter Sunday will occur for the year of that date. This function also calculates the
"main" Easter related dates. It does not calculate "Lent", which has interpretations for when it ends depending on
the particular faith.
-----------------------------------------------------------------------------------------------------------------------
Credits/References:
This function is based on a 19 year cycle, which I first picked up on from Peter "PESO" Larsson at the following URL:
http://weblogs.sqlteam.com/peterl/archive/2010/09/08/fast-easter-day-function.aspx
See? 1950's technology isn't so bad. 😉
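For anyone curious what a fully self-contained Easter calculation can look like, here's a minimal sketch using the well-known Meeus/Jones/Butcher arithmetic. To be clear, this is a different technique from the 19-year cycle the attached function is based on, and is shown only for comparison:
-- Easter Sunday for a given Gregorian year (Meeus/Jones/Butcher algorithm).
-- DATEFROMPARTS requires SQL Server 2012 or later.
DECLARE @Year INT = 2018;
SELECT EasterSunday = DATEFROMPARTS(@Year
      ,(h + l - 7*m + 114) / 31        -- month (3 = March, 4 = April)
      ,(h + l - 7*m + 114) % 31 + 1)   -- day of month
  FROM (SELECT a = @Year % 19, b = @Year / 100, c = @Year % 100) s1
 CROSS APPLY (SELECT d = b / 4, e = b % 4, f = (b + 8) / 25) s2
 CROSS APPLY (SELECT g = (b - f + 1) / 3) s3
 CROSS APPLY (SELECT h = (19*a + b - d - g + 15) % 30) s4
 CROSS APPLY (SELECT i = c / 4, k = c % 4) s5
 CROSS APPLY (SELECT l = (32 + 2*e + 2*i - h - k) % 7) s6
 CROSS APPLY (SELECT m = (a + 11*h + 22*l) / 451) s7;
-- Returns 2018-04-01, which checks out for 2018.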
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2017 at 4:45 pm
And I love 1950s technology! I still balance my checkbook with an abacus (Chinese style, with Japanese style beads). When I can no longer use a slide rule – well, maybe just the C and D scales for multiplication. I actually thought about proposing a book that would use McBee cards for teaching database in elementary and middle school. I will need to find an agent for this. And I still wear cuff links and a pocket protector 🙂
Please post DDL and follow ANSI/ISO standards when asking for help.
October 22, 2017 at 6:06 pm
jcelko212 32090 - Friday, October 20, 2017 10:28 AM
And I love 1950s technology! I still balance my checkbook with an abacus... ...
Heh... too funny. I actually wrote a sort algorithm based on the old IBM card sorter. It provided incredible stability between multiple sorts and blew the doors off the Shell Sort method a colleague of mine wrote. And, if you're still using a classic slip-stick, you're way behind. You need to upgrade to a circular slide rule. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2017 at 7:11 pm
Jeff and Joe... If you haven't watched this before, it's well worth the time it takes to watch.
Joe, your name and your work on relational division get some very nice (and well deserved) recognition.
Enjoy! 🙂
October 23, 2017 at 8:28 am
Jason A. Long - Sunday, October 22, 2017 2:17 PM
Ahhhhhh you're killing me!!! I was really hoping you'd hold off until tomorrow, just so I wouldn't have to battle the temptation. ...
BWAAAA-HAAAA!!!! Man, do I ever have an appreciation for that! I found the tweak I was looking for and couldn't let it go. I was "compelled" to post before the weekend was up so that I could stop thinking about it. I had a work project due for Monday (today) but couldn't even think about it because of the ADD coming from this very interesting problem. I finally got to my work project at midnight and finished it up at about 3:30 this morning.
As always, thank you! I can't say "I owe you one" because it's in the dozens by now... So, if I can ever return the favor, please don't hesitate to let me know.
As "Red Green" says, "We're all in this together... and I'm pullin' for ya". 😉 You come up with some really interesting ideas and some great code, and I learn more than you might guess from your posts, whether it's just by observing or by lending a hand. That's payment enough for me. Keep up the great work.
And thank you very much for the kind and thoughtful words. You made my year. That's no easy task because this year has been a real PITA. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2017 at 1:28 am
Jeff Moden - Monday, October 23, 2017 8:28 AM
BWAAAA-HAAAA!!!! Man, do I ever have an appreciation for that! I found the tweak I was looking for and couldn't let it go. I was "compelled" to post before the weekend was up so that I could stop thinking about it. I had a work project due for Monday (today) but couldn't even think about it because of the ADD coming from this very interesting problem. I finally got to my work project at midnight and finished it up at about 3:30 this morning.
I have to say that I'm glad to know I'm not the only one who does that. I always regret it the following day... but it seems like it's the only time I can work without the daytime interruptions. I'm glad to know that I'm at least keeping good company.
You're lucky I didn't hit you up... I spent two hours trying to figure out a mathematical way to compare two 2-digit INTs to the first two digits of a BIGINT that could be anywhere from 6 to 19 digits... I know it's an odd thing to do, but apparently I'm the only one who's ever attempted it... That, or my Googles weren't working... I eventually found a solution to the "how many digits?" question that uses the LOG10() function, and figured I could knock 2 off that and... well, the math got easier after that.
Here's what I ended up playing around with...
DECLARE @n INT = 3; -- number of 2-digit pairs to compare
SELECT
*
FROM
( VALUES (23, 45, 32, 88, CAST(2345668837896416825 AS BIGINT)) ) n (a, b, c, d, x)
-- Strip x down to its leading (@n * 2) digits: LOG10 gives the digit count,
-- so dividing by the right power of 10 drops everything after them.
CROSS APPLY ( VALUES (n.x / (10 * POWER(CAST(10 AS BIGINT), CAST(LOG10(n.x) AS BIGINT) - (@n * 2)))) ) bv (BaseValue)
-- Scale each 2-digit INT into its positional slot so the pairs can be
-- compared against BaseValue with plain integer math.
CROSS APPLY ( VALUES (
n.a * POWER(CAST(10 AS BIGINT), (@n - 1) * 2),
n.b * POWER(CAST(10 AS BIGINT), (@n - 2) * 2),
n.c * POWER(CAST(10 AS BIGINT), (@n - 3) * 2),
n.d * POWER(CAST(10 AS BIGINT), (@n - 4) * 2)
) ) ex (a, b, c, d);
On that note, hopefully your work project has a happier ending than mine... I was told at 5:30 this evening that a production solution had been promised by Wednesday... and that they'd just try putting yet another "band-aid over the bullet hole"... This may actually be the first time in my life that I regret having a work ethic.
As "Red-Green" says, "We're all in this together... and I'm pullin' for ya". You come up with some really interesting ideas and some great code and I learn more than you might guess from your posts whether is just by observing or by lending a hand. That's payment enough for me. Keep up the great work.
I'm flattered that you'd say that, especially after today... because earlier, I was wishing I'd chosen a career making big ball bearings into little ball bearings. That, or putting the tips on shoelaces.
The upside: I can now, with a clear conscience, dive head first into the working days function and really check out the code you posted.
And thank you very much for the kind and thoughtful words. You made my year. That's no easy task because this year has been a real PITA. 🙂
Well, I'm glad they were well received. I'm usually good about providing analytical feedback and acknowledging outstanding work, but I've always felt awkward letting people know that they actually mean something. It always sounds sycophantic in my head...
I also know that it's a hell of a lot easier helping people when you know they genuinely appreciate what you've done for them. Considering that I can look back over the last 4 or 5 years and factor in A) what I've learned and B) code that was pulled from this forum and put nearly verbatim into production solutions... this forum in general, and you in particular, are directly responsible for a substantial portion of my paycheck. Don't doubt for a second that I'm genuinely grateful...
I don't know what's been going on to make this a less than spectacular year, and I'm not one to pry, but you're more than welcome to send a PM... and I'll leave it at that.
Tomorrow... later today, at some point... will be a good day... I have a new function to beat up. I just need to make sure it's actually in the harness this time... :Whistling:
October 24, 2017 at 3:35 am
Hi,
How about this approach:
DECLARE @BegDate DATETIME = '20000701',
@EndDate DATETIME = '20000707';
DECLARE @tHolidays TABLE (Holiday DATETIME PRIMARY KEY);
INSERT INTO @tHolidays (Holiday)
SELECT Holiday FROM (VALUES (CAST('20000529' AS DATETIME)),
('20000704'), ('20000904'), ('20001123'), ('20001124'), ('20001225'), ('20010101'), ('20010528'), ('20010704'), ('20010903'), ('20011122'),
('20011123'), ('20011225'), ('20020101'), ('20020527'), ('20020704'), ('20020902'), ('20021128'), ('20021129'), ('20021225'), ('20030101'),
('20030526'), ('20030704'), ('20030901'), ('20031127'), ('20031128'), ('20031225'), ('20040101'), ('20040531'), ('20040705'), ('20040906'),
('20041125'), ('20041126'), ('20041224'), ('20041231'), ('20050530'), ('20050704'), ('20050905'), ('20051124'), ('20051125'), ('20051226'),
('20060102'), ('20060529'), ('20060704'), ('20060904'), ('20061123'), ('20061124'), ('20061225'), ('20070101'), ('20070528'), ('20070704'),
('20070903'), ('20071122'), ('20071123'), ('20071225'), ('20080101'), ('20080526'), ('20080704'), ('20080901'), ('20081127'), ('20081128'),
('20081225'), ('20090101'), ('20090525'), ('20090703'), ('20090907'), ('20091126'), ('20091127'), ('20091225'), ('20100101'), ('20100531'),
('20100705'), ('20100906'), ('20101125'), ('20101126'), ('20101224'), ('20101231'), ('20110530'), ('20110704'), ('20110905'), ('20111124'),
('20111125'), ('20111226'), ('20120102'), ('20120528'), ('20120704'), ('20120903'), ('20121122'), ('20121123'), ('20121225'), ('20130101'),
('20130527'), ('20130704'), ('20130902'), ('20131128'), ('20131129'), ('20131225'), ('20140101'), ('20140526'), ('20140704'), ('20140901'),
('20141127'), ('20141128'), ('20141225'), ('20150101'), ('20150525'), ('20150703'), ('20150907'), ('20151126'), ('20151127'), ('20151225'),
('20160101'), ('20160530'), ('20160704'), ('20160905'), ('20161124'), ('20161125'), ('20161226'), ('20170102'), ('20170529'), ('20170704'),
('20170904'), ('20171123'), ('20171124'), ('20171225'), ('20180101'), ('20180528'), ('20180704'), ('20180903'), ('20181122'), ('20181123'),
('20181225'), ('20190101'), ('20190527'), ('20190704'), ('20190902'), ('20191128'), ('20191129'), ('20191225'), ('20200101'), ('20200525'),
('20200703'), ('20200907'), ('20201126'), ('20201127'), ('20201225'), ('20210101'), ('20210531'), ('20210705'), ('20210906'), ('20211125'),
('20211126'), ('20211224'), ('20211231'), ('20220530'), ('20220704'), ('20220905'), ('20221124'), ('20221125'), ('20221226'), ('20230102'),
('20230529'), ('20230704'), ('20230904'), ('20231123'), ('20231124'), ('20231225'), ('20240101'), ('20240527'), ('20240704'), ('20240902'),
('20241128'), ('20241129'), ('20241225'), ('20250101'), ('20250526'), ('20250704'), ('20250901'), ('20251127'), ('20251128'), ('20251225'),
('20260101'), ('20260525'), ('20260703'), ('20260907'), ('20261126'), ('20261127'), ('20261225'), ('20270101'), ('20270531'), ('20270705'),
('20270906'), ('20271125'), ('20271126'), ('20271224'), ('20271231'), ('20280529'), ('20280704'), ('20280904'), ('20281123'), ('20281124'),
('20281225'), ('20290101'), ('20290528'), ('20290704'), ('20290903'), ('20291122'), ('20291123'), ('20291225')
) h (Holiday);
SELECT
-- Whole days in the inclusive range...
(DATEDIFF(dd, @BegDate, @EndDate) + 1)
-- ...minus 2 days for each week boundary crossed (the weekends)...
-(DATEDIFF(wk, @BegDate, @EndDate) * 2)
-- ...minus corrections when the range starts on a Sunday or ends on a Saturday...
-(CASE WHEN DATENAME(dw, @BegDate) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END)
-- ...minus any weekday holidays in the range. The ISNULL guards against
-- SUM() returning NULL when no holidays fall within the range.
- ISNULL((SELECT SUM(CASE WHEN DATENAME(dw, Holiday) = 'Saturday' OR DATENAME(dw, Holiday) = 'Sunday' THEN 0 ELSE 1 END)
FROM @tHolidays WHERE Holiday >= @BegDate AND Holiday <= @EndDate), 0);
Lars