June 5, 2015 at 12:08 pm
This is completely branching off the topic of the original article, but... I've grown accustomed to instantiating a Tally table whenever possible in any database I work with regularly, instead of the dynamic Nums created in a WITH clause as you show here. Any thoughts on the pros and cons of doing so?
To start with, instantiating the table lets you add columns to easily translate the number to words, if you have some need for that. (e.g. (1, 'one, 'first'), (2, 'two, 'second')) An obvious downside is that you occasionally hit situations where you can't easily instantiate a table (like a finicky third party application database). Has anyone done performance comparisons, before I run off and duplicate their work?
June 5, 2015 at 12:19 pm
Brian J. Parker (6/5/2015)
This is completely branching off the topic of the original article, but... I've grown accustomed to instantiating a Tally table whenever possible in any database I work with regularly, instead of the dynamic Nums created in a WITH clause as you show here. Any thoughts on the pros and cons of doing so?To start with, instantiating the table lets you add columns to easily translate the number to words, if you have some need for that. (e.g. (1, 'one, 'first'), (2, 'two, 'second')) An obvious downside is that you occasionally hit situations where you can't easily instantiate a table (like a finicky third party application database). Has anyone done performance comparisons, before I run off and duplicate their work?
The "pros" are that it's usually a bit faster than the cascading CTE version for values less than 11K (haven't tested above that). The "cons" are that it does get cached (lives in memory, part of what makes it so fast) and it generates reads... sometimes, a shedload of reads where the cascading CTE does not. The disadvantage there is that a lot of poorly performing code also generates a whole lot of reads (for a different reason, of course) and is one of the primary symptoms that people look for to find such code.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 5, 2015 at 12:26 pm
RonKyle (6/5/2015)
Why do you need the hour? Is that necessary to populate the date when it's just the date? I'm intrigued by this method, but not sure it addresses just adding a date only. (I wouldn't ever mix my date and time dimensions for lots of reasons). Unfornately I can't test this until I get home, but am trying to understand it. I agree that first the most part the remaining fields can be populated using date time functions. There are some exceptions of course, such as if the day is a holiday.
Why would the day being a holiday be a problem?
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
June 5, 2015 at 12:27 pm
Why would the day being a holiday be a problem?
It's not a problem, it's just that there's no date function for that and unlike almost all the others has to be populated in a different way.
June 5, 2015 at 12:39 pm
RonKyle (6/5/2015)
Why would the day being a holiday be a problem?
It's not a problem, it's just that there's no date function for that and unlike almost all the others has to be populated in a different way.
You might be able to handle that in a case statement or a Where clause. Still no reason to go to a cursor.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
June 5, 2015 at 1:00 pm
Still no reason to go to a cursor.
I only use(d) a cursor to initially populate the dates. The susquent updates, including the holidays (each holiday date individually typed out), are set based queries.
June 5, 2015 at 1:03 pm
RonKyle (6/5/2015)
Still no reason to go to a cursor.
I only use(d) a cursor to initially populate the dates. The susquent updates, including the holidays (each holiday date individually typed out), are set based queries.
That could be a fun exercise. Make the whole thing one set based query using a tally table.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
June 5, 2015 at 1:28 pm
Brian J. Parker (6/5/2015)
This is completely branching off the topic of the original article, but... I've grown accustomed to instantiating a Tally table whenever possible in any database I work with regularly, instead of the dynamic Nums created in a WITH clause as you show here. Any thoughts on the pros and cons of doing so?To start with, instantiating the table lets you add columns to easily translate the number to words, if you have some need for that. (e.g. (1, 'one, 'first'), (2, 'two, 'second')) An obvious downside is that you occasionally hit situations where you can't easily instantiate a table (like a finicky third party application database). Has anyone done performance comparisons, before I run off and duplicate their work?
There are many ways to create "virtual" tally tables. One of the downsides of virtual tally tables is how they clutter up the query plan by doing cross join after cross join. I've taken to using a base function that simply produces 100 rows from a values clause. In the execution plan, this is represented by a single constant scan rather than a series of cross joins.
In our code, 100 is generally enough to handle most strings that need to be parsed, dates to be generated into the future, etc. This avoids the I/O from a true physical table object and still gives you a fairly clean query plan to look at.
See the attached code, and the sample execution in comments at the bottom.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
June 5, 2015 at 1:33 pm
RonKyle (6/5/2015)
Why do you need the hour? Is that necessary to populate the date when it's just the date? I'm intrigued by this method, but not sure it addresses just adding a date only. (I wouldn't ever mix my date and time dimensions for lots of reasons). Unfornately I can't test this until I get home, but am trying to understand it. I agree that first the most part the remaining fields can be populated using date time functions.
In most cases you don't need by the hour. The point was about comparing techniques generating ~1,000,000 rows.
There are some exceptions of course, such as if the day is a holiday.
That's a different but interesting topic. For that I have a separate Holiday table.
-- Itzik Ben-Gan 2001
June 5, 2015 at 2:05 pm
That's a different but interesting topic. For that I have a separate Holiday table.
I agree that that's a better way. It's been on my list to do for some time.
June 5, 2015 at 2:27 pm
Jeff Moden (6/5/2015)
Brian J. Parker (6/5/2015)
This is completely branching off the topic of the original article, but... I've grown accustomed to instantiating a Tally table whenever possible in any database I work with regularly, instead of the dynamic Nums created in a WITH clause as you show here. Any thoughts on the pros and cons of doing so?To start with, instantiating the table lets you add columns to easily translate the number to words, if you have some need for that. (e.g. (1, 'one, 'first'), (2, 'two, 'second')) An obvious downside is that you occasionally hit situations where you can't easily instantiate a table (like a finicky third party application database). Has anyone done performance comparisons, before I run off and duplicate their work?
The "pros" are that it's usually a bit faster than the cascading CTE version for values less than 11K (haven't tested above that). The "cons" are that it does get cached (lives in memory, part of what makes it so fast) and it generates reads... sometimes, a shedload of reads where the cascading CTE does not. The disadvantage there is that a lot of poorly performing code also generates a whole lot of reads (for a different reason, of course) and is one of the primary symptoms that people look for to find such code.
Another benefit of using a permanent tally table over a CTE tally that Jeff did not mention is a much prettier Query Plan. Depending the complexity of your code and how often you call it, the query plan with a CTE can start to look like a late night spaghetti dinner at an Irish pub.
I have tested this quite a bit (just a little self study exercise) up to about 50M rows.
Based on my testing, if you just test how fast one counts from one (or 0) to N a permanent tally table with a clustered index is notably faster than a CTE Tally. It gets slightly faster and generates slightly fewer reads if you add a unique non-clustered index on N.
In the real world, however, when I'm not just counting from one or zero to N I have found that the CTE tally has been substantially faster. So much so that I don't use a permanent tally table anymore.
Interesting too, Itzik Ben-Gan says, in his 2012 Windows Functions book (pg. 133) that it's preferable to have a permanent numbers table but does not explain why. It's on my list of questions to ask him next time I go to SQL PASS.
-- Itzik Ben-Gan 2001
June 5, 2015 at 2:29 pm
RonKyle (6/5/2015)
That's a different but interesting topic. For that I have a separate Holiday table.
I agree that that's a better way. It's been on my list to do for some time.
If you've gotten as far as a table of dates, here's some starter logic I use:
[Holiday] AS
CASE
WHEN MONTH(date)=1 AND DAY(date)=1 THEN 'New Years Day'
WHEN MONTH(date)=5 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 25 AND 31 THEN 'Memorial Day'
WHEN MONTH(date)=7 AND DAY(date)=4 THEN 'Independence Day'
WHEN MONTH(date)=9 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 1 AND 7 THEN 'Labor Day'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=5 AND DAY(date) BETWEEN 22 and 28 THEN 'Thanksgiving'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=6 AND DAY(date) BETWEEN 22 and 28 THEN 'Day after Thanksgiving'
WHEN MONTH(date)=12 AND DAY(date)=25 THEN 'Christmas Eve'
WHEN MONTH(date)=12 AND DAY(date)=26 THEN 'Christmas Day'
ELSE ''
END /* CASE */,
You're only in some trouble if you need to do Easter... every other U.S. holiday I can think of follows one of two simple patterns to figure out when it falls.
If you find yourself doing calculations that count business days and the like, I find it helpful to add this:
[DayOfWeek] AS DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])),
[DayOfWeek_desc] AS (
CASE DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))
WHEN 1 THEN 'SUN'
WHEN 2 THEN 'MON'
WHEN 3 THEN 'TUE'
WHEN 4 THEN 'WED'
WHEN 5 THEN 'THU'
WHEN 6 THEN 'FRI'
WHEN 7 THEN 'SAT'
END /*CASE*/
),
[IsWeekend] AS CAST((CASE WHEN DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])) in (1,7) THEN 1 ELSE 0 END) AS bit),
...which also gives you a way of marking what day-of-the week a date is that doesn't change if @@DATEFIRST does, and lets you generate date labels; and then this (assuming IsBusinessDay is a bit defaulting to 1):
UPDATE c
SET IsBusinessDay = 0
FROM dbo.Calendar c
WHERE Holiday != ''
OR ( DayOfWeek=6 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,1,c.date) and c2.Holiday!='') ) -- Saturday holidays recognized on preceeding Friday
OR ( DayOfWeek=2 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,-1,c.date) and c2.Holiday!='') ) -- Sunday holidays recognized on following Monday
There is probably a better way to do these in one swoop (?) but it's a one-time script that lets you do a lot of things more easily.
June 5, 2015 at 2:44 pm
Brian J. Parker (6/5/2015)
RonKyle (6/5/2015)
That's a different but interesting topic. For that I have a separate Holiday table.
I agree that that's a better way. It's been on my list to do for some time.
If you've gotten as far as a table of dates, here's some starter logic I use:
<code snippit>
You're only in some trouble if you need to do Easter... every other U.S. holiday I can think of follows one of two simple patterns to figure out when it falls.
This is where the Holiday table comes in... it really varies from company to company. One of the few things I miss about working a big bank is how many days are considered holidays. If you're not from Illinois you may not know who Casimir Pulaski was but yes in the Illinois, as far as banks are concerned, Casimir Pulaski is a holiday. :hehe:
-- Itzik Ben-Gan 2001
June 5, 2015 at 3:41 pm
Brian J. Parker (6/5/2015)
RonKyle (6/5/2015)
That's a different but interesting topic. For that I have a separate Holiday table.
I agree that that's a better way. It's been on my list to do for some time.
If you've gotten as far as a table of dates, here's some starter logic I use:
[Holiday] AS
CASE
WHEN MONTH(date)=1 AND DAY(date)=1 THEN 'New Years Day'
WHEN MONTH(date)=5 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 25 AND 31 THEN 'Memorial Day'
WHEN MONTH(date)=7 AND DAY(date)=4 THEN 'Independence Day'
WHEN MONTH(date)=9 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 1 AND 7 THEN 'Labor Day'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=5 AND DAY(date) BETWEEN 22 and 28 THEN 'Thanksgiving'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=6 AND DAY(date) BETWEEN 22 and 28 THEN 'Day after Thanksgiving'
WHEN MONTH(date)=12 AND DAY(date)=25 THEN 'Christmas Eve'
WHEN MONTH(date)=12 AND DAY(date)=26 THEN 'Christmas Day'
ELSE ''
END /* CASE */,
You're only in some trouble if you need to do Easter... every other U.S. holiday I can think of follows one of two simple patterns to figure out when it falls.
If you find yourself doing calculations that count business days and the like, I find it helpful to add this:
[DayOfWeek] AS DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])),
[DayOfWeek_desc] AS (
CASE DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))
WHEN 1 THEN 'SUN'
WHEN 2 THEN 'MON'
WHEN 3 THEN 'TUE'
WHEN 4 THEN 'WED'
WHEN 5 THEN 'THU'
WHEN 6 THEN 'FRI'
WHEN 7 THEN 'SAT'
END /*CASE*/
),
[IsWeekend] AS CAST((CASE WHEN DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])) in (1,7) THEN 1 ELSE 0 END) AS bit),
...which also gives you a way of marking what day-of-the week a date is that doesn't change if @@DATEFIRST does, and lets you generate date labels; and then this (assuming IsBusinessDay is a bit defaulting to 1):
UPDATE c
SET IsBusinessDay = 0
FROM dbo.Calendar c
WHERE Holiday != ''
OR ( DayOfWeek=6 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,1,c.date) and c2.Holiday!='') ) -- Saturday holidays recognized on preceeding Friday
OR ( DayOfWeek=2 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,-1,c.date) and c2.Holiday!='') ) -- Sunday holidays recognized on following Monday
There is probably a better way to do these in one swoop (?) but it's a one-time script that lets you do a lot of things more easily.
Nah, Easter is easy. It is the first Sunday after the first full moon after the Vernal Equinox.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
June 5, 2015 at 4:12 pm
Sioban Krzywicki (6/5/2015)
Brian J. Parker (6/5/2015)
RonKyle (6/5/2015)
That's a different but interesting topic. For that I have a separate Holiday table.
I agree that that's a better way. It's been on my list to do for some time.
If you've gotten as far as a table of dates, here's some starter logic I use:
[Holiday] AS
CASE
WHEN MONTH(date)=1 AND DAY(date)=1 THEN 'New Years Day'
WHEN MONTH(date)=5 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 25 AND 31 THEN 'Memorial Day'
WHEN MONTH(date)=7 AND DAY(date)=4 THEN 'Independence Day'
WHEN MONTH(date)=9 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=2 AND DAY(date) BETWEEN 1 AND 7 THEN 'Labor Day'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=5 AND DAY(date) BETWEEN 22 and 28 THEN 'Thanksgiving'
WHEN MONTH(date)=11 AND DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))=6 AND DAY(date) BETWEEN 22 and 28 THEN 'Day after Thanksgiving'
WHEN MONTH(date)=12 AND DAY(date)=25 THEN 'Christmas Eve'
WHEN MONTH(date)=12 AND DAY(date)=26 THEN 'Christmas Day'
ELSE ''
END /* CASE */,
You're only in some trouble if you need to do Easter... every other U.S. holiday I can think of follows one of two simple patterns to figure out when it falls.
If you find yourself doing calculations that count business days and the like, I find it helpful to add this:
[DayOfWeek] AS DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])),
[DayOfWeek_desc] AS (
CASE DatePart(weekday, DateAdd(day,@@DATEFIRST,[date]))
WHEN 1 THEN 'SUN'
WHEN 2 THEN 'MON'
WHEN 3 THEN 'TUE'
WHEN 4 THEN 'WED'
WHEN 5 THEN 'THU'
WHEN 6 THEN 'FRI'
WHEN 7 THEN 'SAT'
END /*CASE*/
),
[IsWeekend] AS CAST((CASE WHEN DatePart(weekday, DateAdd(day,@@DATEFIRST,[date])) in (1,7) THEN 1 ELSE 0 END) AS bit),
...which also gives you a way of marking what day-of-the week a date is that doesn't change if @@DATEFIRST does, and lets you generate date labels; and then this (assuming IsBusinessDay is a bit defaulting to 1):
UPDATE c
SET IsBusinessDay = 0
FROM dbo.Calendar c
WHERE Holiday != ''
OR ( DayOfWeek=6 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,1,c.date) and c2.Holiday!='') ) -- Saturday holidays recognized on preceeding Friday
OR ( DayOfWeek=2 AND EXISTS (select * from dbo.Calendar c2 where c2.date=dateadd(day,-1,c.date) and c2.Holiday!='') ) -- Sunday holidays recognized on following Monday
There is probably a better way to do these in one swoop (?) but it's a one-time script that lets you do a lot of things more easily.
Nah, Easter is easy. It is the first Sunday after the first full moon after the Vernal Equinox.
I'm sure there's a CLR out there somewhere for that.
-- Itzik Ben-Gan 2001
Viewing 15 posts - 121 through 135 (of 215 total)
You must be logged in to reply to this topic. Login to reply