November 11, 2016 at 5:04 pm
Hi everyone,
I'm a newbie and recently I've come across an excercise quite complex (for me). In short, these are the requirements:
A - an inline table-valued function with 1 parameter (an INT);
B - the output will be 2 columns;
C - the first column lists integers in ascending order starting from 1 up to the parameter value (included);
D - the second column is a varchar and tests if the number in the first column is divisible by 3, by 5 or by both.
I'm stuck on the point C because with an inline table-valued function I cannot use any branching logic (while, if...), then I really don't know how to list the numbers from 1 to the parameter. The point D should be feasible with the CASE expression, but with the point C I'm going slightly mad... Maybe the solution is very easy, but so far I cannot find it out.
Any hints?
November 11, 2016 at 5:16 pm
The VALUES constructor can be used to generate rows and row_number() can assign incrementing integers to the constructed rows...
Come back to us if you need more hints
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 11, 2016 at 7:51 pm
TheTrojan (11/11/2016)
Hi everyone,I'm a newbie and recently I've come across an excercise quite complex (for me). In short, these are the requirements:
A - an inline table-valued function with 1 parameter (an INT);
B - the output will be 2 columns;
C - the first column lists integers in ascending order starting from 1 up to the parameter value (included);
D - the second column is a varchar and tests if the number in the first column is divisible by 3, by 5 or by both.
I'm stuck on the point C because with an inline table-valued function I cannot use any branching logic (while, if...), then I really don't know how to list the numbers from 1 to the parameter. The point D should be feasible with the CASE expression, but with the point C I'm going slightly mad... Maybe the solution is very easy, but so far I cannot find it out.
Any hints?
It's called the "FIZZ BUZZ" problem (return "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "Fizz Buzz" for multiples of both) and is frequently used as an interview question. It has many solutions the nature of which very quickly identify what the skill and experience levels a candidate may have even if they're really good at Yabingooglehoo because, while many successfully solve the problem, many do so using poorly performing, resource intensive methods while others demonstrate that they have the "set based" mentality.
One of the best hints I can give you can be found in the following article. When I first became aware of the tool (I didn't invent it), it changed my life as a Database Developer and wrote an article about it to try to help others.
[font="Arial Black"]The "Numbers" or "Tally" Table: What it is and how it replaces a loop [/font][/url]
There's also an article that covers one very (unfortunately) common method for "counting" and 3 other methods that blow it's doors off.
[font="Arial Black"]Hidden RBAR: Counting with Recursive CTE's [/font][/url]
Part of the reason that such puzzles as the "Fizz Buzz" problem are so important is because they teach you how to count, which is taught in virtually every programming course there is except for database programming courses. It is an essential skill especially when you look deeper into it's purpose... it teaches you how to think in sets and, going even deeper, teaches you that, behind the scenes, SQL loops so you don't have to. Understanding that even a simple SELECT is really a loop behind the scenes (I call them "pseudo-cursors", a term coined by R. Barry Young on these very forums) is one of the fundamental understandings necessary to writing highly efficient, nasty fast SQL.
Another tool, strongly based on counting and has a direct relationship with the "Fizz Buzz" problem, is the quintessential art of making mountains of just about any kind of test data you might want or need. The following articles teach you how to do that and will also aid you in solving the "Fizz Buzz" problem.
[font="Arial Black"]Generating Test Data: Part 1 - Generating Random Integers and Floats [/font][/url]
[font="Arial Black"]Generating Test Data: Part 2 - Generating Sequential and Random Dates
[/font][/url]
To easily understand the simple differences between "RBAR" and the paradigm shift necessary to writing high performance database code, please see the "quotes" in my signature line below.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 11, 2016 at 8:12 pm
p.s. First, thank you for wanting "hints" instead of a "solution". It absolutely demonstrates that you have the right stuff for database programming.
Second, because of your interest in teaching yourself, do feel free to post your solution once you've hammered it out and we'll be happy to critique it and provide any help if it's needed.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2016 at 11:08 am
Hi Jeff, first of all let me thank you for all the information provided. I spent one day in reading the articles, testing and re-testing. It was really worth. I must admit that I often use (or maybe used) recursive CTEs and/or While loop: now I know that this is only superficially a smart solution. From now on tally tables, cross joins and pseudo-cursors will become my best friends!
As for my question, yes, you are right, in every programming language you're taught how to count, for example I could easily solve this problem in C#. I was stuck here mainly because I had to use an inline table-valued function where you cannot use loops. After reading your articles I found a solution by using an Itzik style cross join, I guess it's not the only one. It works and I've also checked the timing when I call the function with different parameters:
-- if the parameter is 1000 I get 280 ms
-- if the parameter is 10000 I get 400 ms
-- if the parameter is 100000 I get 2183 ms
IF OBJECT_ID('dbo.GetList', 'IF') IS NOT NULL DROP FUNCTION dbo.GetList;
GO
CREATE FUNCTION dbo.GetList(@Num AS INT)
RETURNS TABLE AS RETURN
WITH
C1(N) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1
), -- 5 rows
C2(N) AS (SELECT 1 FROM C1 a, C1 b), -- 25 rows
C3(N) AS (SELECT 1 FROM C2 a, C2 b), -- 625 rows
C4(N) AS (SELECT 1 FROM C3 a, C3 b) -- 390,625 rows
SELECT
TOP(@Num) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS NumList,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 15 = 0 THEN 'FlipFlop'
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5 = 0 THEN 'Flop'
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 3 = 0 THEN 'Flip'
ELSE 'Unknown'
END AS Divisible
FROM C4;
November 13, 2016 at 11:37 am
@mister.magoo Hi, I used the row_number and a cross join after reading the Jeff's articles. You can see my solution somewhere in the page.
Could you tell me one thing? If you want to use the VALUES, it means that you need an INSERT and then a SELECT, but this sounds to me more like a multistatement table-valued function, while I need an inline table-valued function. As far as I know, in an inline table-valued function you can use just a SELECT, but maybe I'm wrong.
If you can show me an example with the VALUES constructor within an inline table-valued function, I will be happy to compare it with my solution. Anyway, thanks for your reply.
November 13, 2016 at 11:49 am
TheTrojan (11/13/2016)
@mister.magoo Hi, I used the row_number and a cross join after reading the Jeff's articles. You can see my solution somewhere in the page.Could you tell me one thing? If you want to use the VALUES, it means that you need an INSERT and then a SELECT, but this sounds to me more like a multistatement table-valued function, while I need an inline table-valued function. As far as I know, in an inline table-valued function you can use just a SELECT, but maybe I'm wrong.
If you can show me an example with the VALUES constructor within an inline table-valued function, I will be happy to compare it with my solution. Anyway, thanks for your reply.
Hi,
Good work by the way, that looks like a good implementation of the method.
The VALUES clause can be used anywhere you need a row source, so your function re-written using VALUES looks like this, but there is no particular value or benefit in re-writing it - in this case the performance will remain about the same I suspect. Of course, you could test it to see for yourself...
IF OBJECT_ID('dbo.GetList', 'IF') IS NOT NULL DROP FUNCTION dbo.GetList;
GO
CREATE FUNCTION dbo.GetList(@Num AS INT)
RETURNS TABLE AS RETURN
WITH
C1(N) AS
(
SELECT V FROM (VALUES(1),(1),(1),(1),(1)) AS VALS(V)
), -- 5 rows
C2(N) AS (SELECT 1 FROM C1 a, C1 b), -- 25 rows
C3(N) AS (SELECT 1 FROM C2 a, C2 b), -- 625 rows
C4(N) AS (SELECT 1 FROM C3 a, C3 b) -- 390,625 rows
SELECT
TOP(@Num) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS NumList,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 15 = 0 THEN 'FlipFlop'
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5 = 0 THEN 'Flop'
WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 3 = 0 THEN 'Flip'
ELSE 'Unknown'
END AS Divisible
FROM C4;
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 13, 2016 at 2:07 pm
@mister.magoo You will be happy to know that your solution is slightly faster than mine! Have a look at the times:
-- if the parameter is 1000: 76 ms (instead of 280 ms)
-- if the parameter is 10000: 233 ms (instead of 400 ms)
-- if the parameter is 100000: 1920 ms (instead of 2183 ms)
November 13, 2016 at 2:23 pm
TheTrojan (11/13/2016)
@mister.magoo You will be happy to know that your solution is slightly faster than mine! Have a look at the times:
-- if the parameter is 1000: 76 ms (instead of 280 ms)
-- if the parameter is 10000: 233 ms (instead of 400 ms)
-- if the parameter is 100000: 1920 ms (instead of 2183 ms)
Incredible. You've gone from not knowing what a Tally Table/Tally function is to building and testing them all in just a day or two AND you appear to understand how they're used. You also understand the principle of "pseudo-cursors" and why rCTEs are all they're cracked up to be. And, you solved the "Fizz Buzz" problem that you originally posted about.
You're going to do VERY well in the world of databases. Very glad to have met you.
Now that you know all that, here's the Tally Function that I use. You can tell it to start at 1 or 0, both of which are common uses. The reason why I still use SELECT/UNION ALL instead of VALUES is because I still need to support a lot of folks that use 2005. There's no appreciable difference in performance between the two. SELECT/UNION ALL simply makes the code a little longer.
As a bit of a sidebar, this is also how I document my code in real life and, yes, this is a function that I use in production.
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Purpose:
Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.
As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.
Usage:
--===== Syntax example (Returns BIGINT)
SELECT t.N
FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;
Notes:
1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
Refer to the following URLs for how it works and introduction for how it replaces certain loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type
will cause the sequence to start at 1.
3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
5. If @MaxN is negative or NULL, a "TOP" error will be returned.
6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger
number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
that many values, you should consider using a different tool. ;-)
7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending
sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.
DECLARE @MaxN BIGINT;
SELECT @MaxN = 1000;
SELECT DescendingN = @MaxN-N+1
FROM dbo.fnTally(1,@MaxN);
8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
Revision History:
Rev 00 - Unknown - Jeff Moden
- Initial creation with error handling for @MaxN.
Rev 01 - 09 Feb 2013 - Jeff Moden
- Modified to start at 0 or 1.
Rev 02 - 16 May 2013 - Jeff Moden
- Removed error handling for @MaxN because of exceptional cases.
Rev 03 - 22 Apr 2015 - Jeff Moden
- Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1) --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10E12 or 1 Trillion rows
SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
UNION ALL
SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2016 at 3:24 pm
Shifting gears a bit, ROW_NUMBER() OVER is relatively expensive. The same DRY (Don't Repeat Yourself) principles that folks use in front end code also apply to T-SQL. It also has the tendency to make the code much simpler. Try the following for a solution to the FizzBuzz problem.
--===== Start measuring time and I/O
SET STATISTICS TIME,IO ON
;
--===== Run the solution using fnTally and DRY principles
SELECT t.N
,FlipFlop = CASE
WHEN t.N % 15 = 0 THEN 'FLIPFLOP'
WHEN t.N % 3 = 0 THEN 'FLIP'
WHEN t.N % 5 = 0 THEN 'FLOP'
ELSE ''
END
FROM dbo.fnTally(1,100000) t
ORDER BY t.N
;
--===== Stop measuring time and I/O
SET STATISTICS TIME,IO OFF
;
Here's what I get for timing. Also notice that the code produces no reads. Try THAT with an rCTE! 😀
-- With returns to the grid mode:
-- if the parameter is 1000: CPU time = 0 ms, elapsed time = 63 ms.
-- if the parameter is 10,000: CPU time = 0 ms, elapsed time = 81 ms.
-- if the parameter is 100,000: CPU time = 16 ms, elapsed time = 641 ms.
-- if the parameter is 1,000,000: CPU time = 328 ms, elapsed time = 6272 ms.
Returning output to the screen causes the durations to be terrible. If we take the display out of the picture, such as it will be in most situations, the durations get a bit better at the expense of a little more CPU.
--===== If the target table already exists,
-- drop it to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#Target','U') IS NOT NULL
DROP TABLE #Target
;
--===== Start measuring time and I/O
SET STATISTICS TIME,IO ON
;
--===== Run the solution using fnTally and DRY principles
SELECT t.N
,FlipFlop = CASE
WHEN t.N % 15 = 0 THEN 'FLIPFLOP'
WHEN t.N % 3 = 0 THEN 'FLIP'
WHEN t.N % 5 = 0 THEN 'FLOP'
ELSE ''
END
INTO #Target
FROM dbo.fnTally(1,1000000) t
ORDER BY t.N
;
--===== Stop measuring time and I/O
SET STATISTICS TIME,IO OFF
;
-- With returns dumped to a Temp Table on-the-fly:
-- if the parameter is 1000: CPU time = 0 ms, elapsed time = 1 ms. (typical, can jump to 35 on first)
-- if the parameter is 10,000: CPU time = 0 ms, elapsed time = 6 ms. (typical, can jump to 35 on first)
-- if the parameter is 100,000: CPU time = 47 ms, elapsed time = 83 ms. (typical)
-- if the parameter is 1,000,000: CPU time = 453 ms, elapsed time = 1155 ms. (usual)
I also tested for "3" before "5" because there are more numbers evenly divisible by 3 than 5, which short-circuits the CASE statement a little which also helps performance a little. In the larger scheme of things, microseconds count. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2016 at 7:39 pm
Jeff, something tells me you had fun with this one. 😀
November 14, 2016 at 6:48 am
Ed Wagner (11/13/2016)
Jeff, something tells me you had fun with this one. 😀
Many times over many years. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2016 at 1:57 pm
Oh too good, let's say that your articles are really helpful: I've just applied what I've learned. By the way, I've compared your Tally function with the same function with VALUES: you're right, there's not a real advantage, just slight differences.
Apart from the Tally function, allow me also to steal your style in documenting queries: I think it's not a trivial aspect.
November 14, 2016 at 2:09 pm
Just impressive, 453 ms for 1 million rows! I didn't expect it to be so fast.
As for the CASE, I was wondering whether it was better to use "5" before "3" or viceversa, now I know that a small difference can become an important advantage with big numbers. Thanks also for this lesson.
November 14, 2016 at 3:16 pm
TheTrojan (11/14/2016)
Just impressive, 453 ms for 1 million rows! I didn't expect it to be so fast.As for the CASE, I was wondering whether it was better to use "5" before "3" or viceversa, now I know that a small difference can become an important advantage with big numbers. Thanks also for this lesson.
That's the total CPU time. The duration was just over a second and most of that is because of I/O. In real life, you'd just use the data so it will usually be faster that way. You could simulate it by dumping the output of the two columns to variables.
Shifting gears, thank you for the feedback and the compliment of imitation. There's no higher compliment in my book. I'm humbled.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply