February 23, 2010 at 7:09 am
Manie Verster (2/23/2010)
Jeff I am on your case today.
No problem, Manny... I'm on yours as well, but I did install some handrails so you can enjoy the ride. 😀
Nowhere did I say to use a Tally table as a source for this. I pointed at the Tally Table article to give people a clue about how to build a table on the fly using a couple of methods. You actually used that method but came up short... no scalability.
Also, you produced 4 columns... not part of the requirement. You also used a Global Temp table, which means only one instance of the code can run at a time: if the code is executed more than once simultaneously, there's a chance the DROP will cause a failure in the other run. If you don't use the DROP, then there's the chance that the SELECT/INTO that creates the table will fail because the table already exists. And, despite the requirement, the code is hardcoded to 100.
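Here's a sketch of the collision, with hypothetical names (this isn't your actual code):

-- Two sessions running this at the same time share ONE ##global object:
IF OBJECT_ID('tempdb..##Numbers') IS NOT NULL
    DROP TABLE ##Numbers;  -- session A's drop can yank the table out from under session B
SELECT TOP (100)
       N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  INTO ##Numbers           -- ...or this SELECT/INTO fails because session B got there first
  FROM master.sys.all_columns;

-- A local #temp table is private to each session, so concurrent runs can't collide:
SELECT TOP (100)
       N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  INTO #Numbers
  FROM master.sys.all_columns;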
Again, if you "just" meet the requirements, you'll be the same as all the other interviewees. Make yourself stand out! Heh... start by meeting the original requirements instead of creating 4 columns. 😛
Oh... by the way... the reason your code appears to run so fast on my machine is that it uses the copy of syscolumns in the local database. There are only 431 rows in that table in my test DB. Do you have 11000 rows on your machine for the PostalDB or does it come up a wee bit short? Also, if you look at the code, you included making a permanent Tally table AND building its clustered index in the timed run. In real life, the Tally table would already exist. When I changed your code to actually create 11000 rows and removed the time to make the permanent table, your code lost by a pretty good percentage.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2010 at 7:18 am
Manie Verster (2/23/2010)
Take note, you only have to do 100 records for this interview.
True enough... and the interviewer will send all of those people who only make it capable of doing 100 "records" packing, because someone will be smart enough to know how easy it is to make it much more scalable, and the interviewer will pick that person for the job.
If you want the job, make yourself stand out above all the others. It's not obsession on my part... it's survival skills, and I'm trying to tell you how to survive an interview when 10 or 20 other people are lined up for the same job. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2010 at 7:23 am
A couple of good points by Jeff. First, you don't have to be the most impressive T-SQL person out there. You are taking on the other 20 people that applied for the job. I find it rare that companies pass on everyone and look for more candidates. It happens, but often companies settle because they want a body working.
When you interview, I'd argue against appearing obsessive. I don't think that's what Jeff does, or most of the people that would write code in 5 minutes. But if you keep tweaking, or referring back to the code, or trying to find ways to improve it, I'd be worried that you'd do that on the job. I want good, fast, well written code, but I want it done. You write the best code you can in the time you have and move on. You might go back to it when time allows, but I do think you need to be effective and move on.
However, as Jeff has said many times, it doesn't take more time to write good code the first time than it does to write bad code. That's why you practice and improve your skills: so that your code should get better over time.
February 23, 2010 at 7:47 am
Gary Varga (2/23/2010)
Jeff, I have followed this thread from the start and I have read the article of yours you recommended. I was hoping to work out what solution you are alluding to. I cannot. Possibly I cannot piece together what I have read so far, but you are suggesting, or at least that is how I am reading it, that there is an even more elegant (pertaining to performance and maintainability) solution.
Is it possible that the technique can be demonstrated? I am not looking for the answer to the so-called interview question; rather, I am looking to improve myself. Even if you gave me the answer I would expect to fail at interview, as I would expect an interviewer to get me to explain my reasoning - said as much to lurkers as to Jeff!!!
Perhaps this is the basis of an article?
You're right, Gary... it might make a good article called "Winning the Interview with Code" or somesuch.
As you can see, the CASE statement is the logical way to go for the "FizzBuzz" part of this problem. The thing that separates the men from the boys is how the numbers are generated. With the small modification of using ROW_NUMBER(), the method for creating a Tally table in the Tally Table article is one way to generate up to 16 million rows in a CTE very quickly, IF (unlike so many people on this thread) you use the correct table as a source, and IF you realize that adding just one cross join will make the code one heck of a lot more scalable.
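For illustration only, here's roughly the shape of it (a sketch, not the article's exact code, and it assumes master.sys.all_columns has its usual few thousand rows):

-- One cross join of a decent-sized system table covers millions of rows;
-- ROW_NUMBER() turns the cartesian product into the 1..N sequence.
WITH cteTally(N) AS
(
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM master.sys.all_columns ac1
     CROSS JOIN master.sys.all_columns ac2
)
SELECT CASE WHEN N % 15 = 0 THEN 'BizzBuzz'
            WHEN N %  3 = 0 THEN 'Bizz'
            WHEN N %  5 = 0 THEN 'Buzz'
            ELSE CAST(N AS VARCHAR(10))
       END
  FROM cteTally;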
For those that add a triple cross join... eh... wrong answer, because behind the scenes it will cause explosive growth of one of the log files for big numbers like a Billion (for example). Running a triple cross join to create a Billion rows will grow one of the log files to over 40GB even if it's done in TempDB.
Then, there's Itzik Ben-Gan's nested CTE method, which will create a Billion rows without a single read and, with a simple modification, can be made even more brief than it is. It only takes several minutes to memorize. It can easily be modified to create a programmable "Tally inline function" that can be used in any FROM clause, or his code can be included directly in yours. Lynn Pettis wrote an article about such modifications after a bunch of us did some experimenting together. Search for "Guest Columnist Lynn Pettis" and find the article on creating a programmable numbers table.
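For the flavor of it, here's roughly what one of those mods looks like wrapped up as an inline function (the name fnTally is just for illustration... see Lynn's article for the real thing):

-- Roughly the shape of Itzik Ben-Gan's cascading CTEs as an inline
-- table-valued function. No table access means zero reads and zero writes.
CREATE FUNCTION dbo.fnTally (@MaxN BIGINT)
RETURNS TABLE
AS
RETURN
WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1),                   -- 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b),   -- 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b),   -- 10,000 rows
     E8(N) AS (SELECT 1 FROM E4 a, E4 b)    -- 100,000,000 rows; add an E16 level for Billions
SELECT TOP (@MaxN)
       N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM E8;
GO
-- Usable in any FROM clause:
SELECT t.N FROM dbo.fnTally(100) t;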
Shifting gears and not directed at Gary or anyone in particular...
Again, and just to be sure that everyone understands the point I'm trying to make here... anyone can write a RBAR solution using a WHILE loop or recursive CTE for this. Hell... look how many people did it on this thread alone! And anyone can get a little clever and use a single table as a row source to make it set based with some scalability problems. Look how many people did that on this thread, as well. That's the problem in an interview... the other 10 or 20 people are going to know and show the same lackluster code and use excuses like "time" or "rushed". If you want to beat out the other 10 or 20 people, you need to think and breathe very high performance, highly scalable, nicely formatted, somewhat commented set based code, or you're just going to look like all the other people interviewing for the job. If you're actually good at it, it'll take you less time than the person who's writing a WHILE loop or recursive CTE.
Learn how to stand out. Get good at it. PRACTICE and stop making excuses like "the requirement was only for 100 rows" or "there was a time limit". 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2010 at 8:10 am
Tom.Thomson (2/22/2010)
First I tried writing a set-based solution which had to generate its numbers table (so that it would scale beyond my pre-existing tally table), and then I tried writing a loop-based one, just to see which I could write quicker. They both took about the same time to write. I must have been reading too much of your stuff, Jeff, because I am not really an SQL programmer!
Tom... you're a much better T-SQL programmer than you give yourself credit for, especially since you run experiments like that. Like Sergiy once told me (and I live by it), "A Developer must not guess... a Developer must KNOW!" Well done.
By the way, there was method in choosing a hundred million as a limit - it's big enough to need a 3-way cross join of master.sys.all_columns to generate the CTE, and small enough that limiting the size of that cross join (by not cross-joining the whole table to itself, but joining part of the table to itself) is useful.
You would have no way of knowing this unless you ran a Billion row test like I did... more than a single cross join will cause extensive log file usage. Generating a Billion rows using a triple cross join caused the log file on my machine (I forget which DB it did it in) to grow to over 40GB! If you ever need to generate more than what you have in your Tally table and you're using 2k5 or better, I'd recommend using one of the various mods of Itzik Ben-Gan's nested CTE floating around on this site. By itself, it causes zero reads and zero writes and uses comparatively little CPU time.
Anyway, thank you very much for the feedback on this. I apologize for not getting back to you sooner. I had my arms full of pork chops. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2010 at 8:22 am
I shall be reading Jeff's articles as well, as I am one of those that has trouble thinking set-based (I can code a solution to this in COBOL in less than a minute!). Looking at the question, though, it doesn't mention scalability, and a well formatted and commented loop does the job very well and fast enough for me! I think as long as the interviewee mentioned that this was a solution based on the numbers 1 to 100 I would be happy. If they then said a set-based solution would be more appropriate for a larger set of numbers I'd be even happier 😉 Now, off to those articles and a bit less RBAR 😀
February 23, 2010 at 8:23 am
Jeff Moden (2/23/2010)
You're right, Gary... it might make a good article called "Winning the Interview with Code" or somesuch.
OK. Perhaps two articles. The one you suggest should highlight that the code you show says a lot AND that you must know the code you write - not a technical article. The other might be entitled "Set Theory vs Functional Programming: Why Set Theory should always be your default choice." - a title obviously applicable to SQL Server (or even RDBMSs in general) development.
Thanks for your efforts Jeff.
Gaz
-- Stop your grinnin' and drop your linen...they're everywhere!!!
February 23, 2010 at 8:38 am
Gary Varga (2/23/2010)
Jeff Moden (2/23/2010)
You're right, Gary... it might make a good article called "Winning the Interview with Code" or somesuch.
OK. Perhaps two articles. The one you suggest should highlight that the code you show says a lot AND that you must know the code you write - not a technical article. The other might be entitled "Set Theory vs Functional Programming: Why Set Theory should always be your default choice." - a title obviously applicable to SQL Server (or even RDBMSs in general) development.
Thanks for your efforts Jeff.
I can't write that second article because it wouldn't be true. There are places where a very well written WHILE loop or function will smoke the set based methods... even the Tally table. They're just fairly rare. Although RBAR has come to mean "no loops" to a lot of folks, RBR (without the "A") can be and is very effective in some instances. Of course, that might make for a good article too. I'm just not sure I want to admit in public that some good folks have spanked some instances of my code that hard. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2010 at 8:41 am
...are we allowed to say spanked?
Gaz
-- Stop your grinnin' and drop your linen...they're everywhere!!!
February 23, 2010 at 9:07 am
reidres (2/23/2010)
Manie Verster (2/23/2010)
Jeff Moden (2/22/2010)
Manie Verster (2/22/2010)
I checked out Grant Fritchey's article and was, like Jeff, not very pleased to see some while loops, but one query attracted my attention and I tested it and I like it.
;WITH Nbrs(n) AS
(
    SELECT 1
    UNION ALL SELECT 1 + n FROM Nbrs WHERE n < 100
)
SELECT CASE WHEN n % 5 = 0 AND n % 3 = 0 THEN 'BizzBuzz'
            WHEN n % 3 = 0 THEN 'Bizz'
            WHEN n % 5 = 0 THEN 'Buzz'
            ELSE CAST(n AS VARCHAR(8))
       END
  FROM Nbrs
OPTION (MAXRECURSION 100);
Nice go, Manny... but because I'm mostly against the use of procedural code, I'd have to deduct a lot of points on this one. Contrary to popular belief, recursive CTEs are NOT set based and they are frequently slower and more resource intensive than a well written WHILE loop.
It is good to take a standpoint, Jeff, and stick to it, but there is a thing called obsessiveness. I tested this query against the one I posted with the temp table and both gave me 3 ms, and I used your way of checking a query's duration with the start and end dates and DATEDIFF in ms. Would that then be so bad?!
As far as scalability goes, 32767 is the maximum number of rows for that solution. If you exceed that, you get the message "MAXRECURSION option exceeds the allowed maximum of 32767". It takes 2 seconds to run up to that number on my machine. And it appears that the default MAXRECURSION is 100 if no related option is stated.
For the maxrecursion problem, check out my blog. I discuss this topic there and how to get around the maxrecursion problem when testing large recordsets.
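The short version, for anyone who can't wait (this is the standard workaround, not necessarily everything the blog covers): OPTION (MAXRECURSION 0) removes the cap entirely.

-- MAXRECURSION 0 removes the 32767 cap entirely. Be careful, though...
-- get the WHERE clause wrong and the CTE recurses until you kill it.
;WITH Nbrs(n) AS
(
    SELECT 1
    UNION ALL SELECT n + 1 FROM Nbrs WHERE n < 100000
)
SELECT n
  FROM Nbrs
OPTION (MAXRECURSION 0);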
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
February 23, 2010 at 9:13 am
Since this became a performance discussion, I added a question on ASK about the best method. I know someone might memorize this for an interview, but you could easily change the question, or even make them explain how it works. That would let you see if they understand how to code or just how to memorize.
I'll even offer a prize. $25 to the best solution on Mar 31. (highest vote total)
February 23, 2010 at 9:16 am
Steve Jones - Editor (2/23/2010)
Since this became a performance discussion, I added a question on ASK about the best method. I know someone might memorize this for an interview, but you could easily change the question, or even make them explain how it works. That would let you see if they understand how to code or just how to memorize.
I'll even offer a prize. $25 to the best solution on Mar 31. (highest vote total)
Dang it. That was my second article in the blog series.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
February 23, 2010 at 9:29 am
The solution or the contest? You can post on your blog later, enter the contest now.
February 23, 2010 at 9:51 am
Is it for real cash or vouchers for pork chops?
Gaz
-- Stop your grinnin' and drop your linen...they're everywhere!!!
February 23, 2010 at 10:20 am
Gary Varga (2/23/2010)
Is it for real cash or vouchers for pork chops?
If it is for pork chops then I am in 😉
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events