November 14, 2008 at 2:36 pm
Heh... sorry... wrong graphic... SQL Server 2k5 won't even show the ACTUAL execution plan for anything inside a WHILE loop, and the estimated execution plan shows everything with a cost of 0. That's because it can't measure the cost of processing a single row or character... and you can tell it's just a single row or character by the skinny lines and the "0 B" indication.
Here's the estimated execution plan for the WHILE loop and all it contains...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2008 at 3:02 pm
Take a look at the stored procedure code I posted at the link below. It should meet all the requirements you stated.
Generate Password Procedure
November 14, 2008 at 3:07 pm
Jeff,
Just gotta know, has anyone ever called you an anti-RBAR fanatic? This seems to be the main area you really get "excited" about. And NOT that it's a bad thing; if we can eliminate much of the RBAR code people develop, that's good.
I've learned a few tricks from you in that regard as well, and it has helped in several queries.
November 14, 2008 at 3:10 pm
Jeff Moden (11/14/2008)
Also, I never suggested making a Tally table on the fly... a permanent Tally table is really the way to go in most cases. The exception to that rule is if you use the same method I always use to make a Tally table, but build it as a ROW_NUMBER CTE instead.
You showed this in another thread somewhere if I remember, but I can't find it. Do you think you could post it again?
November 14, 2008 at 3:43 pm
Let's look at this more closely. Good point, I didn't catch that it was only showing the first iteration of both loops. Let's look at the cost per "row" for the iterative or procedural approach. Total cost = 0.003290117. However, the table scan for retrieving from password_policy is 0.0032831. That means the row iterator is 0.00000702. Well, we know this isn't accurate, because the more characters there are, the more times that second set of iterations has to run. And that's a key difference. Think about the problem at hand. You're testing complexity on a... password. The second iterator is going to make the whole operation more expensive as the number of characters in the password increases. I don't argue that. But by increasing the size of the password to get a number greater than 0, you skewed the test outside the domain of the problem. After all, how many folks have passwords that are 200 characters in length?
So what this basically says is that for some length of password, the iterative approach is faster. How much faster is a good question. We know that 0.00000702 isn't exactly accurate, because if you based a calculation on just that number, you'd get 642 as the break-even point, even including a permanent tally table ((cost of the query in the tally table example - cost of the table scan) / row iterator cost), and we know that at 200 characters the tally table wins out. There are two factors which determine the length of processing for the iterative/procedural approach: the number of characters in the password and the number of special characters to test for. Case in point: note the example where the special characters also include digits. I caught this when I first put the IF ELSE nesting together, as the tally table approach was reporting 5 special characters and the iterative was reporting 3. That's because I was exiting out when I saw that the character was a number, but that doesn't fit with how special characters are defined.
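To make the arithmetic explicit, here it is as a quick script. The tally-query cost below is just a placeholder I back-solved from the 642 figure, since I didn't post that number here:
-- Hedged sketch of the break-even arithmetic described above.
-- @TallyQueryCost is a placeholder back-solved from the 642 figure;
-- the real number would come from the tally-table example's execution plan.
DECLARE @TotalCost      float,
        @TableScanCost  float,
        @PerRowCost     float,
        @TallyQueryCost float;
SET @TotalCost      = 0.003290117;
SET @TableScanCost  = 0.0032831;
SET @PerRowCost     = @TotalCost - @TableScanCost; -- ~0.00000702
SET @TallyQueryCost = 0.0078;                      -- assumed value
SELECT (@TallyQueryCost - @TableScanCost) / @PerRowCost AS BreakEvenChars; -- ~642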
K. Brian Kelley
@kbriankelley
November 14, 2008 at 3:52 pm
K. Brian Kelley (11/14/2008)
Let's look at this more closely. Good point, I didn't catch that it was only showing the first iteration of both loops. Let's look at the cost per "row" for the iterative or procedural approach. Total cost = 0.003290117. However, the table scan for retrieving from password_policy is 0.0032831. That means the row iterator is 0.00000702. Well, we know this isn't accurate, because the more characters there are, the more times that second set of iterations has to run. And that's a key difference. Think about the problem at hand. You're testing complexity on a... password. The second iterator is going to make the whole operation more expensive as the number of characters in the password increases. I don't argue that. But by increasing the size of the password to get a number greater than 0, you skewed the test outside the domain of the problem. After all, how many folks have passwords that are 200 characters in length?
So what this basically says is that for some length of password, the iterative approach is faster. How much faster is a good question. We know that 0.00000702 isn't exactly accurate, because if you based a calculation on just that number, you'd get 642 as the break-even point, even including a permanent tally table ((cost of the query in the tally table example - cost of the table scan) / row iterator cost), and we know that at 200 characters the tally table wins out. There are two factors which determine the length of processing for the iterative/procedural approach: the number of characters in the password and the number of special characters to test for. Case in point: note the example where the special characters also include digits. I caught this when I first put the IF ELSE nesting together, as the tally table approach was reporting 5 special characters and the iterative was reporting 3. That's because I was exiting out when I saw that the character was a number, but that doesn't fit with how special characters are defined.
You missed what I'm trying to say... the "costs" you see in the execution plan have nothing to do with reality.
Tell you what... let's do a "real" test. You use your iterative approach and I'll use the Tally table to come up with the 3 character counts for 100,000 10-character passwords, and we'll see which method wins. To make the test extremely repeatable, you pick any 10 passwords and we'll "smear" them across 10,000 rows each. Game on?
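Just to paint the picture, building the test bed could look something like this (only 3 sample passwords shown, and dbo.Tally is assumed to already exist... names are placeholders):
-- Hedged sketch: "smear" sample passwords across 10,000 rows each.
-- Assumes a permanent dbo.Tally table with a column N; all names are placeholders.
SELECT IDENTITY(int, 1, 1) AS RowNum,
       p.Password
  INTO #PasswordTest
  FROM (SELECT 'aB3$efGh9!' AS Password UNION ALL
        SELECT 'password10' UNION ALL
        SELECT 'passwordis') p  -- ...7 more sample passwords would go here
 CROSS JOIN dbo.Tally t
 WHERE t.N <= 10000;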
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2008 at 4:11 pm
Hang on!! How can I get in on this and sell tickets?? I'll even give 75% of the proceeds to SSC!
November 14, 2008 at 4:41 pm
Lynn Pettis (11/14/2008)
Jeff, Just gotta know, has anyone ever called you an anti-RBAR fanatic? This seems to be the main area you really get "excited" about. And NOT that it's a bad thing; if we can eliminate much of the RBAR code people develop, that's good.
I've learned a few tricks from you in that regard as well, and it has helped in several queries.
Absolutely I'm an Anti-RBAR fanatic... :alien: And, yes, I've been called that. I also have people asking me if they can use my Avatar as their own.
The companies I've worked for have had people say things like "It will only ever contain 1000 rows... procedural methods won't be that much slower than a set-based solution." So, off they go and they write some really good code, even if it is RBAR. Then, one of two things happens: either their estimate of the table never getting bigger than a thousand rows suddenly balloons to a couple million rows, or some poor sucker who's being beaten to death by his boss because of schedule finds the RBAR solution, modifies it to fit the exact need, tests it on a thousand rows, calls it good enough, and then uses it... on a table that's designed to hold millions of rows. It's cost those companies sometimes in the millions to fix the problem in the face of scalability.
It's like practicing the piano... you wouldn't intentionally hit the wrong keys... why would anyone intentionally write bad code? And, yes, all RBAR is bad code.
What about the time to write set-based code? Well, if you look at the amount of code that Brian had to write to use the loop method, I'd say "NO"! What about RBAR being easier to write? If you don't understand set-based, then I'd say "Of course it is"... just like chopsticks is easier to play than a good classical piece. BUT, a master can play the classical piece as easily as someone less practiced can play chopsticks, and that's the key... people need to stop making excuses and learn their trade well enough to not have to write RBAR. There's just no excuse for RBAR except being too lazy to practice (not directed at anyone in particular, so no hate mail please :D).
BUT, I have to admit that I absolutely [font="Arial Black"]LOVE [/font]RBAR... I took a job that ran for 24 hours and sometimes failed even then, made it do 50% more work, and made it run in only 11-15 minutes. And, it hasn't failed yet! It had taken 4 developers about 2 weeks each to write the RBAR in the original code... using set-based techniques, it took me 3 days to rewrite it, womb-to-tomb, and it worked the first time with no errors. The company gave me a $2,000 bonus, after taxes. The really funny part was, the first time they ran it, it ran so fast they thought it didn't actually run. It took me an hour to figure out why they thought it failed. 😛
One more story and I'll get off the RBAR soap-box (well, maybe for a minute)... the company I'm currently working for has a legacy system to import some fairly complicated CSVs. These are no normal CSVs... they have "file headers and footers", a double data header, and data that has to be unpivoted either in pairs of columns or in groups of 4, all in the same file, each row with a PK in one of the first 9 columns that can change position. Further, the first 9 columns are "normalized" and have to go to a separate table. And, the kicker is, except for the first 9 columns, the number of columns from file to file is variable and they may appear in any order. The current RBAR methods, using a combination of Bulk Insert, DTS, and a PERL script, take 40 minutes just to get a lousy 30K rows with 215 columns of data ready for a final import. AND, they've actually had to split files because a single file uses up all the memory. The Tally table and I (hmmm... book title, maybe) came up with some Proof-of-Principle code (works but has not been optimized) that does everything the RBAR code does, but also DOES the final import, all in less than 90 seconds. {EDIT} I forgot to mention that the company has a hiring freeze on... they're trying to figure out a way to make an exception so I can rewrite all of their ETL.
Heh.. so, yeah... I'm an "Anti-RBAR fanatic"... it's how I make my living. Spread the word and join the "club". Don't forget to pick up your supply of porkchops. :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2008 at 4:42 pm
Lynn Pettis (11/14/2008)
Hang on!! How can I get in on this and sell tickets?? I'll even give 75% of the proceeds to SSC!
Heh... you won't have 75% left because I'm gonna want 50%. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2008 at 7:43 pm
Jeff Moden (11/14/2008)
You missed what I'm trying to say... the "costs" you see in the execution plan have nothing to do with reality. Tell you what... let's do a "real" test. You use your iterative approach and I'll use the Tally table to come up with the 3 character counts for 100,000 10-character passwords, and we'll see which method wins. To make the test extremely repeatable, you pick any 10 passwords and we'll "smear" them across 10,000 rows each. Game on?
Slightly optimized code:
-- Declarations added (they weren't in the original snippet) so this runs
-- standalone; the varchar sizes are assumptions.
DECLARE @Password varchar(128), @SpecialChars varchar(128);
DECLARE @len int, @i int, @Numbers int, @Capitals int, @Lowercase int, @Special int;
SET @Password = 'password10';
SET @SpecialChars = (SELECT special_chars FROM dbo.password_policy);
SET @len = LEN(@Password);
SET @Numbers = 0;
SET @Capitals = 0;
SET @Lowercase = 0;
SET @Special = 0;
SET @i = 0;
WHILE (@i < @len)
BEGIN
    SET @i = @i + 1;
    IF ASCII(SUBSTRING(@Password, @i, 1)) BETWEEN 97 AND 122      -- a-z
        SET @Lowercase = @Lowercase + 1;
    ELSE IF ASCII(SUBSTRING(@Password, @i, 1)) BETWEEN 65 AND 90  -- A-Z
        SET @Capitals = @Capitals + 1;
    ELSE IF ASCII(SUBSTRING(@Password, @i, 1)) BETWEEN 48 AND 57  -- 0-9
        SET @Numbers = @Numbers + 1;
    -- special characters are tested every pass, independently of the tests above
    SET @Special = @Special + (SELECT CASE WHEN @SpecialChars LIKE '%' + SUBSTRING(@Password, @i, 1) + '%' THEN 1 ELSE 0 END);
END
SELECT @Capitals 'Ucase', @Lowercase 'Lcase', @Numbers 'Numbr', @Special 'Schar';
And I'll acquiesce, because over a run of 20,000 with a permanent tally table (Jeff's condition, and that makes sense since that's the costliest operation), the original password given was 20 sec for Tally vs. 21 sec for the iterative. For "password10", however, it was 20 sec vs. 20 sec. For "passwordis" the results were also 20 vs. 20. But when you kick up to 30,000 iterations, you see the Tally table creep ahead by a second.
K. Brian Kelley
@kbriankelley
November 14, 2008 at 8:04 pm
Jeff Moden (11/14/2008)
What about the time to write set based code? Well, if you look at the amount of code that Brian had to write to use the loop method, I'd say "NO"! What about RBAR being easier to write? If you don't understand set based, then I'd say "Of course it is"... just like playing chopsticks is easier to play than a good classical piece. BUT, a master can play the classical piece as easily as someone less practiced can play chopsticks and that's the key... people need to stop making excuses and learn their trade well enough to not have to write RBAR. There's just no excuse for RBAR except being too lazy to practice (not directed at anyone in particular so no hate mail please :D).
But truth be told, Jeff, the optimal solution would likely be to make the check with compiled code, whether it be an extended stored procedure or a .NET assembly. We all know that T-SQL doesn't handle string operations optimally, and that those solutions, which are procedural/iterative, tend to be significantly faster than the same type of solution in T-SQL. Given that the difference between the set-based and iterative solutions in T-SQL is so small even over thousands of iterations, likely a compiled solution would be faster in comparison. It would certainly be an interesting test.
K. Brian Kelley
@kbriankelley
November 15, 2008 at 9:29 am
K. Brian Kelley (11/14/2008)
And I'll acquiesce, because over a run of 20,000 with a permanent tally table (Jeff's condition, and that makes sense since that's the costliest operation), the original password given was 20 sec for Tally vs. 21 sec for the iterative. For "password10" however, it was 20 sec vs 20 sec. For "passwordis" the results are also 20 vs. 20. But when you kick up to 30,000 iterations, you see the Tally table creep ahead by a second.
Um... if you're gonna make claims like that, you really need to show the rest of the code you used to test. For example, there is no way that, in its current condition, the code you just posted could be used to test anything more than a single password. You either need to convert it to a function or you need to add another nest to your WHILE loop, along the lines of the sketch below. Using GO 20000 isn't the same as running the Tally table solution as a set-based solution.
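Something like this is what I mean by another nest (the #PasswordTest table and its columns are assumptions based on the test bed I described):
-- Hedged sketch of the outer nest needed to drive the inner loop across a
-- whole table of passwords. #PasswordTest (RowNum, Password) is assumed to
-- exist, as in the earlier test-bed sketch.
DECLARE @RowNum int,
        @MaxRow int,
        @Password varchar(128);
SELECT @RowNum = 1,
       @MaxRow = MAX(RowNum)
  FROM #PasswordTest;
WHILE @RowNum <= @MaxRow
BEGIN
    SELECT @Password = Password
      FROM #PasswordTest
     WHERE RowNum = @RowNum;
    --===== ...the inner character-counting WHILE loop goes here...
    SET @RowNum = @RowNum + 1;
END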
Heh... so I'm not going to let you acquiesce yet...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 15, 2008 at 9:51 am
K. Brian Kelley (11/14/2008)
But truth be told, Jeff, the optimal solution would likely be to make the check with compiled code, whether it be an extended stored procedure or a .NET assembly. We all know that T-SQL doesn't handle string operations optimally, and that those solutions, which are procedural/iterative, tend to be significantly faster than the same type of solution in T-SQL. Given that the difference between the set-based and iterative solutions in T-SQL is so small even over thousands of iterations, likely a compiled solution would be faster in comparison. It would certainly be an interesting test.
Truth be told, that would be correct... especially in the form of RegEx. That's one of the very few places where a CLR will actually beat T-SQL. Since this is a 2k forum, you're correct that it would have to be in the form of an extended stored procedure. Even then, Matt Miller and I have found that properly written T-SQL damn near ties a CLR solution.
But, that's not what this is about... it's about procedural code vs the Tally table. The only way you can use your procedural code in a stored procedure (the request by the MadDogs, the OP) is to either create a function or add another WHILE nest to your existing code.
Let's up the ante a bit... let's actually solve the MadDogs' (the OP's) original request and time it against 100,000 rows of 10-character passwords. It's gotta be done in 100% T-SQL, you have to use procedural code, and I have to use the Tally table without WHILE loops, but other than that, no rules.
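Just so folks following along can see the shape of the thing, a Tally-based version of the counts might look something like this... a sketch, not my actual entry, and it assumes the dbo.Tally and #PasswordTest objects mentioned earlier:
-- Hedged sketch: set-based character-class counts for every password at once.
-- Assumes dbo.Tally (column N), dbo.password_policy (single row), and the
-- #PasswordTest table from the earlier test-bed sketch.
SELECT pt.RowNum,
       pt.Password,
       SUM(CASE WHEN ASCII(SUBSTRING(pt.Password, t.N, 1)) BETWEEN 65 AND 90  THEN 1 ELSE 0 END) AS Ucase,
       SUM(CASE WHEN ASCII(SUBSTRING(pt.Password, t.N, 1)) BETWEEN 97 AND 122 THEN 1 ELSE 0 END) AS Lcase,
       SUM(CASE WHEN ASCII(SUBSTRING(pt.Password, t.N, 1)) BETWEEN 48 AND 57  THEN 1 ELSE 0 END) AS Numbr,
       SUM(CASE WHEN pp.special_chars LIKE '%' + SUBSTRING(pt.Password, t.N, 1) + '%' THEN 1 ELSE 0 END) AS Schar
  FROM #PasswordTest pt
 CROSS JOIN dbo.Tally t
 CROSS JOIN dbo.password_policy pp
 WHERE t.N <= LEN(pt.Password)
 GROUP BY pt.RowNum, pt.Password;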
Heh... let the game begin. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 15, 2008 at 10:35 am
Jeff,
You still haven't reposted your on-the-fly tally table for SQL Server 2005/2008. I'd be real happy to see that. I use a CTE version in a view (I only need 400 rows, actually less) but I could easily find other uses for such a feature.
Thanks,
Lynn
November 15, 2008 at 11:28 am
Sorry, Lynn... here it is and, yes, like yours, it's a CTE...
;WITH
cteTally AS
(--==== Create a Tally CTE from 1 to whatever the length
-- of the parameter is
SELECT TOP (LEN(@SomeCSVParameter))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
)
SELECT yada-yada-yada
FROM cteTally
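If you want something you can run as-is, here's a minimal fleshed-out example that uses the CTE to split a CSV (the parameter value and the split logic are just an illustration):
-- Hedged, self-contained example of the Tally CTE in action as a CSV splitter.
DECLARE @SomeCSVParameter varchar(8000);
SET @SomeCSVParameter = 'abc,def,ghi';
;WITH
cteTally AS
(--==== Tally CTE from 1 to the length of the parameter
 SELECT TOP (LEN(@SomeCSVParameter))
        ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
   FROM Master.sys.All_Columns t1
  CROSS JOIN Master.sys.All_Columns t2
)
SELECT SUBSTRING(@SomeCSVParameter, N, CHARINDEX(',', @SomeCSVParameter + ',', N) - N) AS Element
  FROM cteTally
 WHERE SUBSTRING(',' + @SomeCSVParameter, N, 1) = ','  -- N marks the start of each element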
The full code can be found in the following article...
http://www.sqlservercentral.com/articles/T-SQL/63003/
In that article, I make the interesting observation of...
"Beyond that testing, one recent shocker for me while I was writing this article is that a "Tally CTE" used during a split is actually faster than a Tally Table by about a 10th of a second for the larger splits (8001 elements in a single parameter)."
--Jeff Moden
Change is inevitable... Change for the better is not.