July 27, 2012 at 8:02 am
David Burrows (7/27/2012)
GSquared (7/27/2012)
Generate and join on Checksum/Hashbytes is probably your best bet.I thought you had to test equality of the columns as well, as checksum values may not be unique.
Sure, but only on the columns that match checksum first. The tricky part (in addition to Tom's comments) is ensuring the query processor always matches checksums before testing the full value. This is partly why I said the detail is in the details - it can be hard to ensure this in practice without relying on undocumented behaviours. There are many other considerations, including the fact that the mere presence of a MAX type in the plan can affect quite counter-intuitive things (like how long shared locks are held and whether row versions are added by triggers). Truly it can be a minefield, which is why I made only conservative suggestions (use MAX only when really needed, and maybe cast to non-MAX in a temporary structure). Anything else requires great skill and care to do implement reliably.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 27, 2012 at 8:16 am
capn.hector (7/27/2012)
well now in the morning im even more embarrassed at the mistake i made.
It was only a small error; I wouldn't dwell too much on it. Focus on the positive things that can come out of it instead. Perhaps it will be some consolation to know that we've all done similar things to a greater or lesser extent (well I have anyway, and more than once).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
July 27, 2012 at 8:27 am
Brandie Tarvin (7/26/2012)
WayneS (7/26/2012)
GilaMonster (7/26/2012)
And the vague question of the week award goes to: http://www.sqlservercentral.com/Forums/Topic1336028-391-1.aspxToo tempting... had to post there.
Wayne, you are so mean.
Brandie, I was just trying to give a vague question an equally vague, but still accurate, answer.
But... :-D:-D:-D:-D:-D
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 27, 2012 at 8:39 am
SQL Kiwi (7/27/2012)
Jeff Moden (7/27/2012)
I had an ephipany that just hit me like a ton of bricks....I see! Well that explains it
Wow... Another eye opener that you might not realize that you've just provided me. Here's another word that fits the category of things that look like ironic sarcasm because it's used as sarcasm so often in IT. The word "well". I've seen hundreds of posts that folks, including me, have taken as a bit of ironic sarcasm or "ring-knocker-ish" simply because of the word "well". For example, someone says something like "Well, if you had looked at the code that... etc, etc", it frequently comes across as "Well, you ignorant slob, if you had looked at the code that..." instead of it being correctly taken as word that is usually meant to show a bit of thoughtfull understanding.
The same goes for things like "Ah..." and "Ummmm..." the first being taken as a sarcastic pause instead of the "I see!" that is was meant to be and the "Ummmm..." being taken as a sign of sarcastic restraint instead of an attempt to show thoughtfulness. Even "Heh...." can come across as incredibly sarcastic, at times.
Man! I wonder how many forum battles have been fought over such misunderstandings. Thanks again for the feedback, Paul. This has been a real eye opener.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 27, 2012 at 8:44 am
WayneS (7/27/2012)
Brandie Tarvin (7/26/2012)
WayneS (7/26/2012)
GilaMonster (7/26/2012)
And the vague question of the week award goes to: http://www.sqlservercentral.com/Forums/Topic1336028-391-1.aspxToo tempting... had to post there.
Wayne, you are so mean.
Brandie, I was just trying to give a vague question an equally vague, but still accurate, answer.
But... :-D:-D:-D:-D:-D
But you got the second step wrong. "Change it" isn't good enough, that includes making it worse. 😉 The right second step is "Make it better". 😀
This common misunderstanding of the second step is the cause of 30% of IT project failures (other than those caused by managers who think there personal opinion on matters of which they have no knowledge or experience is gospel truth). :hehe:
Most of the other 70% (or failures no caused by management arrogance) are cause by idiots who think the second step is "Make it perfect". :w00t: That "most" includes some of mine :blush: but I've grown out of that now. At least I hope I have. 🙂
Tom
July 27, 2012 at 8:48 am
capn.hector (7/27/2012)
well now in the morning im even more embarrassed at the mistake i made.on another note my head hurts, where is my aspirin.
I agree with Paul. We've all made similar mistakes and you manned up to it. BTW... lot's of water and some B12 will help reinflate the brain cells the alcohol deflated by explosive decompression. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
July 27, 2012 at 8:52 am
L' Eomot Inversé (7/27/2012)
WayneS (7/27/2012)
Brandie Tarvin (7/26/2012)
WayneS (7/26/2012)
GilaMonster (7/26/2012)
And the vague question of the week award goes to: http://www.sqlservercentral.com/Forums/Topic1336028-391-1.aspxToo tempting... had to post there.
Wayne, you are so mean.
Brandie, I was just trying to give a vague question an equally vague, but still accurate, answer.
But... :-D:-D:-D:-D:-D
But you got the second step wrong. "Change it" isn't good enough, that includes making it worse. 😉 The right second step is "Make it better". 😀
And you are so right.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 27, 2012 at 8:59 am
Wow! I sure didn't expect that innocent posting of the splitter article to explode like this.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 27, 2012 at 9:02 am
Jeff Moden (7/27/2012)
SQL Kiwi (7/27/2012)
Jeff Moden (7/27/2012)
I had an ephipany that just hit me like a ton of bricks....I see! Well that explains it
...
Man! I wonder how many forum battles have been fought over such misunderstandings.
* Oh
* Heh
* Uhmmm
* ...
Basically, you're saying that if people didn't mumble on the forums nobody would find a reason to argue, right? 😛
-- Gianluca Sartori
July 27, 2012 at 9:02 am
L' Eomot Inversé (7/27/2012)
GSquared (7/27/2012)
Brandie Tarvin (7/27/2012)
Back to the semi-technical discussion...If joining a VARCHAR(8000) to a VARCHAR(MAX) value degrades performance, how does one get around this?
I'm not speaking in the context of the splitter function. I'm pretending I have two tables with joinable columns that were poorly designed (or different systems) and I need to join them.
Do I convert the 8k column to a MAX? Or is there some other way to resolve the issue?
Generate and join on Checksum/Hashbytes is probably your best bet.
You could start with a check on lengths (if the lengths are unequal so are the strings), then for the remaining candidates cast the varchar(max) strings to varchar(8000) (which is guaranteed to be a valid cast as the length doesn't exceed 8000). Of course if that's written as a single query the optimiser may do something very silly with it (leaving you with casts that would truncate, which is far from helpful) so some care is needed.
You certainly can't use hashbytes for this case unless the varchar(max) strings are known to be less than 8000 bytes, as its input is limited to 8000 bytes. Using checksum is possible (I think - at least it doesn't complain if you feed it a lot of bytes and I remember any documentation saying it's restricted to 8000), but it won't be effecient unless the optimiser is clever about it - the optimiser deals just fine with the case where the checksum is part of the data, but not neccessarily with the case where neither checksums being compared is a pre-computed checksum held in the data; and anyway, checksumming the long objects is quite slow, has to read and process all the data, whereas comparing lengths shouldn't involve reading and processing all the data - if a lot of the varchar(MAX) strings are very long the cheapness of length may offset the finer (but still far from perfect) filtering provided by checksum.
In long-string joins or Where clauses, using a checksum column, pre-computed and indexed, can greatly speed up query results if used correctly.
It's not the final test, because of collision and the possibility of input-length exceeding the capacity of the functions, but it can make a final query much more efficient if you can narrow down the rows that have to be tested for text-equality to those that have matching checksums.
I've tested this on tables with names in them, where the strings aren't even all that large, and seen huge performance increases. You'd have to do your own tests, of course, but comparing two integers can be much more efficient than comparing large strings. Larger strings get an even better hit than simple names, in my tests.
Say you have to compare two strings that are 10,000 characters each. If you generate a checksum on the first 100 characters, the chance of collision (false-positive) is real, but very small.
Or, let's say you need to find a row in a million-row table, where the text in it matches Moby Dick. You have the desired text in an input parameter. Two varchar(max) values being compared for equality, and no way to index varchar(max). But if you have an indexed checksum column of the first 100 or 1,000 characters, on the table you want to query, and you do a checksum of the same number of characters in your input parameter, you'll limit your text comparison to probably 1 row, maybe 2 or 3 false positives. Narrowed down that way, you then do the real comparison (and hope the input parameter doesn't have a typo on page 400 that the stored value doesn't have) to the resulting 2 or 3 rows. MUCH faster than a single query comparing the whole values.
That's an extreme example, but very long text strings for things like molecular formulae are very possible, and finding if one has already been tested against in a drug company lab is a very real possibility of a use for comparing multi-thousand character strings for exact matches.
Using length instead of a checksum is possible, but less likely to be useful in some cases. Afer all, H2O and NaS have the same length, but they are completely different molecules, and they generate completely different checksums/hashes.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 27, 2012 at 9:19 am
Gianluca Sartori (7/27/2012)
Jeff Moden (7/27/2012)
SQL Kiwi (7/27/2012)
Jeff Moden (7/27/2012)
I had an ephipany that just hit me like a ton of bricks....I see! Well that explains it
...
Man! I wonder how many forum battles have been fought over such misunderstandings.
* Oh
* Heh
* Uhmmm
* ...
Basically, you're saying that if people didn't mumble on the forums nobody would find a reason to argue, right? 😛
No no no that's absolute rubbish! You're stuck with 50's paper-tape technology and you're ignorant of ANSI standard 2012 & xptl....ow! Stop hitting me!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 27, 2012 at 9:26 am
ChrisM@Work (7/27/2012)
No no no that's absolute rubbish! You're stuck with 50's paper-tape technology and you're ignorant of ANSI standard 2012 & xptl....ow! Stop hitting me!
LOL
Be careful using language like that, it could become addictive and you could also become persona non grata. :hehe:
Far away is close at hand in the images of elsewhere.
Anon.
July 27, 2012 at 9:32 am
ChrisM@Work (7/27/2012)
Gianluca Sartori (7/27/2012)
Jeff Moden (7/27/2012)
SQL Kiwi (7/27/2012)
Jeff Moden (7/27/2012)
I had an ephipany that just hit me like a ton of bricks....I see! Well that explains it
...
Man! I wonder how many forum battles have been fought over such misunderstandings.
* Oh
* Heh
* Uhmmm
* ...
Basically, you're saying that if people didn't mumble on the forums nobody would find a reason to argue, right? 😛
No no no that's absolute rubbish! You're stuck with 50's paper-tape technology and you're ignorant of ANSI standard 2012 & xptl....ow! Stop hitting me!
My temptation with Joe is always to inform him that he's sticking his code with 1800s-style logic, using Babbage-type cogs and wheels. After all, any binary operation does that, just electronically instead of mechanically. (Yes, that includes every operation ever done by any computer anywhere, and will be true until we start using analog computers instead of digital ones.)
That or accuse him of using 1930s-tyle "Turing Machine" logic in his algorithms. Since it predates current ISO standards, it must be wrong, per his logic.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 27, 2012 at 9:33 am
Gianluca Sartori (7/27/2012)
Jeff Moden (7/27/2012)
SQL Kiwi (7/27/2012)
Jeff Moden (7/27/2012)
I had an ephipany that just hit me like a ton of bricks....I see! Well that explains it
...
Man! I wonder how many forum battles have been fought over such misunderstandings.
* Oh
* Heh
* Uhmmm
* ...
Basically, you're saying that if people didn't mumble on the forums nobody would find a reason to argue, right? 😛
Amusing, but probably not that far from the truth. Wishy-washy language almost never works for good. I am guilty of this in real life all the time, not just on forums.
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
July 27, 2012 at 9:37 am
You guys probably know about this one already, but I just stumbled across the DM of the Rings webcomic, and am rollicking in D&D/LOTR hilarity. I've never even played D&D, so for those who have it's probably that much funnier...
http://www.shamusyoung.com/twentysidedtale/?p=612
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Viewing 15 posts - 37,141 through 37,155 (of 66,712 total)
You must be logged in to reply to this topic. Login to reply