July 27, 2012 at 5:25 am
Back to the semi-technical discussion...
If joining a VARCHAR(8000) to a VARCHAR(MAX) value degrades performance, how does one get around this?
I'm not speaking in the context of the splitter function. I'm pretending I have two tables with joinable columns that were poorly designed (or different systems) and I need to join them.
Do I convert the 8k column to a MAX? Or is there some other way to resolve the issue?
July 27, 2012 at 5:36 am
ChrisM@Work (7/27/2012)
I'm still uncomfortable with "in order to handle our longest string (500,000 characters)". In all the time I've been working, and lurking here, I've never seen a requirement for splitting a string longer than one or two hundred characters, and more than thirty or forty is rare.
Oh, I have, most recently as a list of security ids for a financial application, which would typically be thousands or tens of thousands of items long. No doubt very many real-world uses could happily use any splitting mechanism, but is that the point?
I like the way the article is written and presented, it's very readable, and it's a shame that it's a crock; but if someone posted a performance comparison using my code bastardised (against clear written advice, sorry Paul but it's there) to perform poorly, I'd want his keyfob changed to let him into the side door of KFC.
Ha. You are wasted in the database world, Chris - you should be a diplomat.
July 27, 2012 at 5:39 am
Jeff, this is all your fault.
http://www.sqlservercentral.com/Forums/Topic1336367-147-1.aspx#bm1336413
ps: irony
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
July 27, 2012 at 5:43 am
Brandie Tarvin (7/27/2012)
If joining a VARCHAR(8000) to a VARCHAR(MAX) value degrades performance, how does one get around this?
By only using MAX where the data to be stored can legitimately exceed 8000 bytes.
I'm not speaking in the context of the splitter function.
For what it's worth, I can't see a join involving a MAX type in the splitter function.
I'm pretending I have two tables with joinable columns that were poorly designed (or different systems) and I need to join them. Do I convert the 8k column to a MAX? Or is there some other way to resolve the issue?
The details depend on the details of course, but the broad reason for the performance difference is that MAX types are potentially very large (2GB), so the optimizer and execution engine can't or won't use certain optimizations, and/or must take different code paths that account for the size issue (and which are inevitably slower as a result). In some cases, it might be possible to materialize the MAX values as non-MAX types before performing the join. It is difficult to generalize though.
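A minimal sketch of that materialization idea, with invented names (dbo.SystemA has a varchar(8000) Code column, dbo.SystemB a varchar(max) one); it assumes the MAX values you actually need to join fit in 8000 bytes:
-- Stage the MAX column as varchar(8000) so the join compares non-MAX types.
CREATE TABLE #B8K (Code varchar(8000) NOT NULL);

INSERT INTO #B8K (Code)
SELECT CONVERT(varchar(8000), b.Code)
FROM dbo.SystemB AS b
WHERE DATALENGTH(b.Code) <= 8000;   -- rows longer than 8000 bytes could never match a varchar(8000) anyway

SELECT a.Code
FROM dbo.SystemA AS a
JOIN #B8K AS b ON b.Code = a.Code;  -- both sides are now non-MAX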
July 27, 2012 at 6:22 am
The rapid posting must have defrosted hundreds of pork chops. :-)
Four banded?
I heard rumors a Trebuchet is in the works.
Kudos to Jeff on the restraint shown.
And some of the others.
Dog piling is not the answer.
Most of us would rather quietly point to better ways and options than fan the flames.
And let the discussions revolve around technical and thought provoking dialog.
To me, there is no better way of learning than taking examples and running them through similar situations at work.
July 27, 2012 at 6:57 am
Heya Pauliepoos, haven't seen you posting for ages!
SQL Kiwi (7/27/2012)
ChrisM@Work (7/27/2012)
I'm still uncomfortable with "in order to handle our longest string (500,000 characters)". In all the time I've been working, and lurking here, I've never seen a requirement for splitting a string longer than one or two hundred characters, and more than thirty or forty is rare.
Oh, I have, most recently as a list of security ids for a financial application, which would typically be thousands or tens of thousands of items long. No doubt very many real-world uses could happily use any splitting mechanism, but is that the point?
There's a Ford for most of us, but if it's not up to your requirements then you could have one of these. Horses for courses.
SQL Kiwi (7/27/2012)
ChrisM@Work (7/27/2012)
I like the way the article is written and presented, it's very readable, and it's a shame that it's a crock; but if someone posted a performance comparison using my code bastardised (against clear written advice, sorry Paul but it's there) to perform poorly, I'd want his keyfob changed to let him into the side door of KFC.
Ha. You are wasted in the database world, Chris - you should be a diplomat.
Mr B might disagree :-)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 27, 2012 at 6:58 am
Brandie Tarvin (7/27/2012)
Back to the semi-technical discussion...
If joining a VARCHAR(8000) to a VARCHAR(MAX) value degrades performance, how does one get around this?
I'm not speaking in the context of the splitter function. I'm pretending I have two tables with joinable columns that were poorly designed (or different systems) and I need to join them.
Do I convert the 8k column to a MAX? Or is there some other way to resolve the issue?
Generate and join on Checksum/Hashbytes is probably your best bet.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 27, 2012 at 7:15 am
GSquared (7/27/2012)
Generate and join on Checksum/Hashbytes is probably your best bet.
I thought you had to test equality of the columns as well, as checksum values may not be unique.
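Quite right - a checksum match only narrows the candidates, so the real equality test still belongs in the join. A rough sketch of the combined approach (invented names; CHECKSUM happily accepts the MAX types):
-- Persisted, indexable checksum columns act as a cheap pre-filter;
-- the string comparison resolves collisions, since CHECKSUM values may repeat.
ALTER TABLE dbo.SystemA ADD CodeCS AS CHECKSUM(Code) PERSISTED;
ALTER TABLE dbo.SystemB ADD CodeCS AS CHECKSUM(Code) PERSISTED;
CREATE INDEX IX_SystemB_CodeCS ON dbo.SystemB (CodeCS);

SELECT a.Code
FROM dbo.SystemA AS a
JOIN dbo.SystemB AS b
  ON b.CodeCS = a.CodeCS   -- int comparison, cheap
 AND b.Code   = a.Code;    -- confirms genuine equality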
Far away is close at hand in the images of elsewhere.
Anon.
July 27, 2012 at 7:34 am
SQL Kiwi (7/27/2012)
Jeff Moden (7/26/2012)
There was no way for me to take your comment as other than sarcasm whether Hector's post was there or not. Back off, Paul.
Well it definitely wasn't sarcasm, and I'm really not sure how to get you to understand that. Nevertheless:
"Oh good." = Glad you to hear the article will be updated to help prevent future misunderstandings. No sarcasm.
"I see there are new comments on Aaron's post now too, which seem to have improved the direction of the debate a great deal." = The first few responses were a bit snarky, in my opinion. The next few were much better, again in my opinion. No sarcasm.
And people ask me why I don't post as much on SSC as I used to :-)
I had an epiphany that just hit me like a ton of bricks. I can't speak for the others, but I just figured out why there's been friction between thee and me, and your post above hit the nail on the head.
Even though we both supposedly speak the same language, we patently do not. There actually is a language barrier between us and I totally missed that fact. Since this is all based on the "Tally OH!" table, it's ironic that the word "Oh" was the primary cause of this latest misunderstanding between us.
As you said, you were genuinely saying "Good". Because I couldn't hear your voice, I took "Oh good" (especially considering the other comments on this part of the thread) as dripping with sarcasm, because of the way ironic sarcasm is frequently used in the U.S.A., especially in IT departments. It's a horrible thing because it really is "snark-asm" at its worst.
It was all because of timing and the word "Oh" in this case. Had you said "Thanks, Jeff" or "Good enough" or just plain "Good" instead of "Oh good", I'd have taken it as non-sarcastic. I've had way too many people say it (especially in IT) meaning quite the opposite... more like "Oh good! I get to work late again tonight because of someone else's poor planning. What the hell... I didn't really want to see my wife and kids, anyway." See what I mean? The "Oh good" is like an encapsulating HTML sarcasm tag where everything after it is included in the tag.
Now that I'm aware of the problem, I'll be a whole lot more careful, as well. Looking back at things, I can think of dozens of places where this has happened with dozens of people.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 27, 2012 at 7:39 am
Hey all. When I saw 80 notification emails this morning from THE THREAD, I had to take a look at what was happening.
Well, nothing changed. Jeff is still going strong and Celko is his usual self. The rest of the gang is still here.
:-)
-Roy
July 27, 2012 at 7:40 am
Koen Verbeeck (7/27/2012)
Jeff, this is all your fault.
http://www.sqlservercentral.com/Forums/Topic1336367-147-1.aspx#bm1336413
ps: irony
BWAAA_HAAA!!!! Too funny, Koen! I think that even without your "ps", I'd have picked up on the irony of that one! :-)
--Jeff Moden
Change is inevitable... Change for the better is not.
July 27, 2012 at 7:54 am
GSquared (7/27/2012)
Brandie Tarvin (7/27/2012)
Back to the semi-technical discussion...
If joining a VARCHAR(8000) to a VARCHAR(MAX) value degrades performance, how does one get around this?
I'm not speaking in the context of the splitter function. I'm pretending I have two tables with joinable columns that were poorly designed (or different systems) and I need to join them.
Do I convert the 8k column to a MAX? Or is there some other way to resolve the issue?
Generate and join on Checksum/Hashbytes is probably your best bet.
You could start with a check on lengths (if the lengths are unequal, so are the strings), then for the remaining candidates cast the varchar(max) strings to varchar(8000) (which is guaranteed to be a valid cast, as the length doesn't exceed 8000). Of course, if that's written as a single query the optimiser may do something very silly with it (leaving you with casts that would truncate, which is far from helpful), so some care is needed.
You certainly can't use HASHBYTES for this case unless the varchar(max) strings are known to be less than 8000 bytes, as its input is limited to 8000 bytes. Using CHECKSUM is possible (I think - at least it doesn't complain if you feed it a lot of bytes, and I don't remember any documentation saying it's restricted to 8000), but it won't be efficient unless the optimiser is clever about it - the optimiser deals just fine with the case where the checksum is part of the data, but not necessarily with the case where neither of the checksums being compared is a pre-computed checksum held in the data. And anyway, checksumming the long objects is quite slow, since it has to read and process all the data, whereas comparing lengths shouldn't involve reading and processing all the data - if a lot of the varchar(max) strings are very long, the cheapness of the length check may offset the finer (but still far from perfect) filtering provided by the checksum.
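For illustration, the length-first predicate might look like this (invented names; in a conjunction the truncating cast is harmless, because any row where truncation occurred has already failed the length test):
SELECT a.Code
FROM dbo.SystemA AS a
JOIN dbo.SystemB AS b
  ON DATALENGTH(b.Code) = DATALENGTH(a.Code)   -- cheap: unequal lengths mean unequal strings
 AND CONVERT(varchar(8000), b.Code) = a.Code;  -- cast only matters when lengths match, i.e. <= 8000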
Tom
July 27, 2012 at 7:55 am
Jeff Moden (7/27/2012)
I had an epiphany that just hit me like a ton of bricks....
I see! Well that explains it (the 'oh' was purely an expression of surprise on my part; I did not know you had already submitted changes to the article).
July 27, 2012 at 7:58 am
Ugh, junior coder here is using SELECT * INTO in production code and won't listen to me when I tell him to change it, even with non-CELKO-like explanations.
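For anyone wondering what the fuss is about: SELECT * INTO silently inherits whatever columns and types the source happens to have on the day it runs, so the target's schema can change under you. A minimal before/after with invented names (two alternatives, not meant to be run together):
-- Fragile: dbo.Target's schema is whatever dbo.Source looks like today.
SELECT * INTO dbo.Target FROM dbo.Source;

-- Safer: the target schema is declared, and the explicit column lists mean
-- a new or reordered column in dbo.Source can't silently change the result.
CREATE TABLE dbo.Target (Id int NOT NULL, Name varchar(50) NOT NULL);
INSERT INTO dbo.Target (Id, Name)
SELECT s.Id, s.Name
FROM dbo.Source AS s;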
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It's unpleasantly like being drunk.
What's so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
July 27, 2012 at 7:59 am
Well, now in the morning I'm even more embarrassed at the mistake I made.
On another note, my head hurts. Where is my aspirin?
For performance issues, see how we like them posted here: How to Post Performance Problems - Gail Shaw
Need to split some strings? Jeff Moden's DelimitedSplit8K
Jeff Moden's Cross tab and Pivots Part 1
Jeff Moden's Cross tab and Pivots Part 2