December 5, 2011 at 9:23 am
Yep I agree with you Drew, that's why I asked how many columns were involved, if it was only a couple it might have been feasible.
What about this way, keeping each case seperate and concatenating the results:
SELECT Emp_No,
HR_ID,
ISNULL(Emp_No_Err,'') + ', ' + ISNULL(HR_ID_Err,'') + ', ' + ISNULL(Emp_Name_Err,'') AS Errors
FROM
(SELECTEmp_No,
HR_ID,
CASE WHEN LEN(Emp_No) < 7 THEN 'Emp_No: Too Short' END AS Emp_No_Err,
CASE WHEN LEN(HR_ID) < 7 THEN 'HR_ID: Too Short' END AS HR_ID_Err,
CASE WHEN Emp_Name = 'Alfred' THEN 'Emp_Name: Alfred' END AS Emp_Name_Err
FROM
(SELECT 12345 AS Emp_No, 'abcdef' AS HR_ID, 'Keith' AS Emp_Name) testData
) errorData
December 5, 2011 at 9:24 am
December 5, 2011 at 9:28 am
Thanks Jim, I will try out your code .. I had a basic CASE statement structure ..
December 5, 2011 at 10:00 am
aarionsql (12/5/2011)
HI Drew,can I have a join in a CASE statement?
No, but you can have a correlated subquery, a (NOT) IN with a subquery, or a (NOT) EXISTS with a subquery in a CASE function. Depending on your conditions and how often the table is referenced, you may want to use a join or apply in the main query instead anyhow.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 5, 2011 at 10:30 am
You might get one or two ideas from the following. It's proving very usable for data cleansing, which is similar to your scenario - you want to identify a case and do something based upon it. Separating the detection from the action opens up a whole new world of possibilities.
SELECT
Emp_No,
HR_ID,
ErrorChecker.ErrorMsg
FROM
(SELECT 12345 AS Emp_No, 'abcdef' AS HR_ID) testData
CROSS APPLY (
SELECT
d.CommonError,
ErrorMsg =CASE WHEN d.CommonError = 1 THEN 'Emp_No: Too Short, HR_ID: Too Short ' ELSE '' END +
CASE WHEN d.Emp_NoError = 2 THEN 'Emp_No: Too Short ' ELSE '' END +
CASE WHEN d.HR_IDError = 3 THEN 'HR_ID: Too Short ' ELSE '' END
FROM (
SELECT
CommonError = CASE
WHEN LEN(Emp_No) < 7 AND LEN(HR_ID) < 7 THEN 1 ELSE NULL END,
Emp_NoError = CASE
WHEN LEN(Emp_No) < 7 AND LEN(HR_ID) > 6 THEN 2 ELSE NULL END,
HR_IDError = CASE
WHEN LEN(HR_ID) < 7 AND LEN(HR_ID) > 6 THEN 3 ELSE NULL END
) d
) ErrorChecker
Oh, and it's fast, too.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 5, 2011 at 11:30 am
ChrisM@Work (12/5/2011)
You might get one or two ideas from the following. It's proving very usable for data cleansing, which is similar to your scenario - you want to identify a case and do something based upon it. Separating the detection from the action opens up a whole new world of possibilities.
I don't see how you gain anything by calculating an intermediate code which you then use to calculate the error message instead of directly calculating the error message. It might be worthwhile if the same calculation is used in many different places, but that's not happening in your sample code.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 5, 2011 at 2:53 pm
CELKO (12/5/2011)
Your approach to SQL is fundamentally wrong.
Your approach to SQL training is fundamentally wrong. Without specifics on what you think is fundamentally wrong, why you think it is fundamentally wrong, and how to fix it, your statement only serves to disparage the original poster. People don't learn well when they're being attacked.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 5, 2011 at 2:55 pm
drew.allen (12/5/2011)
CELKO (12/5/2011)
Your approach to SQL is fundamentally wrong.Your approach to SQL training is fundamentally wrong. Without specifics on what you think is fundamentally wrong, why you think it is fundamentally wrong, and how to fix it, your statement only serves to disparage the original poster. People don't learn well when they're being attacked.
Drew
He's been told that 1000 times already. I guess he's too old or stuburn to learn anything by now.
Too bad for all of us.
December 5, 2011 at 3:03 pm
drew.allen (12/5/2011)
CELKO (12/5/2011)
Your approach to SQL is fundamentally wrong.Your approach to SQL training is fundamentally wrong. Without specifics on what you think is fundamentally wrong, why you think it is fundamentally wrong, and how to fix it, your statement only serves to disparage the original poster. People don't learn well when they're being attacked.
Drew
Pffft, it's actually just spam advertising. All he does is make sure his booklist has a higher hitrate for google spiders. Wonder if we should start reporting his posts that way... 'eh, Steve might not love us.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 5, 2011 at 3:14 pm
Evil Kraig F (12/5/2011)
drew.allen (12/5/2011)
CELKO (12/5/2011)
Your approach to SQL is fundamentally wrong.Your approach to SQL training is fundamentally wrong. Without specifics on what you think is fundamentally wrong, why you think it is fundamentally wrong, and how to fix it, your statement only serves to disparage the original poster. People don't learn well when they're being attacked.
Drew
Pffft, it's actually just spam advertising. All he does is make sure his booklist has a higher hitrate for google spiders. Wonder if we should start reporting his posts that way... 'eh, Steve might not love us.
He'd like the feedback and conversation on it.
Of course JC = spam 50% of the time.
December 6, 2011 at 1:39 am
drew.allen (12/5/2011)
ChrisM@Work (12/5/2011)
You might get one or two ideas from the following. It's proving very usable for data cleansing, which is similar to your scenario - you want to identify a case and do something based upon it. Separating the detection from the action opens up a whole new world of possibilities.I don't see how you gain anything by calculating an intermediate code which you then use to calculate the error message instead of directly calculating the error message. It might be worthwhile if the same calculation is used in many different places, but that's not happening in your sample code.
Drew
Ouch! Having a bad day, Drew? 😛
Sure it's not the best example, but the point - as stated - was to open up possibilities.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 6, 2011 at 7:12 am
ChrisM@Work (12/6/2011)
drew.allen (12/5/2011)
ChrisM@Work (12/5/2011)
You might get one or two ideas from the following. It's proving very usable for data cleansing, which is similar to your scenario - you want to identify a case and do something based upon it. Separating the detection from the action opens up a whole new world of possibilities.I don't see how you gain anything by calculating an intermediate code which you then use to calculate the error message instead of directly calculating the error message. It might be worthwhile if the same calculation is used in many different places, but that's not happening in your sample code.
Drew
Ouch! Having a bad day, Drew? 😛
Sure it's not the best example, but the point - as stated - was to open up possibilities.
It's just inefficient. You have to touch each row twice: once to assign the code and once to assign the actual error message. Sometimes it can be worth trading off efficiency for a gain somewhere else, but I don't see where you're gaining anything here, so you're just sacrificing efficiency.
It also makes your code harder to read, and therefore update, because of the extra layer.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 6, 2011 at 7:33 am
drew.allen (12/6/2011)
ChrisM@Work (12/6/2011)
drew.allen (12/5/2011)
ChrisM@Work (12/5/2011)
You might get one or two ideas from the following. It's proving very usable for data cleansing, which is similar to your scenario - you want to identify a case and do something based upon it. Separating the detection from the action opens up a whole new world of possibilities.I don't see how you gain anything by calculating an intermediate code which you then use to calculate the error message instead of directly calculating the error message. It might be worthwhile if the same calculation is used in many different places, but that's not happening in your sample code.
Drew
Ouch! Having a bad day, Drew? 😛
Sure it's not the best example, but the point - as stated - was to open up possibilities.
It's just inefficient. You have to touch each row twice: once to assign the code and once to assign the actual error message. Sometimes it can be worth trading off efficiency for a gain somewhere else, but I don't see where you're gaining anything here, so you're just sacrificing efficiency.
It also makes your code harder to read, and therefore update, because of the extra layer.
Drew
Point taken.
Now imagine you're writing a query to output cleaned title, firstname, lastname from a table of appallingly bad data. You've set up a shedload of detection and cleaning rules. Some of those cleaning rules will apply to only one column, so a plain CASE WHEN ... THEN ... would be fine. Many however will apply to multiple columns - a single detection rule will dictate changes to both title and firstname, for instance. Why run the expensive detection CASE twice, once for each column?
There's another benefit from this approach too. You can output the detection codes. When you're checking the output for discrepancies you can see which codes are faulty. Trying to figure out which CASE is at fault without this can be extraordinarily difficult. Last but not least - outputting the detection codes allows you to measure which ones are most frequently encountered and put them at the top of the CASE construct, thereby reducing the number of unnecessary checks.
Also - you're not touching the row twice. You're still touching it throughout detection and action.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 6, 2011 at 8:08 am
ChrisM@Work (12/6/2011)
Now imagine you're writing a query to output cleaned title, firstname, lastname from a table of appallingly bad data. You've set up a shedload of detection and cleaning rules. Some of those cleaning rules will apply to only one column, so a plain CASE WHEN ... THEN ... would be fine. Many however will apply to multiple columns - a single detection rule will dictate changes to both title and firstname, for instance. Why run the expensive detection CASE twice, once for each column?
Actually, in that situation, I suspect that an OUTER APPLY might actually be more efficient and would almost certainly be easier to read. Something like the following:
SELECT *
FROM YourDirtyDataTable
OUTER APPLY (
SELECT 1 AS ErrorCode1, 'Your specific error' AS ErrorMessage1, 2 AS ErrorCode2
, 'Another error message here' AS ErrorMessage2, <some expression> as Title
, <some other expression> as FirstName
WHERE <your complex detection expression here>
) AS e1
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 6, 2011 at 8:18 am
drew.allen (12/6/2011)
ChrisM@Work (12/6/2011)
Now imagine you're writing a query to output cleaned title, firstname, lastname from a table of appallingly bad data. You've set up a shedload of detection and cleaning rules. Some of those cleaning rules will apply to only one column, so a plain CASE WHEN ... THEN ... would be fine. Many however will apply to multiple columns - a single detection rule will dictate changes to both title and firstname, for instance. Why run the expensive detection CASE twice, once for each column?Actually, in that situation, I suspect that an OUTER APPLY might actually be more efficient and would almost certainly be easier to read. Something like the following:
SELECT *
FROM YourDirtyDataTable
OUTER APPLY (
SELECT 1 AS ErrorCode1, 'Your specific error' AS ErrorMessage1, 2 AS ErrorCode2
, 'Another error message here' AS ErrorMessage2, <some expression> as Title
, <some other expression> as FirstName
WHERE <your complex detection expression here>
) AS e1
Drew
You're absolutely right - CROSS APPLY is well nice for this - if you exceed SQL Server's expressions limit, you can pop it seamlessly into an iTVF and voila! Error gone, code gone:-D
We'll have to differ on this one Drew. I'm using the method as we speak having started with single CASE expressions, and switching to separate detection/effect has made this project tons easier and far far easier to read.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply